
Which is the fastest way to make a data.frame out of a three-dimensional array?

6 messages · Bert Gunter, Petr Savicky, Hans Ekbrand

#
foo <- rnorm(30*34*12)
dim(foo) <- c(30, 34, 12)

I want to make a data.frame out of this three-dimensional array. Each dimension will be a variable (column) in the data.frame, plus one column for the array values themselves.

I know how this can be done in a very slow way using for loops, like this:

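x <- rep(seq(from = 1, to = 30), 34)                           # 1..30, once per y value
y <- as.vector(sapply(1:34, function(x) {rep(x, 30)}))         # each y value 30 times
month <- as.vector(sapply(1:12, function(x) {rep(x, 30*34)}))  # each month 30*34 times
# Pre-allocate the data.frame; temp is filled in by the loop below
my.df <- data.frame(month, x=rep(x, 12), y=rep(y, 12), temp=rep(NA, 30*34*12))
my.counter <- 1
# j (the first array index) varies fastest, month (the third) slowest
for(month in 1:12){
  for(i in 1:34){
    for(j in 1:30){
      my.df$temp[my.counter] <- foo[j,i,month]
      my.counter <- my.counter + 1
    }
  }
}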

str(my.df)
'data.frame':	12240 obs. of  4 variables:
 $ month: int  1 1 1 1 1 1 1 1 1 1 ...
 $ x    : int  1 2 3 4 5 6 7 8 9 10 ...
 $ y    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ temp : num  0.673 -1.178 0.54 0.285 -1.153 ...
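The loop works because its innermost index, j, runs over the array's first dimension, so the values are copied in exactly the order R already stores them. A minimal sketch that checks this on the same simulated data (the set.seed() call, and the plain vector temp used instead of my.df$temp, are additions for this illustration only):

```r
set.seed(1)                          # added only to make the check reproducible
foo <- rnorm(30 * 34 * 12)
dim(foo) <- c(30, 34, 12)

# Same loop order as above: j (1st index) fastest, month (3rd index) slowest.
# Writing into a plain vector avoids the data.frame overhead of my.df$temp[...].
temp <- numeric(length(foo))
counter <- 1
for (month in 1:12) {
  for (i in 1:34) {
    for (j in 1:30) {
      temp[counter] <- foo[j, i, month]
      counter <- counter + 1
    }
  }
}

# Column-major storage means c(foo) already holds the values in this order
stopifnot(identical(temp, c(foo)))
```

So the value column needs no loop at all; only the index columns have to be built separately.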

(In the real world problem I had, data was monthly measurements of temperature and x, y was coordinates).

Does anyone care to share a faster and less ugly solution? 

TIA
#
Cheat!  Arrays are stored in column-major order, so you can translate
the indexing directly:

Assume dim(yourarray) = c(n1,n2,n3)

*** warning: UNTESTED ***

yourframe <- data.frame(dat  = as.vector(yourarray),
                        dim1 = rep(seq_len(n1), n2*n3),
                        dim2 = rep(rep(seq_len(n2), each = n1), n3),
                        dim3 = rep(seq_len(n3), each = n1*n2))
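Column-major order means that element [i, j, k] of an n1 x n2 x n3 array sits at position i + (j - 1)*n1 + (k - 1)*n1*n2 of as.vector(yourarray), which is why the rep() columns line up with the values. A small sketch that checks both the formula and the rep() columns (the 2 x 3 x 4 test array and the helper pos() are made up for illustration):

```r
n1 <- 2; n2 <- 3; n3 <- 4
yourarray <- array(seq_len(n1 * n2 * n3) * 10, dim = c(n1, n2, n3))

# Hypothetical helper: linear (column-major) position of element [i, j, k]
pos <- function(i, j, k) i + (j - 1) * n1 + (k - 1) * n1 * n2
stopifnot(yourarray[2, 3, 4] == as.vector(yourarray)[pos(2, 3, 4)])

yourframe <- data.frame(dat  = as.vector(yourarray),
                        dim1 = rep(seq_len(n1), n2 * n3),
                        dim2 = rep(rep(seq_len(n2), each = n1), n3),
                        dim3 = rep(seq_len(n3), each = n1 * n2))

# Every row should point back at the array element whose value it holds
ok <- mapply(function(v, i, j, k) yourarray[i, j, k] == v,
             yourframe$dat, yourframe$dim1, yourframe$dim2, yourframe$dim3)
stopifnot(all(ok))
```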

See also the reshape package, which probably offers more elegant solutions.

Cheers,
Bert
On Sat, Feb 25, 2012 at 7:54 AM, Hans Ekbrand <hans at sociologi.cjb.net> wrote:

  
    
#
On Sat, Feb 25, 2012 at 08:07:01AM -0800, Bert Gunter wrote:
Hi.

Try this

  df <- data.frame(dat=c(foo), which(foo == foo, arr.ind=TRUE))

This may be less efficient, but easier to remember.
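which(..., arr.ind = TRUE) returns a matrix of array indices, one row per TRUE element, in column-major order; for a three-dimensional array its columns are named dim1, dim2 and dim3, and those names carry over into the data.frame. A quick check on a small made-up array:

```r
foo <- array(rnorm(2 * 3 * 4), dim = c(2, 3, 4))

idx <- which(foo == foo, arr.ind = TRUE)   # all TRUE as long as foo has no NA
print(colnames(idx))                       # "dim1" "dim2" "dim3"

df <- data.frame(dat = c(foo), idx)

# Row 7 (like every row) should hold the value at its own indices
stopifnot(nrow(df) == length(foo),
          df$dat[7] == foo[df$dim1[7], df$dim2[7], df$dim3[7]])
```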

Hope this helps.

Petr Savicky.
#
On Sat, Feb 25, 2012 at 04:54:30PM +0100, Hans Ekbrand wrote:
Hi.

Try this

  n1 <- dim(foo)[1]
  n2 <- dim(foo)[2]
  n3 <- dim(foo)[3]
  df <- cbind(dat=c(foo), expand.grid(dim1=1:n1, dim2=1:n2, dim3=1:n3))
  df[1:5, ]

           dat dim1 dim2 dim3
  1 -0.5765847    1    1    1
  2  0.4490040    2    1    1
  3  0.2626855    3    1    1
  4  0.2206713    4    1    1
  5  0.9079324    5    1    1
  ...

Unlike the previous suggestion based on foo == foo, this also works
when the array contains NA values.
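The foo == foo version fails with NA because NA == NA evaluates to NA, and which() drops NA entries, so the index matrix ends up with fewer rows than c(foo) has values and data.frame() stops with a row-count error. A sketch on a small made-up 2 x 3 x 4 array with one NA:

```r
foo <- array(rnorm(2 * 3 * 4), dim = c(2, 3, 4))
foo[1, 2, 3] <- NA

idx <- which(foo == foo, arr.ind = TRUE)
nrow(idx)                        # 23, one short of length(foo) == 24

# data.frame() refuses to combine 24 values with 23 index rows
res <- try(data.frame(dat = c(foo), idx), silent = TRUE)
stopifnot(inherits(res, "try-error"))

# expand.grid() enumerates every position regardless of the values
df <- cbind(dat = c(foo), expand.grid(dim1 = 1:2, dim2 = 1:3, dim3 = 1:4))
stopifnot(nrow(df) == 24,
          is.na(df$dat[df$dim1 == 1 & df$dim2 == 2 & df$dim3 == 3]))
```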

Hope this helps.

Petr Savicky.
#
Petr:

Your expand.grid solution is clearly much better than my nonsense. It
is just as fast (or faster) and is the far more sensible thing to do.

For an array ar with dim(ar) = c(100, 100, 1000), modifying your call
slightly to:

data.frame(c(ar),do.call(expand.grid,lapply(dim(ar),seq_len)))

I got:
  user  system elapsed
   1.93    0.43    2.38

Using my call I got:
   user  system elapsed
   2.23    0.44    2.70
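The do.call(expand.grid, lapply(dim(ar), seq_len)) form works for an array of any rank. A sketch that wraps both approaches in functions and times them with system.time(), assuming a 100 x 100 x 100 array (smaller than the one above, to keep it quick); the function names array_to_df_grid and array_to_df_rep are made up here, the dat column name is an addition to the call above, and the timings will differ from machine to machine, so only the equality check is meaningful:

```r
# Petr's expand.grid approach, generalised to any number of dimensions
array_to_df_grid <- function(ar) {
  grid <- do.call(expand.grid, lapply(dim(ar), seq_len))  # columns Var1, Var2, ...
  data.frame(dat = c(ar), grid)
}

# Bert's rep()-based approach, written for exactly three dimensions
array_to_df_rep <- function(ar) {
  n <- dim(ar)
  data.frame(dat  = c(ar),
             Var1 = rep(seq_len(n[1]), n[2] * n[3]),
             Var2 = rep(rep(seq_len(n[2]), each = n[1]), n[3]),
             Var3 = rep(seq_len(n[3]), each = n[1] * n[2]))
}

ar <- array(rnorm(100 * 100 * 100), dim = c(100, 100, 100))
print(system.time(a <- array_to_df_grid(ar)))
print(system.time(b <- array_to_df_rep(ar)))

# Same values and indices; attributes such as row names are not compared
stopifnot(isTRUE(all.equal(a, b, check.attributes = FALSE)))
```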

Thanks for the help.

-- Bert
On Sat, Feb 25, 2012 at 9:55 AM, Petr Savicky <savicky at cs.cas.cz> wrote:

  
    
#
First, thank you both Bert and Petr for your excellent answers. 

Bert's solution seems somewhat faster, and Petr's is, in my opinion at
least, slightly more elegant.
+                     dim1 = rep(seq_len(n[1]), n[2]*n[3]),
+                     dim2 = rep(rep(seq_len(n[2]), e=n[1]), n[3]),
+                     dim3 = rep(seq_len(n[3]), e = n[1]*n[2])))
   user  system elapsed 
  0.932   0.156   1.090
   user  system elapsed 
  0.980   0.252   1.244