apply --> data.frame

It is hard to help when you don't give an example of your input data
and what you want to be computed (in a form one can source or copy
into an R session).  Is the following something like what you are doing?

Suppose you have a function that takes a file name and
returns a list of things of various types extracted from the
file.  A toy example would be
    fileExtract <- function(fileName) {
        fi <- file.info(fileName)
        # First byte of the file, or NA for directories and empty files
        byte0 <- if (fi$isdir || fi$size < 1) NA_integer_ else
            readBin(fileName, what = "integer", size = 1, n = 1)
        list(Name = basename(fileName), IsDir = fi$isdir, Size = fi$size,
             FirstByte = byte0, ModTime = fi$mtime)
    }
Then you can get the list of rows that you want converted to a data.frame
with
   rows <- lapply(dir(R.home(), full.names=TRUE), fileExtract)
E.g., I get
  > dput(rows[1:2])
  list(structure(list(Name = "bin", IsDir = TRUE, Size = 0, FirstByte = NA_integer_, 
      ModTime = structure(1343316337, class = c("POSIXct", "POSIXt"
      ))), .Names = c("Name", "IsDir", "Size", "FirstByte", "ModTime"
  )), structure(list(Name = "CHANGES", IsDir = FALSE, Size = 28204, 
      FirstByte = 87L, ModTime = structure(1340406834, class = c("POSIXct", 
      "POSIXt"))), .Names = c("Name", "IsDir", "Size", "FirstByte", 
  "ModTime")))
Note that the j'th element of each row has a fixed type.
You want a data.frame with columns named "Name", "IsDir",
"Size", "FirstByte", and "ModTime", where the i'th row contains
the data in rows[[i]].

If that is what you want then here is a function that does a pretty good job of it:
f <- function (listOfRows, nItemsPerRow = unique(vapply(listOfRows, 
    length, 0)), col.names = names(rowTemplate), rowTemplate = listOfRows[[1]], 
    ...) 
{
    stopifnot(length(nItemsPerRow) == 1, nItemsPerRow == length(rowTemplate))
    if (is.null(col.names)) {
        col.names <- sprintf("V%d", seq_len(nItemsPerRow))
    }
    else {
        stopifnot(nItemsPerRow == length(col.names))
    }
    columns <- lapply(structure(seq_len(nItemsPerRow), names = col.names), 
        FUN = function(i) {
            v <- vapply(listOfRows, function(Row) Row[[i]], rowTemplate[[i]])
            if (is.matrix(v)) { # for when length(rowTemplate[[i]])>1 
                v <- t(v)
            }
            v
        })
    data.frame(columns, ...)
}
E.g.,
  > str(f(rows))
'data.frame':   19 obs. of  5 variables:
 $ Name     : Factor w/ 19 levels "bin","CHANGES",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ IsDir    : logi  TRUE FALSE FALSE TRUE TRUE TRUE ...
 $ Size     : num  0 28204 18351 0 0 ...
 $ FirstByte: int  NA 87 9 NA NA NA NA 101 NA 82 ...
 $ ModTime  : num  1.34e+09 1.34e+09 1.34e+09 1.34e+09 1.34e+09 ...
Note that the POSIXct item, ModTime, got converted to numeric because
vapply returns a plain vector and drops attributes such as the class.
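
One way to get the date-times back is to reattach the class after vapply
has run. The following is only a sketch: the two timestamps are made up,
and the UTC time zone is an assumption chosen to keep the example
reproducible.

```r
# Two made-up POSIXct values standing in for the ModTime element of each row.
times <- list(as.POSIXct("2012-06-22 12:00:00", tz = "UTC"),
              as.POSIXct("2012-07-26 12:00:00", tz = "UTC"))

# vapply() accepts the POSIXct template (it is a double of length 1)
# but returns a bare double vector with no class attribute.
v <- vapply(times, function(x) x, times[[1]])
class(v)

# Reattach the class by hand: seconds since the epoch, in the assumed zone.
restored <- as.POSIXct(v, origin = "1970-01-01", tz = "UTC")
class(restored)
```

Inside a function like the one above, the same conversion could be applied
to any column whose template element inherits from "POSIXct".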

An advantage of vapply is that it will do some type checking:
Error in vapply(listOfRows, function(Row) Row[[i]], rowTemplate[[i]]) : 
  values must be type 'double',
 but FUN(X[[2]]) result is type 'character'
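
A small, made-up input that provokes this kind of error is sketched below
(the name rowsBad is hypothetical; it is not the data from the original
question). The first row supplies a double, the second a character string
in the same position.

```r
# The first element is a double in row 1 but a string in row 2.
rowsBad <- list(list(1.5, "a"), list("oops", "b"))

# vapply() checks every result against the template and stops on a mismatch.
msg <- tryCatch(
    vapply(rowsBad, function(Row) Row[[1]], rowsBad[[1]][[1]]),
    error = function(e) conditionMessage(e))
msg

# sapply(), by contrast, silently coerces everything to character.
coerced <- sapply(rowsBad, function(Row) Row[[1]])
coerced
```

The silent coercion in sapply is the sort of surprise that the vapply
template is meant to catch.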
The function also handles cases like the following, where each row element
contains a few vectors and you want each vector element in its
own column:
  > str(f(list(list(1:2, 1+1i, letters[1:3]), list(11:12, 11+11i, letters[4:6]))))
  'data.frame':   2 obs. of  6 variables:
   $ V1.1: int  1 11
   $ V1.2: int  2 12
   $ V2  : cplx  1+1i 11+11i
   $ V3.1: Factor w/ 2 levels "a","d": 1 2
   $ V3.2: Factor w/ 2 levels "b","e": 1 2
   $ V3.3: Factor w/ 2 levels "c","f": 1 2

There are other ways to do this, but I don't know if this is the problem
you want to solve.
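
For what it is worth, one of those other ways is to turn each row into a
one-row data.frame and rbind them. This is only a sketch with made-up rows
(the helper name oneRow is hypothetical); it does no type checking and is
likely to be slow when there are many rows, but it keeps the POSIXct class.

```r
# Made-up rows shaped like the fileExtract() output above.
rows <- list(
    list(Name = "bin", IsDir = TRUE, Size = 0,
         ModTime = as.POSIXct("2012-07-26 12:00:00", tz = "UTC")),
    list(Name = "CHANGES", IsDir = FALSE, Size = 28204,
         ModTime = as.POSIXct("2012-06-22 12:00:00", tz = "UTC")))

# Convert each list to a one-row data.frame, then stack the rows.
oneRow <- function(r) as.data.frame(r, stringsAsFactors = FALSE)
df <- do.call(rbind, lapply(rows, oneRow))
str(df)
```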

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com