Fast nested List->data.frame
Dieter,
I'd approach this by first making a matrix, then converting it to a data
frame with the appropriate column types. I'm sure there is a way to do it
in one step with structure(). Operations on matrices are usually faster
than on data frames.
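To see why the types have to be restored after going through a matrix, here is a small sketch. The two records and the names x, m, f, codes and values are made up for illustration. unlist() on a list that mixes numbers, logicals and strings coerces everything to character, and if the columns later become factors (the as.data.frame() default before R 4.0.0), as.numeric() returns the level codes rather than the original numbers.

```r
# Two records in the same shape as the elements of d (values are made up)
x <- list(list(pH = 3, marker = TRUE,  position = "A"),
          list(pH = 4, marker = FALSE, position = "B"))

# unlist() coerces the mixed values to a single type: character
m <- matrix(unlist(x), ncol = 3, byrow = TRUE)
mode(m)   # "character"

# as.numeric() on a factor returns level codes, not the printed values
f <- factor(m[, 1])
codes  <- as.numeric(f)                 # level codes
values <- as.numeric(as.character(f))   # the original numbers
```

Converting through as.character() first is harmless when the column is already character, so it is a safe habit in either case.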
len <- 100000
d <- replicate(len, list(pH = 3, marker = TRUE, position = "A"), simplify = FALSE)
toDF <- function(alist){
  # unlist() coerces every value to character; restore the types afterwards
  d.matrix <- matrix(unlist(alist), ncol = 3, byrow = TRUE)
  d.df <- as.data.frame(d.matrix)
  names(d.df) <- c('pH', 'marker', 'position')
  # convert via as.character(): as.numeric() on a factor gives level codes
  d.df$pH <- as.numeric(as.character(d.df$pH))
  d.df$marker <- as.logical(as.character(d.df$marker))
  return(d.df)
}
on my system,
system.time(b<-toDF(d))
   user  system elapsed
  0.560   0.033   0.592
and
head(b)
  pH marker position
1  3   TRUE        A
2  3   TRUE        A
3  3   TRUE        A
4  3   TRUE        A
5  3   TRUE        A
6  3   TRUE        A
and
sapply(b, class)
       pH    marker  position
"numeric" "logical"  "factor"
I hope this helps,
Greg
sessionInfo() ##old, I know.
R version 2.9.0 (2009-04-17)
i386-apple-darwin8.11.1
locale:
en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] cimis_0.1-3 RLastFM_0.1-4 RCurl_0.98-1 bitops_1.0-4.1 XML_2.5-3
[6] lattice_0.17-22
loaded via a namespace (and not attached):
[1] grid_2.9.0
On 1/4/10 11:43 PM, Dieter Menne wrote:
I have very large data sets given in a format similar to d below. Converting
these to a data frame is a bottleneck in my application. My fastest version
is given below, but it looks clumsy to me.
Any ideas?
Dieter
# -----------------------
len = 100000
d = replicate(len, list(pH = 3, marker = TRUE, position = "A"), FALSE)
# Data are given as d
# preallocate vectors with the type each column will hold
pH = numeric(len)
marker = logical(len)
position = character(len)
system.time(
{
  for (i in seq_len(len))
  {
    d1 = d[[i]]
    # Assign to vectors
    pH[i] = d1[[1]]
    marker[i] = d1[[2]]
    position[i] = d1[[3]]
  }
  # combine vectors
  pHAll = data.frame(pH, marker, position)
}
)
Greg Hirson
ghirson at ucdavis.edu
Graduate Student
Agricultural and Environmental Chemistry
1106 Robert Mondavi Institute North
One Shields Avenue
Davis, CA 95616