List of lists? Data frames? (Or other data structures?)
On Wed, 30 Apr 2003, Roger Peng wrote:
If you're talking about rows and columns, it seems like the appropriate data structure for you is the data frame. I think your list of lists representation might get unwieldy after a while. I can't really think of why a data frame would be any slower than a list of lists -- I've never experienced such behavior. read.table() may be a little slower than scan() because read.table() reads in an entire file and then converts each of the columns into an appropriate data class. So there is some post-processing going on. It doesn't have anything to do with data frames vs. lists.
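As a rough illustration of the data frame suggestion, here is a minimal sketch that holds the two-row example from the original question (quoted further down) in a data frame. The construction call is an assumption made for illustration; the original post only shows the printed table.

```r
# Build the two-row example; row names a/b and columns aa/bb follow the question.
x <- data.frame(aa = c(1, 2), bb = c("A", "B"),
                row.names = c("a", "b"), stringsAsFactors = FALSE)

row_a   <- x["a", ]      # one row (still a data frame)
col_bb  <- x$bb          # one column (a character vector)
cell_bb <- x["b", "bb"]  # a single component
```

Rows, columns and individual cells are all reachable with the same indexing syntax, which is the convenience being argued for here.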
Only if you don't specify colClasses: if you do (and you would need the information to use scan()) there should be no performance penalty. (Note that matrices can be scan()-ed into a vector and the dimensions added, and that will be faster.)
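A minimal sketch of both points, assuming a small tab-separated file written to a temporary path (the file contents and variable names are made up for illustration):

```r
# Write a tiny tab-separated file to a temporary location (illustrative data).
tmp <- tempfile(fileext = ".txt")
writeLines(c("aa\tbb", "1\tA", "2\tB"), tmp)

# read.table() with colClasses supplied, so column types are not guessed.
df <- read.table(tmp, header = TRUE, sep = "\t",
                 colClasses = c("numeric", "character"))

# scan() needs the same type information, passed through 'what'.
cols <- scan(tmp, what = list(aa = numeric(), bb = character()),
             sep = "\t", skip = 1, quiet = TRUE)

# A purely numeric table can be scanned into a vector and given dimensions.
mtmp <- tempfile(fileext = ".txt")
writeLines(c("1 2 3", "4 5 6"), mtmp)
v <- scan(mtmp, quiet = TRUE)   # values arrive row by row: 1 2 3 4 5 6
dim(v) <- c(3, 2)               # dim fills column-wise, so set (ncol, nrow) ...
m <- t(v)                       # ... and transpose to recover the file layout
```

Whether scan() is measurably faster on a given file is something to time with system.time() rather than assume.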
-roger
_______________________________
UCLA Department of Statistics
http://www.stat.ucla.edu/~rpeng

On Thu, 1 May 2003, R A F wrote:

Hi, I'm faced with the following problem and would appreciate some advice.

I could have a data frame x that looks like this:

      aa  bb
    a  1 "A"
    b  2 "B"

The advantage of this is that I could access all the individual components easily. Also I could access all the rows and columns easily.

Alternatively, I could have a list of lists that looks like this:

    xprime <- list()
    xprime$a <- list()
    xprime$b <- list()
    xprime$a$aa <- 1
    xprime$a$bb <- "A"
    xprime$b$aa <- 2
    xprime$b$bb <- "B"
    etc.

If speed is important, would a list of lists be faster than a data frame? (I know, for example, that scan is supposed to be faster than read.table, but I don't know if that is related to issues with data frames.)

My problem with a list of lists, though, is that if I want to access all the bb subcomponents, a naive method like this one failed:

    y <- c( "a", "b" )
    xprime[[ y ]]$bb (Does not work)
You are supposed to use [[ ]] to extract a single component. I don't think you expected
xprime[[ y ]]
[1] "A" (as from 1.7.0).
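To make the distinction concrete, here is a small sketch assuming a current R version (the output quoted above is reported for 1.7.0 and may differ in later releases). A vector index inside [[ ]] descends one level per element instead of selecting several components:

```r
xprime <- list(a = list(aa = 1, bb = "A"),
               b = list(aa = 2, bb = "B"))

# A length-2 index in [[ ]] is applied recursively:
# this is the same as xprime[["a"]][["bb"]].
one <- xprime[[c("a", "bb")]]

# To pick out components a and b themselves, use single brackets,
# which return a list containing both sub-lists.
sub <- xprime[c("a", "b")]
```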
So to get all the bb subcomponents I seem to need to loop, which may slow things down (presumably). But maybe people here know of a way.
Something is going to have to loop, so it probably is not slow to use an explicit loop.
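As a sketch of what such a loop might look like, assuming the xprime and y objects from the question, here are an explicit for loop and an sapply() call that loops internally:

```r
xprime <- list(a = list(aa = 1, bb = "A"),
               b = list(aa = 2, bb = "B"))
y <- c("a", "b")

# Explicit loop: preallocate the result, then fill it by component name.
bb <- character(length(y))
names(bb) <- y
for (k in y) bb[k] <- xprime[[k]]$bb

# sapply() applies the same extraction to each selected sub-list.
bb2 <- sapply(xprime[y], function(comp) comp$bb)
```

Both forms visit each component once, so any speed difference between them is likely to be small for lists of this kind.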
Finally what would be the "best" way given the constraint of quick access to all rows, columns and individual components? I'd appreciate your thoughts and comments. Thanks very much.
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595