speeding up perception

This is just a quick, incomplete response, but the main misconception is really the use of data.frames. If you don't use the elaborate mechanics of data frames that involve the management of row names, then they are definitely the wrong tool to use, because most of the overhead is exactly in managing the row names, and you pay a substantial penalty for that. Just drop that one feature and you get timings similar to a matrix:
rows: 0.015 0 0.015 0 0 
columns: 0.01 0 0.01 0 0
rows: 0.015 0 0.016 0 0 
columns: 0.012 0 0.011 0 0 

(with the example modified to use m[[y]][x] instead of m[x,y])
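
To make the difference concrete, here is a minimal sketch of the two access styles. It is not the original benchmark: the sizes, the names `df`, `by_matrix_style` and `by_list_style`, and the values written are all made up for illustration, and the timings will vary by machine and R version.

```r
# Hypothetical data: 1000 rows, 10 numeric columns (not the original example)
n <- 1000L
k <- 10L
df <- as.data.frame(matrix(0, nrow = n, ncol = k))

# Matrix-style subassignment: every call goes through `[<-.data.frame`
by_matrix_style <- function(d) {
  for (y in seq_len(k))
    for (x in seq_len(100L))
      d[x, y] <- as.numeric(x + y)
  d
}

# List-style subassignment: select the column first, then index the vector
by_list_style <- function(d) {
  for (y in seq_len(k))
    for (x in seq_len(100L))
      d[[y]][x] <- as.numeric(x + y)
  d
}

print(system.time(a <- by_matrix_style(df)))
print(system.time(b <- by_list_style(df)))

# Both styles should leave the same values in every column
stopifnot(all(mapply(function(p, q) all(p == q), a, b)))
```

The point of the list-style version is only that it sidesteps the row-name handling in the data frame methods; it writes the same cells.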

I would not be surprised if many people use data.frames for the convenience of the matrix subsetting/subassignment operators and don't really care about the row names. For all those uses, data.frames are the wrong tool. (Just look at `[.data.frame` and `[<-.data.frame`.)

As a side note, it's a bit pointless to compare the performance to matrices, as they impose a much more rigorous structure (all columns have the same type) - if you use data frames in such special (rare) cases, it's really your fault ;). So the bottom line is to educate users not to use data frames where they are not needed and/or to provide alternatives (and there may be some things coming up, too).

And as I said, this is just a quick note, so carry on and comment on the original question ;).

Cheers,
Simon
On Jul 2, 2011, at 2:23 PM, ivo welch wrote: