On Wednesday 08 November 2006 3:21 am, Prof Brian Ripley wrote:
So far I have not been able to figure out why this is necessary -
could anyone help?
You need to remove the class to avoid recursion: a few lines later x[i]
needs to be a call to the primitive and not the data frame method.
I see. Is there a way to get at the primitive directly, i.e. something like
`[.list`(x, i) ?
The reason I am looking at it is that changing attributes forces
duplication of the data frame and this is the largest cause of slowness
of data.frames in general.
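
On getting at the primitive directly: base R documents `.subset()` and `.subset2()` in ?Extract as behaving like `[` and `[[` but without method dispatch. The following is only a minimal sketch, assuming those functions are available in the R version at hand; the small data frame and the variable names are made up for illustration.

```r
# Illustrative data frame (hypothetical values, not from the thread)
A <- data.frame(X = 1:5, Y = c(0.5, 1.5, 2.5, 3.5, 4.5))

# `[[` dispatches to the data frame method
x_via_method <- A[[1]]

# .subset2() skips dispatch and indexes the underlying list directly,
# so there is no need to remove the class attribute first
x_via_primitive <- .subset2(A, 1)

# A single element: pick the column without dispatch, then index the vector
y3 <- .subset2(A, "Y")[3]   # same element as A[3, "Y"]
```

Whether this avoids the duplication discussed here depends on the R version, so it would still need to be measured.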
Do you have evidence of that? R has facilities to profile its code, and I
have never seen [.data.frame taking a significant proportion of the total
time. If it does for your application, consider if a data frame is an
appropriate way to store your data. I am not sure we would accept that
data frames do have 'slowness in general', but their generality does make
them slower than alternatives where the generality is not needed.
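
The profiling facilities mentioned above can be exercised roughly as follows. This is a sketch only, assuming R was built with profiling support; the data frame size, the loop count and the name `prof_file` are arbitrary choices for illustration.

```r
# Illustrative data (sizes chosen arbitrarily)
N <- 100000
A <- data.frame(X = 1:N, Y = rnorm(N))

prof_file <- tempfile(fileext = ".out")

Rprof(prof_file)                  # start the sampling profiler
for (k in 1:200) B <- A[1, 1]     # the operation under suspicion
Rprof(NULL)                       # stop profiling

# summaryRprof() fails if no samples were collected (the loop was too quick),
# so guard the call
prof <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)
if (!is.null(prof)) print(head(prof$by.self))
```

If `[.data.frame` shows up near the top of the `by.self` table, that would support the claim; if it does not, the time is going elsewhere.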
Evidence:
# this can be copy'n'pasted directly into an R session
# small N - both system.time() calls report small but comparable running times
N <- 100000
A <- data.frame(X = 1:N, Y = rnorm(N), Z = as.character(rnorm(N)))
system.time(B <- A[, 1])   # extract a whole column
system.time(B <- A[1, 1])  # extract a single element
# larger N - both times are larger and still comparable
N <- 1000000
A <- data.frame(X = 1:N, Y = rnorm(N), Z = as.character(rnorm(N)))
system.time(B <- A[, 1])
system.time(B <- A[1, 1])
The running times would also grow with the number of columns. In addition, I
have modified R 2.4.0 to print out large allocations, and I get the
impression that the data frame is being duplicated. The same happens for
`[<-.data.frame`, but that function has much more complex code and I have not
looked through it yet.
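
One way to look for such duplication without patching R might be `tracemem()`, which reports when a traced object is copied. This is a hedged sketch, assuming an R build with memory profiling enabled (checked via `capabilities("profmem")`); the small data frame is illustrative, and what gets reported will differ between R versions.

```r
# Illustrative data frame (hypothetical, small on purpose)
A <- data.frame(X = 1:10, Y = rnorm(10))

# tracemem() is only usable when R was built with memory profiling
traced <- capabilities("profmem")
if (traced) tracemem(A)

B <- A[1, 1]      # read access: any tracemem message here indicates a copy of A
A$X[1] <- 0L      # write access, shown for comparison

if (traced) untracemem(A)
```

Each message printed by `tracemem()` corresponds to one duplication of the traced object, so a silent read would suggest no copy was made.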
Of course, extracting a small portion (e.g. A[1:5,]) also takes a lot of time,
but the examples shown above should be O(1).
My data is the result of a database query. It naturally has columns of
different types, and the columns are named (no row.names though), which is
why I used data.frames. What would you suggest?
Thank you very much!
Vladimir Dergachev