data frame subscription operator
.subset and .subset2 are equivalent to [ and [[ except that dispatch does not take place. See ?.subset
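As a minimal sketch of what that means in practice (the data frame `df` below is made up for illustration and is not from the original post), the two functions operate on the underlying list and never call the data frame methods:

```r
# Illustrative data frame; the names df, X and Y are assumptions for this sketch
df <- data.frame(X = 1:5, Y = rnorm(5))

# Like df[[1]], but [[.data.frame is never called
col1 <- .subset2(df, 1)

# Like df[1], but [.data.frame is never called; the result is a plain
# named list, because the primitive does not keep the class attribute
sub <- .subset(df, 1)

is.data.frame(sub)   # FALSE
```

Because no method is dispatched, none of the usual data frame checks (row names, duplicate column names, drop handling) are applied, so this is only suitable where the plain list behaviour is what you want.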
On 11/8/06, Vladimir Dergachev <vdergachev at rcgardis.com> wrote:
On Wednesday 08 November 2006 3:21 am, Prof Brian Ripley wrote:
So far I have not been able to figure out why this is necessary. Could anyone help?
You need to remove the class to avoid recursion: a few lines later x[i] needs to be a call to the primitive and not the data frame method.
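The recursion issue can be sketched with a toy S3 class. This is not the actual `[.data.frame` source; the class name "myframe" and the object `obj` are hypothetical and exist only for this illustration:

```r
# Hypothetical toy class, used only to show why the class is removed first
"[.myframe" <- function(x, i) {
    cl <- oldClass(x)
    # unclass(x)[i] reaches the primitive [ on the bare list.
    # Writing x[i] here instead would dispatch to [.myframe again and recurse.
    y <- unclass(x)[i]
    class(y) <- cl
    y
}

obj <- structure(list(a = 1:3, b = 4:6), class = "myframe")
res <- obj["a"]
```

Removing the class is what stops the method from calling itself, and it is also the step that changes the object's attributes inside the method.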
I see. Is there a way to get at the primitive directly, i.e. something like `[.list`(x, i)?
The reason I am looking at it is that changing attributes forces duplication of the data frame and this is the largest cause of slowness of data.frames in general.
Do you have evidence of that? R has facilities to profile its code, and I have never seen [.data.frame taking a significant proportion of the total time. If it does for your application, consider if a data frame is an appropriate way to store your data. I am not sure we would accept that data frames do have 'slowness in general', but their generality does make them slower than alternatives where the generality is not needed.
Evidence:
# this can be copy'n'pasted directly into an R session
# small N - both system.time calls return small but comparable running times
N<-100000
A<-data.frame(X=1:N, Y=rnorm(N), Z=as.character(rnorm(N)))
system.time(B<-A[,1])
system.time(B<-A[1,1])
# larger N - both times are larger and still comparable
N<-1000000
A<-data.frame(X=1:N, Y=rnorm(N), Z=as.character(rnorm(N)))
system.time(B<-A[,1])
system.time(B<-A[1,1])
The running times would also grow with the number of columns. In addition, I have
modified R 2.4.0 to print out large allocations, and I get the
impression that the data frame is being duplicated. The same happens for
`[<-.data.frame`, but that function has much more complex code and I have not
looked through it yet.
Of course, getting a small portion (e.g. A[1:5,]) also takes a lot of time,
but the examples shown above should be O(1).
My data is the result of a database query. It naturally has columns of different
types, and the columns are named (no row.names though), which is why I used
data.frames. What would you suggest?
Thank you very much!
Vladimir Dergachev
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel