-----Original Message-----
From: Gabor Grothendieck [mailto:ggrothendieck@gmail.com]
Sent: Monday, May 09, 2005 9:46 PM
To: Vadim Ogranovich
Cc: r-devel@stat.math.ethz.ch; simon.urbanek@r-project.org
Subject: Re: [Rd] Light-weight data.frame class: was: how to
add method to .Primitive function
"[.default" is implemented in R as .subset. See ?.subset and
note that it begins with a dot. e.g. for the case where i
and j are not missing:
"[.lwdf" <- function(x, i, j) lapply(.subset(x,j), "[", i)
On 5/8/05, Vadim Ogranovich <vograno@evafunds.com> wrote:
Hi,
Encouraged by a tip from Simon Urbanek I tried to use the
to write a faster version of the data.frame class.
This quickly hits a snag: the "[.default"(x, i) for some
about the dimensionality of x.
In the end there is a full transcript of my R session. It
motivation for writing the class and the problems I have
As a result I see three issues here:
* why "[.default"(x, i) doesn't work if dim(x) is 2? After all a
single subscript into a vector works regardless of whether it's a
matrix or not. Is there an alternative way to access "[.default"?
* why does unclass() make deep copy? This is a facet of the global
over-conservatism of R with respect to copying.
* is it possible to add some sort copy profiling to R?
copyProfiling(TRUE), which should cause R to log sizes of
object (just raw sizes w/o any attempt to identify the
feature should at least help assess the magnitude of the problem.
Thanks,
Vadim
Now the transcript itself:
# the motivation: subscription of a data.frame is *much*
times) slower than that of a list
# compare
n = 1e6
i = seq(n)
x = data.frame(a=seq(n), b=seq(n))
system.time(x[i,], gcFirst=TRUE)
[1] 1.01 0.14 1.14 0.00 0.00
x = list(a=seq(n), b=seq(n))
system.time(lapply(x, function(col) col[i]), gcFirst=TRUE)
[1] 0.06 0.00 0.06 0.00 0.00
# the solution: define methods for the light-weight
lwdf = function(...) structure(list(...), class = "lwdf")
# dim
dim.lwdf = function(x) c(length(x[[1]]), length(x))
# for pretty printing we define print.lwdf via a conversion to
# as.data.frame.lwdf
as.data.frame.lwdf = function(x) structure(unclass(x),
class="data.frame", row.names=as.character(seq(nrow(x))))
# print
print.lwdf = function(x) print.data.frame(as.data.frame.lwdf(x))
# now the real stuff
# "["
# the naive "[.lwdf" = function (x, i, j) lapply(x[j],
# won't work because evaluation of x[j] calls "[.lwdf"
# so we switch by the number of arguments "[.lwdf" =
+ if (nargs() == 2)
+ NextMethod("[", x, i)
+ else
+ structure(lapply(x[j], function(col) col[i]), class
x = lwdf(a=seq(3), b=letters[seq(3)],
c=as.factor(letters[seq(3)]))
i = c(1,3); j = c(1,3)
# unfortunately, for some reasons "[.default" cares about
dimensionality of its argument
Error in "[.default"(x, j) : incorrect number of dimensions
# we could use unclass to get it right "[.lwdf" =
+ structure(lapply(unclass(x)[j], function(col) col[i]), class =
"lwdf")
+ }
# *but* unclass creates a deep copy of its argument as indirectly
evidenced by the following timing
x = lwdf(a=seq(1e6)); system.time(unclass(x))
[1] 0.01 0.00 0.01 0.00 0.00
x = lwdf(a=seq(1e8)); system.time(unclass(x))
[1] 0.44 0.39 0.82 0.00 0.00
_
platform x86_64-unknown-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 2
minor 0.1
year 2004
month 11
day 15
language R