
Light-weight data.frame class: was: how to add method to .Primitive function

3 messages · Vadim Ogranovich, Simon Urbanek, Gabor Grothendieck

#
Hi,

Encouraged by a tip from Simon Urbanek, I tried to use the S3 machinery
to write a faster version of the data.frame class.
This quickly hit a snag: "[.default"(x, i) for some reason cares
about the dimensionality of x.
At the end there is a full transcript of my R session. It includes the
motivation for writing the class and the problems I have encountered.

As a result I see three issues here:
* Why doesn't "[.default"(x, i) work when x has two dimensions? After all,
a single subscript into a vector works regardless of whether it is a
matrix or not. Is there an alternative way to access "[.default"?
* Why does unclass() make a deep copy? This is one facet of R's general
over-conservatism with respect to copying.
* Is it possible to add some sort of copy profiling to R? Something like
copyProfiling(TRUE), which would cause R to log the size of each copied
object (just raw sizes, without any attempt to identify the object). This
feature would at least help assess the magnitude of the problem.
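The premise of the first question can be checked directly: with a single subscript, R indexes a matrix as its underlying column-major vector and ignores dim. The matrix `m` below is made up for illustration; `.subset()` is the dispatch-free default "[" that Gabor's reply in this thread points to.

```r
# m is an illustrative 2 x 3 matrix, filled column by column:
#   1 3 5
#   2 4 6
m <- matrix(1:6, nrow = 2)

# One subscript: dim(m) is ignored and m is indexed as the vector 1:6
m[5]           # 5
m[c(1, 6)]     # 1 6

# .subset() is "[" without method dispatch; one subscript behaves the same
.subset(m, 5)  # 5

# Two subscripts use the dimensions: row 1, column 3
m[1, 3]        # 5
```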

Thanks,
Vadim

Now the transcript itself:
times) slower than that of a list
[1] 1.01 0.14 1.14 0.00 0.00
[1] 0.06 0.00 0.06 0.00 0.00
data.frame
class="data.frame", row.names=as.character(seq(nrow(x))))
col[i])
"[.default"
+   if (nargs() == 2)
+     NextMethod("[", x, i)
+   else
+     structure(lapply(x[j], function(col) col[i]),  class = "lwdf")
+ }
dimensionality of its argument
Error in "[.default"(x, j) : incorrect number of dimensions
+   structure(lapply(unclass(x)[j], function(col) col[i]), class = "lwdf")
+ }
a c
1 1 a
2 3 c
evidenced by the following timing
[1] 0.01 0.00 0.01 0.00 0.00
[1] 0.44 0.39 0.82 0.00 0.00
         _
platform x86_64-unknown-linux-gnu
arch     x86_64                  
os       linux-gnu               
system   x86_64, linux-gnu       
status                           
major    2                       
minor    0.1                     
year     2004                    
month    11                      
day      15                      
language R
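Since most input lines of the transcript above did not survive, here is a self-contained sketch of the approach it appears to test. The constructor `lwdf()`, the `print.lwdf()` method and the sample data are hypothetical; only the unclass()-based body of "[.lwdf" is taken from the transcript.

```r
# Hypothetical constructor: a classed list of equal-length columns
lwdf <- function(...) structure(list(...), class = "lwdf")

# Body as in the transcript: unclass() first, so x[j] reaches the default
# list subsetting instead of recursing back into "[.lwdf"
"[.lwdf" <- function(x, i, j) {
  structure(lapply(unclass(x)[j], function(col) col[i]), class = "lwdf")
}

# Hypothetical print method: borrow data.frame printing for display only
print.lwdf <- function(x, ...) {
  n <- length(x[[1]])
  print(structure(unclass(x), class = "data.frame",
                  row.names = as.character(seq_len(n))), ...)
  invisible(x)
}

x <- lwdf(a = 1:3, b = c(10, 20, 30), c = c("a", "b", "c"))
y <- x[c(1, 3), c("a", "c")]
print(y)
```

The unclass() call is where the post reports the cost: it duplicates the whole object on every subsetting call, which is what the second timing above points to.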
#
Vadim,
On May 8, 2005, at 2:09 PM, Vadim Ogranovich wrote:

            
Umm... what about this:

"[.lwdf" = function(x, i, j) {
  r <- lapply(lapply(j, function(a) x[[a]]), function(x) x[i])
  names(r) <- names(x)[j]
  r
}

The subsetting operates on vectors, so it's not a problem. Don't ask  
me about the speed, though ;). And btw: you could access "[
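A quick check of the method above on a made-up object (the `lwdf` list below is hypothetical). The inner argument is renamed to `col` for readability. Note that the result is a plain list rather than an `lwdf`, and that `j` is expected to be numeric: with a character `j`, `names(x)[j]` yields NA names.

```r
# Simon's method, reflowed; columns are taken with "[[" (no "[.lwdf" recursion)
# and each column is then subset by i
"[.lwdf" <- function(x, i, j) {
  r <- lapply(lapply(j, function(a) x[[a]]), function(col) col[i])
  names(r) <- names(x)[j]
  r
}

# Hypothetical test object: a classed list of columns
x <- structure(list(a = 1:3, b = c(10, 20, 30), c = c("a", "b", "c")),
               class = "lwdf")

y <- x[c(1, 3), c(1, 3)]
str(y)            # plain list: a = 1 3, c = "a" "c"

names(x[1, "a"])  # NA: character j is not mapped back to a name
```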

My cautious remarks were about the following issues. You  
were talking about building a df alternative (s/df/data.frame/g in  
this e-mail). The first issue is that by re-defining "[" and friends  
you make your new calls incompatible with the behavior of lists, so  
you won't be able to use it where lists are required (even though  
is.list says TRUE). This may break code where you'd like your class to  
act as a list. On the other hand, your class is not a df either - and  
I suspect that it's far from trivial to make it even closely  
compatible with a df in terms of its behavior. Moreover any function  
that checks for df won't treat your class as such, because it simply  
is no df (is.data.frame()=FALSE for starters). So in the end, you  
would have to modify every function in R that uses df to recognize  
your new class. On the other hand if you make your class a subclass  
of df (there we get into some trouble with S3), you could replace the  
back-end, but then you will have to support every df feature  
including row.names. You could try it, but I'm somewhat skeptical...  
your mileage may vary, though.
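The compatibility point can be seen with the standard predicates. The object below is a hypothetical stand-in for the light-weight class; the second object shows the subclass route, where the data.frame checks pass but row.names (here in R's compact `c(NA, -n)` form) must then be carried along.

```r
# Hypothetical light-weight object: a list of columns with its own class
x <- structure(list(a = 1:3, b = c("u", "v", "w")), class = "lwdf")

is.list(x)                 # TRUE: still a list underneath
is.data.frame(x)           # FALSE: code that checks for data frames rejects it
inherits(x, "data.frame")  # FALSE

# Subclassing data.frame makes the checks pass, but then data.frame
# features such as row.names have to be supported as well
z <- structure(list(a = 1:3), class = c("lwdf", "data.frame"),
               row.names = c(NA, -3L))
is.data.frame(z)           # TRUE
```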

Cheers,
Simon
1 day later
#
"[.default" is implemented in R as .subset.  See ?.subset and note that
it begins with a dot.  For example, for the case where i and j are not missing:

"[.lwdf" <- function(x, i, j) lapply(.subset(x,j), "[", i)
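A usage sketch for this one-liner, on a made-up object. `.subset(x, j)` returns the selected columns as a plain list (the class attribute is dropped), so the result here is an ordinary list; wrapping it in `structure(..., class = "lwdf")` would be needed to keep the class. `.subset2()` is the corresponding dispatch-free "[[".

```r
# Gabor's method: .subset() is "[" without dispatch, so no recursion into "[.lwdf"
"[.lwdf" <- function(x, i, j) lapply(.subset(x, j), "[", i)

# Hypothetical object: a classed list of columns
x <- structure(list(a = 1:3, b = c(10, 20, 30), c = c("a", "b", "c")),
               class = "lwdf")

y <- x[c(1, 3), c("a", "c")]
str(y)            # plain list: a = 1 3, c = "a" "c"

# .subset2() is the dispatch-free counterpart of "[["
.subset2(x, "b")  # 10 20 30
```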
On 5/8/05, Vadim Ogranovich <vograno@evafunds.com> wrote: