subset data.frame at C level

3 messages · Jim Hester, Morgan Morgan

#
Hi,

Hope you are well.

I was wondering if there is a function at C level that is equivalent to
mtcars$carb or .subset2(mtcars, "carb").

If I have the index of the column then the answer would be VECTOR_ELT(df,
asInteger(idx)), with idx zero-based, but I was wondering if there is a way
to do it directly from the name of the column without having to loop over
the column names to find the index?

Thank you
Best regards
Morgan
6 days later
#
It looks to me like internally .subset2 uses `get1index()`, but this
function is declared in Defn.h, which AFAIK is not part of the exported R
API.

Looking at the code for `get1index()`, it looks like it just loops over the
(translated) names, so I guess I would just do that [0].

[0]:
https://github.com/r-devel/r-svn/blob/1ff1d4197495a6ee1e1d88348a03ff841fd27608/src/main/subscript.c#L226-L235
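
For reference, a minimal sketch of that loop using only calls from
Rinternals.h. The function name C_get_column and its error messages are
made up for illustration; this is not the code of get1index() itself, and
it only does exact matching on the names attribute.

```c
#include <string.h>
#include <R.h>
#include <Rinternals.h>

/* Hypothetical .Call entry point: return the element of the list or
   data.frame `df` whose name equals `name`, or R_NilValue if none does. */
SEXP C_get_column(SEXP df, SEXP name)
{
    if (TYPEOF(df) != VECSXP)
        Rf_error("'df' must be a list or data.frame");
    if (TYPEOF(name) != STRSXP || XLENGTH(name) != 1)
        Rf_error("'name' must be a single string");

    /* The names attribute stays reachable through df, so it is not
       PROTECTed here. */
    SEXP names = Rf_getAttrib(df, R_NamesSymbol);
    if (Rf_isNull(names))
        return R_NilValue;

    /* Translate to the native encoding before comparing. */
    const char *target = Rf_translateChar(STRING_ELT(name, 0));
    R_xlen_t n = XLENGTH(names);

    /* Linear scan over the column names; the index i is zero-based. */
    for (R_xlen_t i = 0; i < n; i++) {
        const char *cur = Rf_translateChar(STRING_ELT(names, i));
        if (strcmp(cur, target) == 0)
            return VECTOR_ELT(df, i);
    }
    return R_NilValue;
}
```

Assuming the file is built with R CMD SHLIB and loaded with dyn.load(),
.Call("C_get_column", mtcars, "carb") should then return the carb column,
and NULL when no column has that name.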

On Wed, Jun 17, 2020 at 6:11 AM Morgan Morgan <morgan.emailbox at gmail.com>
wrote:

  
  
#
Thank you Jim for the feedback.

I actually implemented it the way I described it in my first email and it
seems fast enough for me.

Just to give a bit of context, I will need it at some point in the package
kit. I also implemented subsetting by row, which I actually need more as I
am working on faster versions of the unique and duplicated functions. The
function unique is particularly slow for data.frames. So far I have got a
100x speedup.

Best regards
Morgan
On Tue, 23 Jun 2020, 21:11 Jim Hester, <james.f.hester at gmail.com> wrote: