Skip to content

Inefficiency in df$col

6 messages · Radford Neal, Peter Dalgaard, Martin Maechler +1 more

#
I maybe wasn't completely clear.  The $ operator for data frames was
previously done in C - since it was done by the same primitive as for
lists.  In R-3.1.0, this was changed - producing a massive slowdown -
for the purpose of giving a warning on partial matches even if the
user had not set the warnPartialMatchDollar option to TRUE.  In
R-3.1.1, this was changed to not warn unless warnPartialMatchDollar was
TRUE which was the PREVIOUS behaviour.  In other words, this change
reverted the change made in R-3.1.0.  But instead of simply deleting
the definition of $.data.frame, R-3.1.1 added extra code to it, the
only effect of which is to slightly change the wording of the warning
message from what is produced for any other list, while still retaining
the massive slowdown.

There is no need for you to write $.data.frame in C.  You just need
to delete the version written in R.

   Radford Neal
#
On 04/02/2019 9:20 a.m., Radford Neal wrote:
Sorry, I did misunderstand.  Thanks for the clarification.

But if the "You" in your last sentence meant me, it needs to be "They": 
I am not a member of R Core and can't make any changes to the sources.

Duncan Murdoch
#
Does either of you have a patch against current R-devel? 

I tried the obvious, but the build dies with

building package 'tools'
all.R is unchanged
../../../../library/tools/libs/x86_64/tools.so is unchanged
installing 'sysdata.rda'
Error in get(method, envir = home) : object '$.data.frame' not found
Error: unable to load R code in package 'tools'
Execution halted

...and I can't really be arsed to dig into tools to see exactly where it is hardcoding the existence of $.data.frame.

-pd

  
    
#
> Does either of you have a patch against current R-devel? 
    > I tried the obvious, but the build dies with

    > building package 'tools'
    > all.R is unchanged
    > ../../../../library/tools/libs/x86_64/tools.so is unchanged
    > installing 'sysdata.rda'
    > Error in get(method, envir = home) : object '$.data.frame' not found
    > Error: unable to load R code in package 'tools'
    > Execution halted

    > ...and I can't really be arsed to dig into tools to see exactly where it is hardcoding the existence of $.data.frame.

    > -pd

Well, we two have been working "in parallel"...

I've just sent an e-mail to R-core about this:

It's really  file.size() which does need a 'data.frame' method for `$`
because it is basically a wrapper  file.info(..)$size  and
file.info(..) does return a data frame.

I've been suggesting the byte compiler's optimizations here to be
the problem ...
>> On 4 Feb 2019, at 15:32 , Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>>
>> On 04/02/2019 9:20 a.m., Radford Neal wrote:
>>>>> I think you might want to just delete the definition of $.data.frame,
    >>>>> reverting to the situation before R-3.1.0.
    >>>> 
    >>>> I imagine the cause is that the list version is done in C code rather
    >>>> than R code (i.e. there's no R function `$.list`).  So an alternative
    >>>> solution would be to also implement `$.data.frame` in the underlying C
    >>>> code.  This won't be quite as fast (it needs that test for NULL), but
    >>>> should be close in the full match case.
    >>> I maybe wasn't completely clear.  The $ operator for data frames was
    >>> previously done in C - since it was done by the same primitive as for
    >>> lists.  In R-3.1.0, this was changed - producing a massive slowdown -
    >>> for the purpose of giving a warning on partial matches even if the
    >>> user had not set the warnPartialMatchDollar option to TRUE.  In
    >>> R-3.1.1, this was changed to not warn unless warnPartialMatchDollar was
    >>> TRUE which was the PREVIOUS behaviour.  In other words, this change
    >>> reverted the change made in R-3.1.0.  But instead of simply deleting
    >>> the definition of $.data.frame, R-3.1.1 added extra code to it, the
    >>> only effect of which is to slightly change the wording of the warning
    >>> message from what is produced for any other list, while still retaining
    >>> the massive slowdown.
    >>> There is no need for you to write $.data.frame in C.  You just need
    >>> to delete the version written in R.
    >> 
    >> Sorry, I did misunderstand.  Thanks for the clarification.
    >> 
    >> But if the "You" in your last sentence meant me, it needs to be "They": I am not a member of R Core and can't make any changes to the sources.
    >> 
    >> Duncan Murdoch
    >> 
    >> ______________________________________________
    >> R-devel at r-project.org mailing list
    >> https://stat.ethz.ch/mailman/listinfo/r-devel

    > -- 
    > Peter Dalgaard, Professor,
    > Center for Statistics, Copenhagen Business School
    > Solbjerg Plads 3, 2000 Frederiksberg, Denmark
    > Phone: (+45)38153501
    > Office: A 4.23
    > Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

    > ______________________________________________
    > R-devel at r-project.org mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-devel
#
On 04/02/2019 10:48 a.m., peter dalgaard wrote:
Since $.data.frame is in the base package, it is handled specially in 
src/library/base/R/zzz.R.  I'll email you a patch if it compiles and 
passes checks.

Duncan Murdoch
#
On 04/02/2019 11:34 a.m., Martin Maechler wrote:
I think Radford's point is that there is no difference between the 
behaviour of $ on a data.frame or a list (except for the wording of the 
warning message), so the $.list method (which is fast) is sufficient.

Duncan Murdoch