Quiz: How to get a "named column" from a data frame

3 messages · Martin Maechler, Christian Brechbühler, Andrew Piskorski

#
On Sat, Aug 18, 2012 at 5:14 PM, Christian Brechbühler .... wrote:
But it is not a solution in a current version of R,
though it's still interesting that df[,1] worked in some incarnation of R.

What's your sessionInfo()?
Martin
#
On 8/18/12, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
My mistake!  We disliked some quirks of indexing, so we've long had
our own patch for "[.data.frame" in place, which I used inadvertently.
 In essence, it does this:

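    # Subset with drop = FALSE so the row names survive on the result.
    result <- base::"[.data.frame"(df, , 1, drop = FALSE)
    # For a single column, return it as a vector named by the row names.
    if (drop && length(ncol(result)) > 0 && ncol(result) == 1) {
        save.names <- dimnames(result)[[1]]
        result <- result[[1]]
        names(result) <- save.names
    }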

That obviously violated your constraint "no non-standard R packages".
I apologize.

Still, maybe the behavior of getting the named column would be
desirable in general?
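
For readers who want to try the idea without patching "[.data.frame", here is a minimal sketch in base R only. The function name named_column and the example data frame are hypothetical, chosen for illustration; this is not the actual patch described above, just the same logic wrapped in a standalone function.

```r
# Hypothetical helper (not part of base R): extract column j of a data
# frame and, when a single column is dropped to a vector, keep the row
# names as the names of that vector.
named_column <- function(df, j, drop = TRUE) {
  result <- df[, j, drop = FALSE]        # still a data frame, row names intact
  if (drop && ncol(result) == 1L) {
    save.names <- rownames(result)
    result <- result[[1L]]               # plain vector, names lost here
    names(result) <- save.names          # put the row names back
  }
  result
}

# Illustrative data only.
df <- data.frame(x = c(10, 20, 30), row.names = c("a", "b", "c"))
nv <- named_column(df, 1)
print(nv)
print(names(df[, 1]))   # stock behavior for comparison
```

Because it is an ordinary function rather than a replacement method, it leaves the behavior of df[,1] in other code untouched.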

/Christian
1 day later
#
On Sat, Aug 18, 2012 at 02:13:20PM -0400, Christian Brechbühler wrote:
As I understand it, when doing 'df[,1]' on a data frame, Bell
Labs S and all versions of S-Plus prior to 3.4 always retained the
data frame's row names as the names on the result vector.  'df[,1]'
gave you a named vector identical to your 'nv' above.  Then in 1996
with S-Plus 3.4, Insightful broke that behavior, after which 'df[,1]'
returned a vector without any names.  I believe R copied that
late-1990s S-Plus behavior, but I don't know why exactly.

When subscripting objects, R sometimes retains the object's dimnames
as names in the result, and sometimes not, which I find frustrating.
Personally, I think it would make much more sense if subscripting
ALWAYS retained any names it could, and worked as similarly as
possible across data frames, matrices, arrays, vectors, etc.  After
all, explicitly dropping names afterwards is trivial, while adding
them back on is not.
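
A small illustration of that asymmetry, using made-up data and base R only. It is a sketch of the general point, not one of the test cases discussed in this thread: the same column subscript keeps names on a matrix but not on a data frame, dropping names is a single call, and restoring them requires going back to the original object.

```r
# Illustrative data: the same values as a matrix and as a data frame.
m  <- matrix(c(10, 20, 30), ncol = 1,
             dimnames = list(c("a", "b", "c"), "x"))
df <- data.frame(x = c(10, 20, 30), row.names = c("a", "b", "c"))

v <- m[, "x"]            # matrix subscripting keeps the row names
print(names(v))
print(names(df[, "x"]))  # data frame subscripting does not

# Dropping names afterwards is one call ...
plain <- unname(v)

# ... but adding them back needs the original object's row names.
restored <- setNames(df[, "x"], rownames(df))
print(restored)
```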

Back on 2005-10-19 with R 2.2.0, I gave a simple test of 15 cases; 4
of them dropped names during subscripting, the other 11 preserved them.
That's towards the end of the discussion here:

  https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=8192

Contrary to the initial tone of my old 2005 "bug" report, current R
subscripting behavior is of course NOT a bug, as AFAIK it's working as
the R Core Team intended.  However, I definitely consider the current
behavior a design infelicity.

Just now on stock R 2.15.1 (with --vanilla), I ran an updated version
of those same simple tests.  Of 22 subscripting test cases, 7 lose
names and 15 preserve them.  (If anyone's interested in the specific
tests, I can send them, or try to append them to that old 8192 feature
request.)

For what it's worth, at work, for years we ran various versions of
pre-namespace R using some ugly patches of "[" and "[.data.frame" to
force name retention during subscripting.  Since we were not using
namespaces at all, those "keep names" subscripting hacks were
affecting ALL R code we ran, not just our own custom code which needed
and expected the names to be retained.  Yet perhaps surprisingly, I
don't think I ever ran into a single case where the forced retention
of names broke any code.  We of course ran only a tiny sample of the
huge amount of code on CRAN, but that experience suggests that most R
code which expects un-named objects doesn't mind at all if names are
present.

If anyone would genuinely like to add an option for name-preserving
subscripting to R, I'm willing to work on it, so please do let me know
your thoughts.  So far though, I've never dug into the guts of the
.Primitive("[") and "[.data.frame" functions to see how/why they
sometimes keep and sometimes discard names during subscripting.
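
As a possible starting point for anyone who does want to look, the sketch below shows one way to get hold of the two functions from an R session. It only locates them; it makes no claim about why names are kept or dropped. The variable names are arbitrary.

```r
# The generic "[" is a primitive, so its logic lives in R's C sources.
bracket <- `[`
print(is.primitive(bracket))

# The data frame method is ordinary R code and can be read directly.
df_method <- getS3method("[", "data.frame")
print(is.function(df_method))
print(names(formals(df_method)))   # the method's argument names
```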