Quiz: How to get a "named column" from a data frame

14 messages · Martin Maechler, Rui Barradas, Bert Gunter +7 more

#
Today, I was looking for an elegant (and efficient) way
to get a named (atomic) vector by selecting one column of a data frame.
Of course, the vector names must be the rownames of the data frame.

Ok, here is the quiz, I know one quite "cute"/"slick" answer, but was
wondering if there are obvious better ones, and
also if this should not become more idiomatic (hence "R-devel"):

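Consider this toy example, where the dataframe already has only
one column :

> nv <- c(a=1, d=17, e=101); nv
  a   d   e 
  1  17 101

> df <- as.data.frame(cbind(VAR = nv)); df
  VAR
a   1
d  17
e 101

Now, how can I get 'nv' back from 'df' ?   I.e., how to get

> identical(nv, .......)
[1] TRUE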

where ...... only uses 'df' (and no non-standard R packages)?

As said, I know a simple solution (*), but I'm sure it is not
obvious to most R users and probably not even to the majority of
R-devel readers... OTOH, people like Bill Dunlap will not take
long to provide it or a better one.

(*) In my solution, the above '.......' consists of 17 letters.
I'll post it later today (CEST time) ... or confirm
that someone else has done so.

Martin
#
I don't know if this is better, but it's the most obvious/shortest I
could come up with.  Transpose the data.frame column to a 'row' vector
and drop the dimensions.

R> identical(nv, drop(t(df)))
[1] TRUE

Best,
--
Joshua Ulrich  |  about.me/joshuaulrich
FOSS Trading  |  www.fosstrading.com


On Sat, Aug 18, 2012 at 10:03 AM, Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
#
Hello,

A bit more general

nv <- c(a=1, d=17, e=101); nv
nv2 <- c(a="a", d="d", e="e")
df2 <- data.frame(VAR = nv, CHAR = nv2); df2

identical( nv, drop(t( df2[1] )) )   # TRUE
identical( nv, drop(t( df2[[1]] )) ) # FALSE

Rui Barradas
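
The difference between the two calls above comes down to what `[` and `[[` return for a data frame. The following is a minimal sketch using the same toy data; the variable names one_col and bare are only illustrative.

```r
nv  <- c(a = 1, d = 17, e = 101)
nv2 <- c(a = "a", d = "d", e = "e")
df2 <- data.frame(VAR = nv, CHAR = nv2)

one_col <- df2[1]     # still a data frame, so the row names survive
bare    <- df2[[1]]   # the column itself, which is stored without names

class(one_col)        # "data.frame"
rownames(one_col)     # "a" "d" "e"
names(bare)           # NULL

# t() on the one-column data frame gives a 1 x 3 matrix whose column
# names are the old row names; drop() turns that into a named vector.
dim(t(one_col))
identical(nv, drop(t(one_col)))
```

With df2[[1]] there are no names left for t() and drop() to carry along, which is why the second comparison is FALSE.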

Em 18-08-2012 16:16, Joshua Ulrich escreveu:
#
Or to expand just a hair on Joshua's suggestion, is the following what you want:
 a  b  c  d  e  f  g  h  i  j
 1  2  3  4  5  6  7  8  9 10
   x y
a  1 A
b  2 B
c  3 C
d  4 D
e  5 E
f  6 F
g  7 G
h  8 H
i  9 I
j 10 J
 a  b  c  d  e  f  g  h  i  j
 1  2  3  4  5  6  7  8  9 10
[1] TRUE

Cheers,
Bert
On Sat, Aug 18, 2012 at 8:16 AM, Joshua Ulrich <josh.m.ulrich at gmail.com> wrote:

  
    
#
    > I don't know if this is better, but it's the most obvious/shortest I
    > could come up with.  Transpose the data.frame column to a 'row' vector
    > and drop the dimensions.

    > R> identical(nv, drop(t(df)))
    > [1] TRUE

Yes, that's definitely shorter,
congratulations!

One gotcha is that I'd want a solution that also works when the
df has more columns than just one...

Your idea to use  t(.) is nice and "perfect" insofar as it
coerces the data frame to a matrix, and that's really the clue:

Whereas  df[,1]  loses the names,
matrix indexing does not.
So your solution can be changed into

     t(df)[1,]

which is even shorter...
and slightly less efficient, at least conceptually, than mine, which has
been

   as.matrix(df)[,1]

Now, the remaining question is:  Shouldn't there be something
more natural to achieve that?
(There is not, currently, AFAIK).

Martin
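
The contrast between data-frame indexing and matrix indexing described above can be checked directly. This is a small sketch on the toy data from the start of the thread; the names plain, m and named are only illustrative.

```r
# Toy data from the start of the thread.
nv <- c(a = 1, d = 17, e = 101)
df <- as.data.frame(cbind(VAR = nv))

# Data-frame indexing with the default drop = TRUE returns a bare vector.
plain <- df[, 1]
names(plain)          # NULL: the row names are gone

# Matrix indexing keeps the dimnames of the dimension that survives.
m <- as.matrix(df)
named <- m[, 1]
names(named)          # "a" "d" "e"

identical(nv, named)          # the as.matrix(df)[,1] solution
identical(nv, t(df)[1, ])     # the t(df)[1,] variant
```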


    > Best,
    > --
    > Joshua Ulrich  |  about.me/joshuaulrich
    > FOSS Trading  |  www.fosstrading.com


    > On Sat, Aug 18, 2012 at 10:03 AM, Martin Maechler
    > <maechler at stat.math.ethz.ch> wrote:
    >> Today, I was looking for an elegant (and efficient) way to get a named
    >> (atomic) vector by selecting one column of a data frame.  Of course,
    >> the vector names must be the rownames of the data frame.
    >> 
    >> Ok, here is the quiz, I know one quite "cute"/"slick" answer, but was
    >> wondering if there are obvious better ones, and also if this should
    >> not become more idiomatic (hence "R-devel"):
    >> 
    >> Consider this toy example, where the dataframe already has only one
    >> column :
    >> 
    >>> nv <- c(a=1, d=17, e=101); nv
    >> a   d   e
    >> 1  17 101
    >> 
    >>> df <- as.data.frame(cbind(VAR = nv)); df
    >> VAR
    >> a   1
    >> d  17
    >> e 101
    >> 
    >> Now how, can I get 'nv' back from 'df' ?   I.e., how to get
    >> 
    >>> identical(nv, .......)
    >> [1] TRUE
    >> 
    >> where ...... only uses 'df' (and no non-standard R packages)?
    >> 
    >> As said, I know a simple solution (*), but I'm sure it is not
    >> obvious to most R users and probably not even to the majority of
    >> R-devel readers... OTOH, people like Bill Dunlap will not take
    >> long to provide it or a better one.
    >> 
    >> (*) In my solution, the above '.......' consists of 17 letters.
    >> I'll post it later today (CEST time) ... or confirm
    >> that someone else has done so.
    >> 
    >> Martin
    >> 
    >> ______________________________________________
    >> R-devel at r-project.org mailing list
    >> https://stat.ethz.ch/mailman/listinfo/r-devel
#
Yes, but either

drop(t(df[,1,drop=TRUE]))

or

t(df[,1,drop=TRUE])[1,]

does work. My minimal effort to check timings found that the first
version was a hair faster.

-- Bert
On Sat, Aug 18, 2012 at 9:01 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:

  
    
#
Sorry! -- Change that to drop = FALSE  !

 drop(t(df[,1,drop=FALSE]))
 t(df[,1,drop=FALSE])[1,]

-- Bert
On Sat, Aug 18, 2012 at 9:37 AM, Bert Gunter <bgunter at gene.com> wrote:

  
    
#
On Sat, Aug 18, 2012 at 9:33 AM, Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
Perhaps a data frame method for as.vector?

as.vector.data.frame <- function(x, ...) as.matrix(x)[,1]
as.vector(df[1])

or an additional argument to `[.data.frame` like keep.names, which
defaults to FALSE to maintain current behavior but can optionally be
TRUE.

Cheers,

Josh

  
    
#
This isn't super-concise, but has the virtue of being clear:

nv <- c(a=1, d=17, e=101)
df <- as.data.frame(cbind(VAR = nv))

identical(nv, setNames(df$VAR, rownames(df)))
# TRUE


It seems to be more efficient than the other methods as well:

f1 <- function() setNames(df$VAR, rownames(df))
f2 <- function() t(df)[1,]
f3 <- function() as.matrix(df)[,1]

library(microbenchmark)  # CRAN package; used only for the timing comparison
r <- microbenchmark(f1(), f2(), f3(), times=1000)
r
# Unit: microseconds
#   expr    min      lq median      uq      max
# 1 f1() 14.589 17.0315 18.608 19.3220   89.388
# 2 f2() 68.057 70.8735 72.240 75.8065 3707.012
# 3 f3() 58.153 61.2600 62.521 65.0380  238.483

-Winston
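
The setNames() approach extends naturally to data frames with several columns of different types, since it never converts the whole data frame to a matrix. A possible wrapper is sketched below; the helper name named_col and the data frame df_mixed are hypothetical and not part of base R.

```r
# Hypothetical helper (not a base R function): extract one column of a
# data frame as a vector named by the data frame's row names.
named_col <- function(df, j) {
  stopifnot(is.data.frame(df), length(j) == 1L)
  setNames(df[[j]], rownames(df))
}

nv <- c(a = 1, d = 17, e = 101)
df_mixed <- data.frame(VAR = nv, CHAR = c("x", "y", "z"),
                       stringsAsFactors = FALSE)

identical(nv, named_col(df_mixed, "VAR"))   # column picked by name
identical(nv, named_col(df_mixed, 1))       # or by position
named_col(df_mixed, "CHAR")                 # character column keeps its type
```

Because only the requested column is touched, the other columns cannot affect the type of the result.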



On Sat, Aug 18, 2012 at 10:03 AM, Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
#
On Sat, Aug 18, 2012 at 10:03 AM, Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
But aren't you making life difficult for yourself by not using I ?

df <- data.frame(VAR = I(nv))
str(df[[1]])

(which isn't quite identical because it now has the AsIs class)

Hadley
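
To make the I() suggestion concrete, here is a short sketch of what the column looks like and one way to get back to the original vector. The names df_asis and col are only illustrative, and using unclass() to drop the AsIs class is our assumption, not something proposed in the message above.

```r
nv <- c(a = 1, d = 17, e = 101)
df_asis <- data.frame(VAR = I(nv))

col <- df_asis[[1]]
names(col)     # the names survive because I() protects the vector
class(col)     # "AsIs"

# Assumption: dropping the AsIs class recovers the original vector.
identical(nv, unclass(col))
```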
#
On 2012-08-18 11:03, Martin Maechler wrote:
For this purpose my private function library has a function withnames():

withnames(): Extract from data frame as a named vector

Description: Extracts data from a data frame; if the result is a vector
(i.e. we extracted a single column and did not specify 'drop=FALSE')
it is assigned names derived from the row names of the data frame.

Usage: withnames(expr)

Arguments: expr: R expression.

Details: 'expr' is evaluated in an environment in which the extractor
functions '$.data.frame', '[.data.frame', and '[[.data.frame' are
replaced by versions that attach the data frame's row names to an
extracted vector.

Value: 'expr', evaluated as described above.

## Code

withnames<-function(expr) {
   eval(substitute(expr),
   list(
     `[.data.frame` = function(x,i,...) {
       out<-x[i,...]
       if (is.null(dim(out))) names(out)<-row.names(x)[i]
       return(out)},
     `[[.data.frame` = function(x,...) {
       out<-x[[...]]
       if (is.null(dim(out))) names(out)<-row.names(x)
       return(out)},
     `$.data.frame` = function(x,name) {
       out<-x[[name, exact=FALSE]]
       if (is.null(dim(out))) names(out)<-row.names(x)
       return(out)}
     ),
   enclos=parent.frame())
}

## Examples

dd <- data.frame(aa=1:6, bb=letters[c(1,3,2,3,3,1)],
   row.names=LETTERS[1:6])
dd
dd$aa                          # Unnamed vector
withnames(dd$aa)               # Named vector
withnames(dd[["aa"]])          # Named vector
withnames(dd[2:4,"aa"])        # Named vector
withnames(dd$bb)               # Factor with names
withnames(outer(dd$a,dd$a))    # Both dimensions have names

## But now I am looking for a version that will play nicely with with():

withnames(with(dd, aa))  # No names!
with(dd, withnames(aa))  # No names!
#
That would have been essentially my suggestion as well.  I prefer its clarity
(and speed).  I didn't know if you wanted your solution to also apply
to matrices embedded in data.frames.  In S+ rownames<-() works on vectors
(because it calls the generic rowId<-()) so the following works:
  > f4 <- function(df, column) { tmp <- df[[column]] ; rownames(tmp) <- rownames(df) ; tmp}
  > nv <- c(a=1,d=17,e=101)
  > df <- data.frame(VAR=nv, Two=3^(1:3))
  > f4(df, 2)
   a d  e 
   3 9 27
  > df$Matrix <- matrix(1001:1006, ncol=2, nrow=3)
  > f4(df, "Matrix")
    [,1] [,2] 
  a 1001 1004
  d 1002 1005
  e 1003 1006

I forget if R has something like rowIds() (it is to names and rownames as
NROW is to length and nrow).

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
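
In R, rownames<-() does not work on a plain vector, so a port of f4() has to choose between names<-() and rownames<-() itself. The following is only a sketch of that idea; the helper name with_row_ids is hypothetical and is not a base R function.

```r
# Hypothetical R counterpart of f4(): label a column with the data
# frame's row names, whether the column is a vector or a matrix.
with_row_ids <- function(df, column) {
  tmp <- df[[column]]
  if (is.null(dim(tmp))) {
    names(tmp) <- rownames(df)      # plain vector: use names
  } else {
    rownames(tmp) <- rownames(df)   # embedded matrix: use row names
  }
  tmp
}

nv <- c(a = 1, d = 17, e = 101)
df <- data.frame(VAR = nv, Two = 3^(1:3))
df$Matrix <- matrix(1001:1006, ncol = 2, nrow = 3)

with_row_ids(df, 2)          # named vector
with_row_ids(df, "Matrix")   # matrix with row names a, d, e
```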
2 days later
#
On 12-08-18 12:33 PM, Martin Maechler wrote:
I've been offline, so I'm a bit late to this game, but the examples 
above fail when df contains a character column as well as the desired 
one, because everything gets coerced to a character matrix.  You need to 
select the column first, then convert to a matrix, e.g.

drop(t(df[,1,drop=FALSE]))

Duncan Murdoch
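
The coercion problem described above is easy to reproduce. This is a minimal sketch with a two-column data frame; the names whole_first and column_first are only illustrative.

```r
nv <- c(a = 1, d = 17, e = 101)
df <- data.frame(VAR = nv, CHAR = letters[1:3], stringsAsFactors = FALSE)

# Converting the whole data frame first gives a character matrix,
# so the numeric column comes back as strings.
whole_first <- as.matrix(df)[, 1]
typeof(whole_first)             # "character"
identical(nv, whole_first)      # FALSE

# Selecting the column first keeps the matrix numeric.
column_first <- drop(t(df[, 1, drop = FALSE]))
typeof(column_first)            # "double"
identical(nv, column_first)     # TRUE
```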
#
On Tue, Aug 21, 2012 at 2:34 PM, Duncan Murdoch
<murdoch.duncan at gmail.com> wrote:
That's true, but I was assuming a one-column data.frame, which can be
achieved via:
df <- data.frame(VAR=nv,CHAR=letters[1:3],stringsAsFactors=FALSE)
drop(t(df[1]))

That said, I prefer the setNames() solution for its efficiency.

Best,
Josh