
as.data.frame(do.call(rbind,lapply)) produces something weird

8 messages · Sam Steingold, arun, William Dunlap +2 more

#
The following code:
--8<---------------cut here---------------start------------->8---
   a b.x b.y c.x c.y
1 a1   1   1   1   1
2 a2   2   4   8  64
3 a3   3   9  27 729
--8<---------------cut here---------------end--------------->8---
the appearance of z is good, but str() and summary() betray some weirdness:
--8<---------------cut here---------------start------------->8---
'data.frame':	3 obs. of  5 variables:
 $ a  :List of 3
  ..$ : chr "a1"
  ..$ : chr "a2"
  ..$ : chr "a3"
 $ b.x:List of 3
  ..$ : int 1
  ..$ : int 2
  ..$ : int 3
 $ b.y:List of 3
  ..$ : int 1
  ..$ : int 4
  ..$ : int 9
 $ c.x:List of 3
  ..$ : int 1
  ..$ : int 8
  ..$ : int 27
 $ c.y:List of 3
  ..$ : int 1
  ..$ : int 64
  ..$ : int 729
--8<---------------cut here---------------end--------------->8---
how do I ensure that the columns of z are vectors, as in
--8<---------------cut here---------------start------------->8---
   a b.x b.y c.x c.y
1 a1   1   1   1   1
2 a2   2   4   8  64
3 a3   3   9  27 729
'data.frame':	3 obs. of  5 variables:
 $ a  : Factor w/ 3 levels "a1","a2","a3": 1 2 3
 $ b.x: num  1 2 3
 $ b.y: num  1 4 9
 $ c.x: num  1 8 27
 $ c.y: num  1 64 729
--8<---------------cut here---------------end--------------->8---
thanks!
#
Hi,
May be this helps:
z1<-as.data.frame(do.call(rbind,lapply(1:3,function(x) c(a=paste("a",x,sep=""),unlist(do.call(c,list(b=myfun(x),c=myfun(x*x*x))))))))
z2<-within(z1,{b.x<-as.numeric(as.character(b.x));b.y<-as.numeric(as.character(b.y));c.x<-as.numeric(as.character(c.x));c.y<-as.numeric(as.character(c.y))})
str(z2)
#'data.frame':   3 obs. of  5 variables:
# $ a  : Factor w/ 3 levels "a1","a2","a3": 1 2 3
# $ b.x: num  1 2 3
# $ b.y: num  1 4 9
# $ c.x: num  1 8 27
# $ c.y: num  1 64 729

z2
#   a b.x b.y c.x c.y
#1 a1   1   1   1   1
#2 a2   2   4   8  64
#3 a3   3   9  27 729
A.K.



----- Original Message -----
From: Sam Steingold <sds at gnu.org>
To: r-help at r-project.org
Cc: 
Sent: Friday, November 9, 2012 2:21 PM
Subject: [R] as.data.frame(do.call(rbind,lapply)) produces something weird

#
1. I don't want to have to list all the column names explicitly

2. I find the num->char->num conversion repugnant and unacceptable.
#
Your call to rbind() creates a matrix of mode "list".  Thus every element
can be of a different type, although you "know" that there is a pattern
to the types.  E.g.,
  > R <- rbind(
          list(Letter="a", Integer=1L, Complex=1+1i),
          list(Letter="b", Integer=2L, Complex=2+2i))
  > str(R)
  List of 6
   $ : chr "a"
   $ : chr "b"
   $ : int 1
   $ : int 2
   $ : cplx 1+1i
   $ : cplx 2+2i
   - attr(*, "dim")= int [1:2] 2 3
   - attr(*, "dimnames")=List of 2
    ..$ : NULL
    ..$ : chr [1:3] "Letter" "Integer" "Complex"
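A quick way to confirm what rbind() returned is to ask for the matrix's mode and the class of a single cell. This small sketch rebuilds the same two-row example. The comments give the values these calls should return.

```r
# Rebuild the example: rbind() of two lists gives a 2 x 3 matrix of mode "list"
R <- rbind(
  list(Letter = "a", Integer = 1L, Complex = 1 + 1i),
  list(Letter = "b", Integer = 2L, Complex = 2 + 2i))

is.matrix(R)              # TRUE: R carries a dim attribute
mode(R)                   # "list": every cell is a separate list element
dim(R)                    # 2 3
class(R[[1, "Integer"]])  # "integer": each cell keeps its own type
class(R[, "Integer"])     # "list": a whole column is still a list
```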

data.frame(R), since R is a matrix, will make a data.frame containing
the columns of R.  It does not decide that, since each column is a list,
it should do what data.frame(list(...)) would do; it just sticks those
columns, as is, into the data.frame that it creates:

  > Rdf <- data.frame(R)
  > str(Rdf)
  'data.frame':   2 obs. of  3 variables:
   $ Letter :List of 2
    ..$ : chr "a"
    ..$ : chr "b"
   $ Integer:List of 2
    ..$ : int 1
    ..$ : int 2
   $ Complex:List of 2
    ..$ : cplx 1+1i
    ..$ : cplx 2+2i
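You may prefer to get atomic columns from the start rather than repair them afterwards. One alternative, not proposed in this thread, is to make each row a one-row data.frame and rbind those. rbind() on data.frames dispatches to rbind.data.frame(), which combines column by column. This is a minimal sketch, and make_row() is a hypothetical helper used only for illustration:

```r
# Hypothetical helper: one row as a one-row data.frame with atomic columns
make_row <- function(letter, int, cplx) {
  data.frame(Letter = letter, Integer = int, Complex = cplx,
             stringsAsFactors = FALSE)
}

# rbind() on data.frames concatenates column by column,
# so each column of the result stays an atomic vector
rows <- list(make_row("a", 1L, 1 + 1i),
             make_row("b", 2L, 2 + 2i))
Rdf2 <- do.call(rbind, rows)
str(Rdf2)  # Letter: chr, Integer: int, Complex: cplx
```

Building many one-row data.frames is slow. For a large number of rows, it is probably better to collect each column as a vector first and call data.frame() once.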

You can convert those columns to their "natural" type, at least
the type of their first element, with

  > for(i in seq_along(Rdf)) Rdf[[i]] <- as(Rdf[[i]], class(Rdf[[i]][[1]]))
  > str(Rdf)
  'data.frame':   2 obs. of  3 variables:
   $ Letter : chr  "a" "b"
   $ Integer: int  1 2
   $ Complex: cplx  1+1i 2+2i

Note that as(list(...), atomicType) does the conversion if every element
of list(...) has length 1.  For numeric and complex targets it throws an
error otherwise.  For "character" it instead deparses longer elements
without complaint.  That error is probably a good check in this case.
unlist() would give the same result, perhaps more quickly, if the list
has the structure you expect, but it would silently give bad results if
some element of the list did not have length one.
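You can also make the length check explicit and then call unlist(). This sketch is not code from the thread. flatten_col() is a hypothetical name, and lengths() requires R 3.2.0 or later.

```r
# A data.frame whose columns are lists, as data.frame() makes from a list matrix
R <- rbind(
  list(Letter = "a", Integer = 1L, Complex = 1 + 1i),
  list(Letter = "b", Integer = 2L, Complex = 2 + 2i))
Rdf <- data.frame(R)

# Hypothetical helper: flatten a list column with unlist(), but only after
# checking that every element has length 1, so a malformed row fails
# loudly instead of silently misaligning values
flatten_col <- function(col) {
  if (!is.list(col)) return(col)  # already atomic: leave it alone
  if (!all(lengths(col) == 1L))
    stop("list column has an element whose length is not 1")
  unlist(col, use.names = FALSE)
}

Rdf[] <- lapply(Rdf, flatten_col)
str(Rdf)  # Letter: chr, Integer: int, Complex: cplx
```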

Is that what you are looking for?

Note that storing things in a list takes a lot more memory than storing
them as atomic vectors, so your technique may not scale up very well.
  > object.size(as.list(1:1e6)) / object.size(1:1e6)
  13.9998700013 bytes

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
#
Hi,
If you don't want to list the column names explicitly,
you can try this:
z1<-as.data.frame(do.call(rbind,lapply(1:3,function(x) c(a=paste("a",x,sep=""),unlist(do.call(c,list(b=myfun(x),c=myfun(x*x*x))))))))
z2<-z1[,-1]
z2[]<-sapply(z2,function(x) as.numeric(as.character(x)))
data.frame(a=z1[,1],z2)
#   a b.x b.y c.x c.y
#1 a1   1   1   1   1
#2 a2   2   4   8  64
#3 a3   3   9  27 729
str(data.frame(a=z1[,1],z2))
#'data.frame':   3 obs. of  5 variables:
# $ a  : Factor w/ 3 levels "a1","a2","a3": 1 2 3
# $ b.x: num  1 2 3
# $ b.y: num  1 4 9
# $ c.x: num  1 8 27
# $ c.y: num  1 64 729

A.K.

----- Original Message -----
From: Sam Steingold <sds at gnu.org>
To: r-help at r-project.org; arun <smartpink111 at yahoo.com>
Cc: 
Sent: Friday, November 9, 2012 3:00 PM
Subject: Re: as.data.frame(do.call(rbind, lapply)) produces something weird
#
This is a case where you start by breaking down your complex one-liner
into separate statements and examining the results at each
point.  This is what I would have to do with the script you posted.

I think, as Bill pointed out, one of your function calls is probably
creating a result that you were not expecting.  Standard (good)
programming practice would have you write a number of simple
statements; this makes debugging easier.
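As an illustration of that advice, here is a hedged sketch that splits such a one-liner into steps and checks each intermediate result. The original code is not shown here. myfun() below is a hypothetical stand-in, chosen only to reproduce the column names and values printed in the question.

```r
# Hypothetical stand-in for myfun(): returns x and x^2 as a named list
myfun <- function(x) list(x = x, y = x * x)

# Step 1: build the per-row pieces and look at one of them
rows <- lapply(1:3, function(x)
  c(list(a = paste0("a", x)), b = myfun(x), c = myfun(x * x * x)))
str(rows[[1]])   # a list of 5: a, b.x, b.y, c.x, c.y

# Step 2: stack the rows and check the type of the result
m <- do.call(rbind, rows)
mode(m)          # "list": a list-mode matrix, not a numeric one

# Step 3: convert, now knowing that each column will be a list
z <- as.data.frame(m)
str(z)           # list columns, the same symptom as in the question
```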
On Fri, Nov 9, 2012 at 2:21 PM, Sam Steingold <sds at gnu.org> wrote:
#
On Nov 9, 2012, at 11:21 AM, Sam Steingold wrote:

            
Winsemius' Corollary:

Kurt Gödel's Incomprehension Theorem (one of his less well-known results) has as one of its corollaries that writing `rbind` or `cbind` inside `data.frame` or `as.data.frame` has probability of measure 1 of producing a result that will dissatisfy its author.
#
Note that the column-wise conversion I suggested might be better
done on the matrix R before conversion to a data.frame.  E.g.
  > R <- rbind(
           list(Letter="a", Integer=1L, Complex=1+1i),
           list(Letter="b", Integer=2L, Complex=2+2i))
'data.frame':   2 obs. of  3 variables:
 $ Letter : Factor w/ 2 levels "a","b": 1 2
 $ Integer: int  1 2
 $ Complex: cplx  1+1i 2+2i
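One way to write that column-wise conversion on the matrix itself is sketched below. It is an assumption about the approach, not necessarily the exact code behind the output above. stringsAsFactors = TRUE is passed only so that Letter comes out as a factor, as shown there; since R 4.0.0 the default is FALSE.

```r
library(methods)  # provides as(); attached by default in interactive sessions

R <- rbind(
  list(Letter = "a", Integer = 1L, Complex = 1 + 1i),
  list(Letter = "b", Integer = 2L, Complex = 2 + 2i))

# Coerce each column of the list-mode matrix to the class of its first cell
cols <- lapply(seq_len(ncol(R)), function(j) as(R[, j], class(R[[1, j]])))
names(cols) <- colnames(R)

# data.frame() now receives ordinary atomic vectors
Rdf <- data.frame(cols, stringsAsFactors = TRUE)
str(Rdf)
```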

In any case, a long list will use a lot of memory.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com