Unexpected behaviour as.data.frame

11 messages · Jan van der Laan, Bert Gunter, Santosh Srinivas +1 more

#
I use the following code to create two data.frames d1 and d2 from a list:

types  <- c("integer", "character", "double")
nlines <- 10
d1     <- as.data.frame(lapply(types, do.call, list(nlines)),  
stringsAsFactor=FALSE)
l2     <- lapply(types, do.call, list(nlines))
d2     <- as.data.frame(l2, stringsAsFactors=FALSE)

I would expect d1 and d2 to be the same, however, in d1 the second  
column is a factor while in d2 it is a character (which I would expect):
'data.frame':	10 obs. of  3 variables:
  $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
  $ c........................................: Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1
  $ c.0..0..0..0..0..0..0..0..0..0.          : num  0 0 0 0 0 0 0 0 0 0
'data.frame':	10 obs. of  3 variables:
  $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
  $ c........................................: chr  "" "" "" "" ...
  $ c.0..0..0..0..0..0..0..0..0..0.          : num  0 0 0 0 0 0 0 0 0 0


As a different but related question: I use the commands above to create
an 'empty' data.frame with specified column types and dimensions. I
need this data.frame to pass on to my C++ routines. Is there a
simpler or more elegant way of creating this data.frame?

Regards,

Jan


PS:
I am running R on 64 bit Ubuntu 11.04:
R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
#
Forget I asked. There was a typo in my example (stringsAsFactor
instead of stringsAsFactors) which explained the difference. My
apologies.
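
For reference, the effect of the argument itself can be reproduced by setting it explicitly. This is only a sketch: it passes stringsAsFactors by hand in both directions rather than relying on the misspelled name, because how a misspelled argument is handled, and the default value of stringsAsFactors, may differ between R versions. The names d_factor and d_char are illustrative.

```r
types  <- c("integer", "character", "double")
nlines <- 10
l2     <- lapply(types, do.call, list(nlines))

# Explicitly request conversion: the character column becomes a factor
d_factor <- as.data.frame(l2, stringsAsFactors = TRUE)

# Explicitly request no conversion: the character column stays character
d_char   <- as.data.frame(l2, stringsAsFactors = FALSE)

# Compare the column classes of the two results
sapply(d_factor, class)
sapply(d_char, class)
```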

My second question, however, still stands: how does one create a
data.frame with given column types and given dimensions? Thanks.

Regards,
Jan


Quoting Jan van der Laan <rhelp at eoos.dds.nl>:
#
In your post, you're missing the final "s" on the stringsAsFactors
argument in the d1 assignment. When I typed it correctly, it works as
expected.

-- Bert
On Sun, May 15, 2011 at 4:25 AM, Jan van der Laan <rhelp at eoos.dds.nl> wrote:

  
    
#
Thanks. I also noticed it myself minutes after sending my message to the
list. My 'please ignore my question it was just a stupid typo' message
was sent with the wrong account and is now awaiting moderation.

However, my other question still stands: what is the
preferred/fastest/simplest way to create a data.frame with given column
types and dimensions?

Regards,
Jan
On 05/15/2011 04:43 PM, Bert Gunter wrote:
#
Inline below.
On Sun, May 15, 2011 at 11:11 AM, Jan van der Laan <rhelp at eoos.dds.nl> wrote:
I do not know, but why is simply

data.frame(numeric(10), character(10), integer(10), stringsAsFactors=FALSE)

not acceptable? Note that if you had, say, 500, numeric (= double) and
100 character columns to add, you might do something like:
While this might save some typing, it may not be much more efficient
than typing it all out -- maybe just some parsing time is saved. You
can experiment and see.
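
The snippet referred to above did not survive in this archive. As a minimal sketch of what building many columns programmatically could look like (not necessarily the code originally posted), one can assemble a list of prototype vectors with rep() and convert it. The column names V1, V2, ... and the variable names are assumptions made for illustration.

```r
n <- 10

# 500 double columns followed by 100 character columns, as a plain list
cols <- c(rep(list(numeric(n)), 500),
          rep(list(character(n)), 100))

# Hypothetical column names; unnamed columns would get deparsed names
names(cols) <- paste0("V", seq_along(cols))

d <- as.data.frame(cols, stringsAsFactors = FALSE)

dim(d)
table(sapply(d, class))
```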

However, since a data.frame **is** a list with added attributes and a
great deal of the work of the constructor is in constructing and
checking these attributes (e.g. row and column names), I see nothing
terribly inefficient with what you did. It's just a bit obscure.  But
maybe someone with greater expertise will set us both straight.
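
To illustrate the point that a data.frame is a list with attributes, here is a small sketch. The second half sets the attributes by hand with structure(), which skips the checks the constructor performs; that shortcut is shown as an assumption about what "added attributes" means in practice, not as a recommended way to build data.frames.

```r
d <- data.frame(x = integer(3), y = character(3), stringsAsFactors = FALSE)

is.list(d)             # a data.frame is a list of equal-length columns
names(attributes(d))   # the attributes carried on top of that list

# Assumption for illustration: attach the same attributes manually
cols <- list(x = integer(3), y = character(3))
d2   <- structure(cols, class = "data.frame", row.names = seq_len(3))

is.data.frame(d2)
identical(dim(d), dim(d2))
```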

Cheers,
Bert

  
    
#
I feel like I'm always asking this type of question, but is it possible
to add a base function that allows creating an empty data.frame, as
matrix() does?

What I mean would be something like:
create.data.frame(number_of_columns, mode_of_columns).
I think it would make things easier than creating one or several 
matrices and then combining them
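
A helper along these lines can be sketched in a few lines of base R. Note that create_data_frame is a hypothetical name that does not exist in base R, and its arguments (a character vector of column modes, optionally named, plus a row count) are assumptions chosen for illustration.

```r
# Hypothetical helper; the name and arguments are illustrative only
create_data_frame <- function(col_classes, nrow) {
  # vector(mode, length) gives a vector of that type filled with
  # the type's default value (0, "", FALSE, ...)
  cols <- lapply(col_classes, vector, length = nrow)

  # Fall back to V1, V2, ... when no column names were supplied
  if (is.null(names(cols))) {
    names(cols) <- paste0("V", seq_along(cols))
  }

  as.data.frame(cols, stringsAsFactors = FALSE)
}

d <- create_data_frame(c(a = "integer", b = "character", c = "double"),
                       nrow = 10)
str(d)
```

Naming the elements of col_classes sets the column names directly, so no separate argument is needed for them.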

Is it possible; does it make sense?

Ivan

On 5/15/2011 22:17, Bert Gunter wrote:

  
    
#
Hi Ivan, take a look at dataFrame in R.utils ... is that what you want?

from the help file:

Examples

  df <- dataFrame(colClasses=c(a="integer", b="double"), nrow=10)
  df[,1] <- sample(1:nrow(df))
  df[,2] <- rnorm(nrow(df))
  print(df)

Thanks,
Santosh

On Mon, May 16, 2011 at 1:42 PM, Ivan Calandra
<ivan.calandra at uni-hamburg.de> wrote:
#
Thanks Santosh!
The more I learn about R.utils, the more I think that many of its 
functions should be included in the base distribution.
Ivan

On 5/16/2011 10:42, Santosh Srinivas wrote:

  
    
#
Actually, what would be even better would be an extra argument to 
specify the column names.
I don't think it's very difficult to implement and it would make things 
even easier.
Ivan

On 5/16/2011 11:25, Ivan Calandra wrote:

  
    
#
Forget this last email, I overlooked the implementation in the examples...
Ivan


On 5/16/2011 11:35, Ivan Calandra wrote:

  
    
#
Santosh, Ivan,

This is also what I was looking for. Thanks. Looking at the source of
dataFrame.default, it seems that it uses the same approach as I did:
first create a list, then a data.frame from that list. I think I'll stick
with the code I already had as I don't want another dependency (multiple,
actually, for R.utils). But thanks again for pointing it out.

Jan
On 05/16/2011 10:42 AM, Santosh Srinivas wrote: