Bug in "transform"?

13 messages · Gabor Grothendieck, Brian Ripley, Peter Dalgaard +3 more

#
Dear useRs,

Here is some odd behavior of the transform() function:

   mtcars1<-mtcars
   transform(mtcars1,t1=3,t2=4)
Error in data.frame(`_data`, e[!matched]) :
   arguments imply differing number of rows: 32, 1

However, this works:

   mtcars1$t1<-0
   transform(mtcars1,t1=3,t2=4)

It also works if the columns are added one at a time:

   transform(mtcars1,t1=3)
   transform(mtcars1,t2=4)

I often use this to create new variables in a data frame from
those already present.
Sorely needed!!

Best,
Vitalie.
#
Try:

cbind(mtcars, t1 = 3, t2 = 4)
On Tue, Dec 2, 2008 at 11:14 AM, Vitalie Spinu <vitosmail at rambler.ru> wrote:
#
As the help page says

      If some of the values are not vectors of the appropriate length,
      you deserve whatever you get!

So you can use

mtcars1 <- mtcars
mtcars1[c("t1", "t2")] <- cbind(rep(3,32), rep(4, 32))

or even

mtcars1 <- transform(mtcars, t1=rep(3, 32), t2=rep(4, 32))
Vitalie Spinu wrote:
'works'?  Only if you assign the result.
Just learn to use indexing: transform() is just syntactic sugar that you 
are not making use of.
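
A minimal sketch of the indexing route, assuming the stock mtcars data set with its 32 rows. The column names t1, t2, t3 and ratio are made up for illustration. A scalar assigned through `$<-`, or a list of scalars assigned through `[<-`, is recycled to the full number of rows, so no rep() is needed:

```r
mtcars1 <- mtcars

# Single column: the scalar on the right-hand side is recycled to nrow(mtcars1).
mtcars1$t1 <- 3

# Several columns at once: each list element becomes one column and is recycled.
mtcars1[c("t2", "t3")] <- list(4, 5)

# A new column computed from existing ones; with() supplies the scoping
# that transform() would otherwise provide.
mtcars1$ratio <- with(mtcars1, mpg / wt)

str(mtcars1[c("t1", "t2", "t3", "ratio")])
```

Unlike a bare transform() call, these assignments modify mtcars1 in place, so the result does not need to be assigned again.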

  
    
#
On Tue, 02 Dec 2008 17:37:44 +0100, Prof Brian Ripley
<ripley at stats.ox.ac.uk> wrote:

OK, I got it. It is a common pain point in R: vectors of length 1 are
recycled, but data.frames with nrows=1 and arrays with dim[1]=1 are not.

Will have to use

mtcars[c("t1","t2")]<-with(mtcars, cbind(t1=..., t2=...))

or rewrite transform.data.frame myself.

Thanks a lot,

Vitalie.
#
Prof Brian Ripley wrote:
Yes (did I write that?). It is a bit annoying with things that almost 
work, though.


[snip]
...at least when you're not making use of the scoping aspects. And if 
you calculate at least one vector of full length, then the issue goes away.



 > transform(aq, a=1,b=2)
Error in data.frame(`_data`, e[!matched]) :
   arguments imply differing number of rows: 6, 1
 > transform(aq, a=1,b=2,o=Ozone)
     Ozone Solar.R Wind Temp Month Day a b  o
3      12     149 12.6   74     5   3 1 2 12
31     37     279  7.4   76     5  31 1 2 37
34     NA     242 16.1   67     6   3 1 2 NA
65     NA     101 10.9   84     7   4 1 2 NA
59     NA      98 11.5   80     6  28 1 2 NA
133    24     259  9.7   73     9  10 1 2 24



The underlying issue is actually not in transform() but in data.frame():

 > aq <- airquality[sample(1:153,6),]
 > data.frame(aq, list(a=1,b=2))
Error in data.frame(aq, list(a = 1, b = 2)) :
   arguments imply differing number of rows: 6, 1
 > data.frame(aq, list(a=1))
     Ozone Solar.R Wind Temp Month Day a
3      12     149 12.6   74     5   3 1
31     37     279  7.4   76     5  31 1
34     NA     242 16.1   67     6   3 1
65     NA     101 10.9   84     7   4 1
59     NA      98 11.5   80     6  28 1
133    24     259  9.7   73     9  10 1
#
Is this a bug or a "feature"?

Hadley
#
On Tue, 2 Dec 2008, hadley wickham wrote:

            
As documented:

   Objects passed to data.frame should have the same number of rows, but
   atomic vectors, factors and character vectors protected by I will be
   recycled a whole number of times if necessary.

How did you manage to miss that in the help page?

  
    
#
On Tue, 2 Dec 2008, Peter Dalgaard wrote:

            
Well, no, it is in the way that you call data.frame().  If you want to add 
several variables, pass them as separate arguments rather than as a list 
(just as they were passed to transform.data.frame).  That's a simple 
change and will make transform.data.frame behave more consistently with 
cbind.data.frame and data.frame.
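
A sketch of the two ways of handing new columns to data.frame(), assuming a fixed six-row subset of airquality rather than the random sample used earlier in the thread. The name `new_cols` is invented for illustration. Whether the single-list form errors may depend on the R version, so only the separate-argument forms are shown:

```r
aq <- airquality[1:6, ]

# Separate named arguments: each atomic vector of length 1 is recycled
# to the six rows of aq.
d1 <- data.frame(aq, a = 1, b = 2)

# When the new columns arrive as a list (as they do inside a function
# taking ...), do.call() spreads the list elements out into separate
# arguments, giving the same call as above.
new_cols <- list(a = 1, b = 2)
d2 <- do.call(data.frame, c(list(aq), new_cols))

names(d2)
```

This is the do.call route mentioned in the follow-up; it is not necessarily how transform.data.frame is or was implemented.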
#
Prof Brian Ripley wrote:
Hmm, you could well be right there. Not quite a simple spot change, 
though. As far as I see, either it needs do.call, or maybe there is a 
much more radical simplification possible. I'll have a look.

BTW, we have a deparser bug:

 > transform
function ("_data", ...)
UseMethod("transform")
<environment: namespace:base>

 > function ("_data", ...)
Error: unexpected string constant in "function ("_data""
....

 > f <- function (`_data`, ...) {}
 > attr(f,"source")<-NULL
 > f
function ("_data", ...)
{
}

It should deparse with backticks, not the old-style quotes (did that 
ever work?).
#
Peter Dalgaard wrote:
Isn't it the same issue as in this simple case:

`FALSE` = 0
ls()
# "FALSE", not `FALSE`

vQ
#
Wacek Kusnierczyk wrote:

            
No. ls() always returns a character vector.
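
For illustration, a small sketch of that point, using a scratch environment `e` (a name made up here) so the global workspace is left alone. ls() reports names as plain strings, so the quotes in its output say nothing about how the deparser prints symbols:

```r
e <- new.env()

# Create a variable with a non-syntactic name, as `FALSE` = 0 does
# at top level.
assign("FALSE", 0, envir = e)

nm <- ls(e)
is.character(nm)          # the names come back as a character vector
get("FALSE", envir = e)   # the value stored under that name
```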
#
Many thanks for your kind responses.
Related to the above, I find the following behavior rather inconsistent:
Error in data.frame(aq, list(a = 1, b = 2)) :
   arguments imply differing number of rows: 6, 1
Error in data.frame(..., check.names = FALSE) :
   arguments imply differing number of rows: 6, 1

but,

aq[c("a","b")]<-list(1,2) #works fine

As I understand it, all of the versions above are conceptually similar and
should behave in the same way, and recycling of one-row data.frames should
be the default. R is an interactive language, and behavior like the above
is a real pain.
I really do try to use indexing in code whenever I possibly can. But for
interactive use, with dozens of data transformations and reshapings per day,
I would probably see stars by the end of the day with indexing alone.
Thanks for the existence of such "syntactic sugar" and for packages like
Hadley's reshape and plyr.

Regards,
Vitalie.
#
On Wed, Dec 3, 2008 at 2:06 AM, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
Because it's not true?

# These work:
data.frame(data.frame(1:10), data.frame(1))
data.frame(data.frame(1:10), data.frame(1), data.frame(5))
data.frame(data.frame(1:10, 1), data.frame(5))

# This doesn't
data.frame(data.frame(1:10), data.frame(1, 5))

Clearly there are situations in which data frames _are_ recycled.

Hadley