Strange output daply with empty strata

4 messages · Dennis Murphy, Hadley Wickham, Jan van der Laan

#
Dear list,

I get some strange results with daply from the plyr package. In the
example below, the average age per municipality is calculated for
employed and unemployed people. If I do this using tapply (see code
below), I get the following result:

         no      yes
A       NA 36.94931
B 51.22505 34.24887
C 48.05759 51.00198

If I do this using daply:

municipality       no      yes
            A 36.94931 48.05759
            B 51.22505 51.00198
            C 34.24887       NA

daply generates the same numbers. However, these are not in the  
correct cells. For example, in municipality A everybody is employed.  
Therefore, the NA should be in the cell for unemployed in municipality  
A.

Am I using daply incorrectly or is there indeed something wrong with  
the output of daply?

Regards,

Jan


I am using version 1.1 of the plyr package.


library(plyr)

# Generate some test data
data.test <- data.frame(
   municipality=rep(LETTERS[1:3], each=10),
   employed=sample(c("yes", "no"), 30, replace=TRUE),
   age=runif(30,20,70))
# Make sure everybody is employed in municipality A
data.test$employed[data.test$municipality == "A"] <- "yes"

# Compare the output of tapply:
tapply(data.test$age, list(data.test$municipality, data.test$employed), mean)
# to that of daply:
daply(data.test, .(municipality, employed), function(d){mean(d$age)} )
# results of ddply are the same as tapply
ddply(data.test, .(municipality, employed), function(d){mean(d$age)} )
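
To confirm independently which cell should be empty, the group counts can be set next to the tapply means. The following is only a minimal base-R sketch and does not use plyr. The set.seed call and the explicit factor levels are assumptions added here to make the check reproducible; they are not part of the test above.

```r
# Sketch: cross-check empty strata using base R only.
# set.seed and the explicit factor levels are assumptions added for
# reproducibility; they are not part of the original test data.
set.seed(1)
data.test <- data.frame(
   municipality = rep(LETTERS[1:3], each = 10),
   employed = factor(sample(c("yes", "no"), 30, replace = TRUE),
                     levels = c("no", "yes")),
   age = runif(30, 20, 70))
# Make sure everybody is employed in municipality A
data.test$employed[data.test$municipality == "A"] <- "yes"

# Number of observations per cell; a zero marks an empty stratum
counts <- table(data.test$municipality, data.test$employed)

# Mean age per cell; tapply returns NA for an empty stratum
means <- tapply(data.test$age,
                list(data.test$municipality, data.test$employed), mean)

# The NA cells of the means should be exactly the zero-count cells
stopifnot(all(as.vector(is.na(means)) == as.vector(counts == 0)))

print(counts)
print(means)
```

Whatever the random draw, the zero count and the NA both fall in row A, column "no", which is where the daply result would be expected to put its NA as well.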
#
This is a bug, which I've fixed in the development version (hopefully
to be released next week).
In plyr 1.2:
            employed
municipality       no      yes
           A       NA 39.49980
           B 44.69291 51.63733
           C 57.38072 45.28978

Hadley
#
OK, thank you both for your answers. I'll wait for the next version.

Regards,
Jan