
Problem with ddply in the plyr-package: surprising output of a date-column

On 4/25/2011 10:19 AM, Christoph Jäckel wrote:
Works for me:

 > df[c(2:3,6:7),]
   ID1 ID2 ID3      Date Value
2   2   b  v1 1985-05-2     2
3   2   b  v1 1985-05-3     3
6   4   e  v1 1985-05-6     6
7   4   e  v1 1985-05-7     7
 > ddply(df,.(ID1,ID2,ID3),function(df) if(nrow(df)<=1){NULL}else{df})
   ID1 ID2 ID3      Date Value
1   2   b  v1 1985-05-2     2
2   2   b  v1 1985-05-3     3
3   4   e  v1 1985-05-6     6
4   4   e  v1 1985-05-7     7
 > sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] plyr_1.5.2

loaded via a namespace (and not attached):
[1] tools_2.13.0

A couple of things: plyr was just updated to version 1.5.2; maybe 
that fixes what you are seeing?  Also, your df consists entirely of factors. 
  cbind-ing the vectors before turning them into a data.frame produces a 
character matrix, and data.frame() then converts its character columns to factors.

 > str(df)
'data.frame':   7 obs. of  5 variables:
  $ ID1  : Factor w/ 4 levels "1","2","3","4": 1 2 2 3 3 4 4
  $ ID2  : Factor w/ 5 levels "a","b","c","d",..: 1 2 2 3 4 5 5
  $ ID3  : Factor w/ 2 levels "v1","v2": 1 1 1 1 2 1 1
  $ Date : Factor w/ 7 levels "1985-05-1","1985-05-2",..: 1 2 3 4 5 6 7
  $ Value: Factor w/ 7 levels "1","2","3","4",..: 1 2 3 4 5 6 7
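A minimal sketch of what I mean, using made-up vectors shaped like yours (the names `m`, `bad`, and `fixed` are mine, not from your code). I pass `stringsAsFactors = TRUE` explicitly to mimic the R 2.13 default, since that default changed to FALSE in R 4.0.0:

```r
# Hypothetical vectors shaped like the original data (not the actual data)
ID1   <- c(1, 2, 2, 3, 3, 4, 4)
Date  <- c("1985-05-1", "1985-05-2", "1985-05-3", "1985-05-4",
           "1985-05-5", "1985-05-6", "1985-05-7")
Value <- c(1, 2, 3, 4, 5, 6, 7)

# cbind() of mixed numeric and character vectors yields a character matrix
m <- cbind(ID1, Date, Value)
is.character(m)                         # TRUE

# stringsAsFactors = TRUE reproduces the R 2.13 default,
# so every column comes back as a factor
bad <- data.frame(m, stringsAsFactors = TRUE)
sapply(bad, is.factor)                  # all TRUE

# Converting back: go through as.character() first, because as.numeric()
# on a factor returns the internal level codes rather than the labels
fixed <- data.frame(ID1   = as.numeric(as.character(bad$ID1)),
                    Date  = as.Date(as.character(bad$Date)),
                    Value = as.numeric(as.character(bad$Value)))
str(fixed)
```

Building the data.frame directly from the vectors, as in the DF example, avoids the round trip through a character matrix altogether.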

That may explain the odd "dates": they are not really dates at all, 
just the string labels of factor levels. 
Compare with:

DF <- data.frame(ID1=c(1,2,2,3,3,4,4),
	ID2=c('a','b','b','c','d','e','e'),
	ID3=c("v1","v1","v1","v1","v2","v1","v1"),
	Date=as.Date(c("1985-05-1","1985-05-2","1985-05-3",
		"1985-05-4","1985-05-5","1985-05-6","1985-05-7")),
	Value=c(1,2,3,4,5,6,7))
str(DF)
#'data.frame':   7 obs. of  5 variables:
# $ ID1  : num  1 2 2 3 3 4 4
# $ ID2  : Factor w/ 5 levels "a","b","c","d",..: 1 2 2 3 4 5 5
# $ ID3  : Factor w/ 2 levels "v1","v2": 1 1 1 1 2 1 1
# $ Date : Date, format: "1985-05-01" "1985-05-02" ...
# $ Value: num  1 2 3 4 5 6 7

This version also works for me.

ddply(DF,.(ID1,ID2,ID3),function(df) if(nrow(df)<=1){NULL}else{df})
#  ID1 ID2 ID3       Date Value
#1   2   b  v1 1985-05-02     2
#2   2   b  v1 1985-05-03     3
#3   4   e  v1 1985-05-06     6
#4   4   e  v1 1985-05-07     7
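For comparison, here is a base-R sketch of the same filter (keep only the ID1/ID2/ID3 groups with more than one row) that does not use plyr at all. Using ave() for this is my own substitution, not what ddply does internally; the name `group_size` is just illustrative:

```r
# Same example data as DF above, with two-digit days in the date strings
DF <- data.frame(ID1   = c(1, 2, 2, 3, 3, 4, 4),
                 ID2   = c("a", "b", "b", "c", "d", "e", "e"),
                 ID3   = c("v1", "v1", "v1", "v1", "v2", "v1", "v1"),
                 Date  = as.Date(sprintf("1985-05-%02d", 1:7)),
                 Value = c(1, 2, 3, 4, 5, 6, 7))

# ave() returns, for every row, the size of that row's (ID1, ID2, ID3) group
group_size <- ave(DF$Value, DF$ID1, DF$ID2, DF$ID3, FUN = length)

# Keep only rows belonging to groups with more than one row
kept <- DF[group_size > 1, ]
kept
```

The Date column keeps its Date class here too. One difference from ddply: the original row names (2, 3, 6, 7) are kept; `rownames(kept) <- NULL` renumbers them.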