Hi Together,
I have a problem with the plyr package - more precisely with the ddply
function - and would be very grateful for any help. I hope the example
here is precise enough for someone to identify the problem. Basically,
in this step I want to identify observations that are identical in
terms of certain identifiers (ID1, ID2, ID3) and just want to save
those observations (in this step, without deleting any rows or
manipulating any data) in a separate data.frame. However, I get the
warning message below and the column with dates is messed up.
Interestingly, the value column (the type is factor here, but if you
change that with as.integer it doesn't make any difference) is handled
correctly. Any idea what I am doing wrong?
library(plyr)

df <- data.frame(cbind(ID1=c(1,2,2,3,3,4,4),
                       ID2=c('a','b','b','c','d','e','e'),
                       ID3=c("v1","v1","v1","v1","v2","v1","v1"),
                       Date=c("1985-05-1","1985-05-2","1985-05-3","1985-05-4",
                              "1985-05-5","1985-05-6","1985-05-7"),
                       Value=c(1,2,3,4,5,6,7)))
df[,1] <- as.character(df[,1])
df[,2] <- as.character(df[,2])
df$Date <- strptime(df$Date,"%Y-%m-%d")
# Apparently there are two observations that have the same IDs: ID1=2 and ID1=4
ddply(df,.(ID1,ID2,ID3),nrow)
# I want to save those IDs in a separate data.frame, so the desired output is:
df[c(2:3,6:7),]
# My idea: write a custom function that only returns observations with
# multiple rows.
# Seems to work, except that the Date column doesn't make any sense anymore.
# Warning message: In output[[var]][rng] <- df[[var]]: number of items
# to replace is not a multiple of replacement length
ddply(df,.(ID1,ID2,ID3),function(df) if(nrow(df)<=1){NULL}else{df})
# Notice that it works perfectly if I only have one observation with
# multiple rows
ddply(df[1:6,],.(ID1,ID2,ID3),function(df) if(nrow(df)<=1){NULL}else{df})
Thanks in advance,
Christoph
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Christoph Jäckel (Dipl.-Kfm.)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Research Assistant
Chair for Financial Management and Capital Markets | Lehrstuhl für
Finanzmanagement und Kapitalmärkte
TUM School of Management | Technische Universität München
Arcisstr. 21 | D-80333 München | Germany
Problem with ddply in the plyr-package: surprising output of a date-column
9 messages · Christoph Jäckel, Peter Ehlers, William Dunlap +2 more
On 4/25/2011 10:19 AM, Christoph Jäckel wrote:
[ ... quoted original message elided ... ]
Works for me:
> df[c(2:3,6:7),]
ID1 ID2 ID3 Date Value
2 2 b v1 1985-05-2 2
3 2 b v1 1985-05-3 3
6 4 e v1 1985-05-6 6
7 4 e v1 1985-05-7 7
> ddply(df,.(ID1,ID2,ID3),function(df) if(nrow(df)<=1){NULL}else{df})
ID1 ID2 ID3 Date Value
1 2 b v1 1985-05-2 2
2 2 b v1 1985-05-3 3
3 4 e v1 1985-05-6 6
4 4 e v1 1985-05-7 7
> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] plyr_1.5.2
loaded via a namespace (and not attached):
[1] tools_2.13.0
A couple of things: there was just an update of plyr to 1.5.2; maybe
that fixes what you are seeing? Also, your df consists of only factors.
cbind-ing the data before turning it into a data.frame makes it a
character matrix which gets converted to factors.
> str(df)
'data.frame': 7 obs. of 5 variables:
$ ID1 : Factor w/ 4 levels "1","2","3","4": 1 2 2 3 3 4 4
$ ID2 : Factor w/ 5 levels "a","b","c","d",..: 1 2 2 3 4 5 5
$ ID3 : Factor w/ 2 levels "v1","v2": 1 1 1 1 2 1 1
$ Date : Factor w/ 7 levels "1985-05-1","1985-05-2",..: 1 2 3 4 5 6 7
$ Value: Factor w/ 7 levels "1","2","3","4",..: 1 2 3 4 5 6 7
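The effect of cbind() described above can be checked in isolation. The following is only a minimal sketch with made-up three-row vectors (not the OP's data): cbind() has to pick one type for the whole matrix, so mixing numbers and strings yields a character matrix, whereas passing the vectors to data.frame() directly keeps the numeric columns numeric. Whether the resulting string columns become factors or stay character depends on the R version and on stringsAsFactors, so the sketch does not rely on that.

```r
# cbind() on vectors of mixed type coerces everything to character
m <- cbind(ID1 = c(1, 2, 2), ID2 = c("a", "b", "b"), Value = c(1, 2, 3))
is.matrix(m)          # TRUE
typeof(m)             # "character"

# Passing the vectors straight to data.frame() keeps each column's own type
DF <- data.frame(ID1 = c(1, 2, 2), ID2 = c("a", "b", "b"), Value = c(1, 2, 3))
is.numeric(DF$ID1)    # TRUE
is.numeric(DF$Value)  # TRUE
```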
Maybe that has something to do with the odd "dates" since they are not
really dates at all, just string representations of factor levels.
Compare with:
DF <- data.frame(ID1=c(1,2,2,3,3,4,4),
ID2=c('a','b','b','c','d','e','e'),
ID3=c("v1","v1","v1","v1","v2","v1","v1"),
Date=as.Date(c("1985-05-1","1985-05-2","1985-05-3",
"1985-05-4","1985-05-5","1985-05-6","1985-05-7")),
Value=c(1,2,3,4,5,6,7))
str(DF)
#'data.frame': 7 obs. of 5 variables:
# $ ID1 : num 1 2 2 3 3 4 4
# $ ID2 : Factor w/ 5 levels "a","b","c","d",..: 1 2 2 3 4 5 5
# $ ID3 : Factor w/ 2 levels "v1","v2": 1 1 1 1 2 1 1
# $ Date : Date, format: "1985-05-01" "1985-05-02" ...
# $ Value: num 1 2 3 4 5 6 7
This version also works for me.
ddply(DF,.(ID1,ID2,ID3),function(df) if(nrow(df)<=1){NULL}else{df})
# ID1 ID2 ID3 Date Value
#1 2 b v1 1985-05-02 2
#2 2 b v1 1985-05-03 3
#3 4 e v1 1985-05-06 6
#4 4 e v1 1985-05-07 7
Brian S. Diggs, PhD Senior Research Associate, Department of Surgery Oregon Health & Science University
On 2011-04-25 10:19, Christoph Jäckel wrote:
[ ... quoted original message elided ... ]
I would characterize your problem as:
a) using strptime - this is what gives ddply() fits;
b) not using str() to check whether R agrees with
you with respect to your data;
c) using cbind() inside data.frame(). This isn't
wrong, but is rarely (in my experience) useful.
If you use as.Date (or even nothing) on your Date
variable, you'll find that ddply does what you want.
To see why it doesn't work with strptime, check
str(df) and then ?POSIXlt. You've converted Date
values to lists.
My comment about cbind() is to warn you that your
Value variable, as you have constructed it, is
a factor.
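
The point about lists can be seen without plyr. This is a small base-R sketch, using three of the date strings from the example and an assumed tz = "UTC" to keep it independent of the local time zone: strptime() returns a POSIXlt object, which underneath is a list of components, while as.Date() and as.POSIXct() store one number per date.

```r
# strptime() returns POSIXlt: a list of components (sec, min, hour, mday, ...)
dates <- c("1985-05-1", "1985-05-2", "1985-05-3")
lt <- strptime(dates, "%Y-%m-%d", tz = "UTC")
class(lt)                 # "POSIXlt" "POSIXt"
is.list(unclass(lt))      # TRUE
names(unclass(lt))        # component names such as "sec", "min", "hour", ...

# as.Date() and as.POSIXct() store a single number per element instead
d  <- as.Date(dates, format = "%Y-%m-%d")
ct <- as.POSIXct(lt)
is.list(unclass(d))       # FALSE
is.numeric(unclass(ct))   # TRUE
```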
Peter Ehlers
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Brian Diggs
Sent: Monday, April 25, 2011 11:05 AM
To: christoph.jaeckel at wi.tum.de
Cc: r-help at r-project.org
Subject: Re: [R] Problem with ddply in the plyr-package: surprising output of a date-column
[ ... quoted original message and reply elided ... ]
The OP's data.frame contained a POSIXlt (not factor) object
in the "Date" column
> str(df)
'data.frame': 7 obs. of 5 variables:
$ ID1 : chr "1" "2" "2" "3" ...
$ ID2 : chr "a" "b" "b" "c" ...
$ ID3 : Factor w/ 2 levels "v1","v2": 1 1 1 1 2 1 1
$ Date : POSIXlt, format: "1985-05-01" "1985-05-02" ...
$ Value: Factor w/ 7 levels "1","2","3","4",..: 1 2 3 4 5 6 7
and apparently plyr's equivalent of rbind doesn't support that class.
If you want to continue using POSIXlt objects you can get your
immediate result without ddply; subscripting will do the job:
> nDups <- with(df, ave(rep(0,nrow(df)), ID1, ID2, ID3, FUN=length))
> print(nDups)
[1] 1 2 2 1 1 2 2
> df[nDups>1, ]
ID1 ID2 ID3 Date Value
2 2 b v1 1985-05-02 2
3 2 b v1 1985-05-03 3
6 4 e v1 1985-05-06 6
7 4 e v1 1985-05-07 7
> str(.Last.value)
'data.frame': 4 obs. of 5 variables:
$ ID1 : chr "2" "2" "4" "4"
$ ID2 : chr "b" "b" "e" "e"
$ ID3 : Factor w/ 2 levels "v1","v2": 1 1 1 1
$ Date : POSIXlt, format: "1985-05-02" "1985-05-03" ...
$ Value: Factor w/ 7 levels "1","2","3","4",..: 2 3 6 7
If you need plyr for other tasks you ought to use a different
class for your date data (or wait until plyr can deal with
POSIXlt objects).
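
As a further base-R alternative to the ave() approach above, duplicated() can flag the rows directly. This is only a sketch on a simplified copy of the example data (the Date column is left out and Value is an integer here, both assumptions made to keep the block short); it is not taken from any message in this thread.

```r
# Small data frame with the same ID layout as the example in this thread
df <- data.frame(
  ID1   = c("1", "2", "2", "3", "3", "4", "4"),
  ID2   = c("a", "b", "b", "c", "d", "e", "e"),
  ID3   = c("v1", "v1", "v1", "v1", "v2", "v1", "v1"),
  Value = 1:7,
  stringsAsFactors = FALSE
)

ids <- c("ID1", "ID2", "ID3")

# duplicated() flags only the second and later copies, so combine a forward
# and a backward pass to flag every row whose ID combination occurs more than once
is_dup <- duplicated(df[ids]) | duplicated(df[ids], fromLast = TRUE)

dups <- df[is_dup, ]
rownames(dups)   # "2" "3" "6" "7"
```

Because this is plain row subscripting, it leaves every column's class untouched, just like the ave() version.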
On 4/25/2011 11:55 AM, William Dunlap wrote:
The OP's data.frame contained a POSIXlt (not factor) object in the "Date" column
Thanks, Bill. Somehow I missed that, despite the OP having it in his code; I even copied it into my testing window. It was my error for not running it and noting it.
and apparently plyr's equivalent of rbind doesn't support that class.
plyr primarily uses rbind.fill, and testing it directly shows that it does not handle POSIXlt columns. (With only one argument, though, it just passes the data.frame back, which is why it worked when there was just a single duplicate; that bypassed the code that can't handle POSIXlt.)
If you need plyr for other tasks you ought to use a different
class for your date data (or wait until plyr can deal with
POSIXlt objects).
If you do want to change classes, both Date and POSIXct are choices that will work with plyr.
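
A minimal sketch of such a class change, in base R only. The three-row data frame, the extra DateTime column, and tz = "UTC" are assumptions made for illustration; the POSIXlt column is created by column assignment as in the original post and then converted.

```r
# A POSIXlt column created by column assignment, as in the original post
df <- data.frame(ID1 = c("2", "2", "4"), stringsAsFactors = FALSE)
df$Date <- strptime(c("1985-05-2", "1985-05-3", "1985-05-6"),
                    "%Y-%m-%d", tz = "UTC")
class(df$Date)    # "POSIXlt" "POSIXt"

# Option 1: POSIXct keeps the time of day (seconds since the epoch)
df$DateTime <- as.POSIXct(df$Date)
# Option 2: Date drops the time of day (days since the epoch)
df$Date <- as.Date(df$Date)

# First class of each column after the conversion
sapply(df, function(col) class(col)[1])
```

Either converted column is an ordinary atomic vector with a class attribute, which is the kind of column row-binding code generally expects.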
Brian S. Diggs, PhD Senior Research Associate, Department of Surgery Oregon Health & Science University
If you need plyr for other tasks you ought to use a different class for your date data (or wait until plyr can deal with POSIXlt objects).
How do you get POSIXlt objects into a data frame?
df <- data.frame(x = as.POSIXlt(as.Date(c("2008-01-01"))))
str(df)
'data.frame': 1 obs. of 1 variable:
 $ x: POSIXct, format: "2008-01-01"
df <- data.frame(x = I(as.POSIXlt(as.Date(c("2008-01-01")))))
str(df)
'data.frame': 1 obs. of 1 variable:
 $ x: AsIs, format: "0"

Hadley
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
Hi together,

thank you so much for your help! The problem was indeed the strptime function. Replacing it with as.Date solves the problem, both in the example I provided and in my actual data set. I think this is a lesson for me not to use types I'm not really familiar with (POSIXlt in this case).

Thanks again!
Christoph
On 4/25/2011 1:07 PM, Hadley Wickham wrote:
How do you get POSIXlt objects into a data frame?
By assigning to a column after the data.frame creation step:
> df <- data.frame(x = as.POSIXlt(as.Date(c("2008-01-01"))))
> str(df)
'data.frame': 1 obs. of 1 variable:
$ x: POSIXct, format: "2008-01-01"
> dput(df)
structure(list(x = structure(1199145600, class = c("POSIXct",
"POSIXt"), tzone = "UTC")), .Names = "x", row.names = c(NA, -1L
), class = "data.frame")
> df$x <- as.POSIXlt(as.Date(c("2008-01-01")))
> str(df)
'data.frame': 1 obs. of 1 variable:
$ x: POSIXlt, format: "2008-01-01"
> dput(df)
structure(list(x = structure(list(sec = 0, min = 0L, hour = 0L,
mday = 1L, mon = 0L, year = 108L, wday = 2L, yday = 0L, isdst = 0L),
.Names = c("sec", "min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"),
class = c("POSIXlt", "POSIXt"), tzone = "UTC")), .Names = "x",
row.names = c(NA, -1L), class = "data.frame")
This is reminiscent of the 1d array problem; there are types that are
coerced into other types when passed as part of a data.frame constructor
(data.frame call), but are not coerced when assigned to a column.
Looking at help pages, calls to data.frame call as.data.frame on each
argument; `[<-.data.frame` has a section on coercion which starts "The
story over when replacement values are coerced is a complicated one, and
one that has changed during R's development. This section is a guide
only." which makes me think it is not all that well defined.
Digging more, there is an as.data.frame.POSIXlt, although the help page
for it (DateTimeClasses in base) does not mention it or document it. It
is documented, though, in as.data.frame (which also has comments about
coercing 1 dimensional arrays).
So, potentially, there could be differences with any class that has an
as.data.frame method because it will be treated differently if passed to
data.frame versus a column assignment with `[<-.data.frame`
> methods("as.data.frame")
[1] as.data.frame.aovproj* as.data.frame.array
[3] as.data.frame.AsIs as.data.frame.character
[5] as.data.frame.complex as.data.frame.data.frame
[7] as.data.frame.Date as.data.frame.default
[9] as.data.frame.difftime as.data.frame.factor
[11] as.data.frame.ftable* as.data.frame.function
[13] as.data.frame.idf* as.data.frame.integer
[15] as.data.frame.list as.data.frame.logical
[17] as.data.frame.logLik* as.data.frame.matrix
[19] as.data.frame.model.matrix as.data.frame.numeric
[21] as.data.frame.numeric_version as.data.frame.ordered
[23] as.data.frame.POSIXct as.data.frame.POSIXlt
[25] as.data.frame.raw as.data.frame.table
[27] as.data.frame.ts as.data.frame.vector
So, I suppose it is working as documented. Though I wonder how long ago
it was that someone (who has been using R regularly for at least a year)
actually read the entire help page for data.frame and/or as.data.frame.
It's one of those things you think you know and understand until you
find out you don't.
Brian S. Diggs, PhD Senior Research Associate, Department of Surgery Oregon Health & Science University
On 2011-04-25 13:07, Hadley Wickham wrote:
How do you get POSIXlt objects into a data frame?
To mimic the OP's code:

df <- data.frame(x = "2008-01-01")
# format must be passed by name: the second positional argument of as.POSIXlt is tz
df$x <- as.POSIXlt(df$x, format = "%Y-%m-%d")
str(df)
#'data.frame': 1 obs. of 1 variable:
# $ x: POSIXlt, format: "2008-01-01"

Peter Ehlers