Hi, I am trying to remove rows that are duplicated across two or more
variables at once in a small R data.frame. I am trying to reproduce the
behaviour of a SAS Proc Sort with Nodupkey, for those familiar with SAS.
Here's my example data:
test <- read.csv("test.csv", sep=",", as.is=TRUE)
      date var1 var2 num1 num2
1 28/01/11    a    1  213   71
2 28/01/11    b    1  141   47
3 28/01/11    c    2  867  289
4 29/01/11    a    2  234   78
5 29/01/11    b    2  666  222
6 29/01/11    c    2  912  304
7 30/01/11    a    3  417  139
8 30/01/11    b    3  108   36
9 30/01/11    c    2  288   96
I am trying to obtain the following, where rows with a duplicated
combination of date AND var2 are removed from the above data.frame.
      date var1 var2 num1 num2
28/01/2011    a    1  213   71
28/01/2011    c    2  867  289
29/01/2011    a    2  234   78
30/01/2011    c    2  288   96
30/01/2011    a    3  417  139
If I use !duplicated() with one variable, everything works fine.
However, I wish to remove duplicates of both date and var2.
test[!duplicated(test$date),]
        date var1 var2 num1 num2
1 0011-01-28    a    1  213   71
4 0011-01-29    a    2  234   78
7 0011-01-30    a    3  417  139
test2 <- test[!duplicated(test$date),!duplicated(test$var2),]
Error in `[.data.frame`(test, !duplicated(test$date),
!duplicated(test$var2), : undefined columns selected
I got different errors when using the unique() function.
Can anybody help me solve this?
Thanks in advance.
Jon
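
One possible approach, offered as a sketch rather than a definitive answer:
in test[i, j] the second index selects columns, so passing a second logical
vector of length 9 to a 5-column data.frame is the likely cause of the
"undefined columns selected" error. duplicated() also accepts a data.frame,
in which case it flags rows whose combination of values has already been
seen. The sketch below rebuilds the example data by hand (values copied from
the table above), assumes the dates are day/month/two-digit-year strings,
and sorts by the key variables first to mimic Proc Sort. The names sorted
and dedup are only illustrative.

```r
# Rebuild the example data from the post (values copied from the printed table).
test <- data.frame(
  date = c("28/01/11", "28/01/11", "28/01/11",
           "29/01/11", "29/01/11", "29/01/11",
           "30/01/11", "30/01/11", "30/01/11"),
  var1 = c("a", "b", "c", "a", "b", "c", "a", "b", "c"),
  var2 = c(1, 1, 2, 2, 2, 2, 3, 3, 2),
  num1 = c(213, 141, 867, 234, 666, 912, 417, 108, 288),
  num2 = c(71, 47, 289, 78, 222, 304, 139, 36, 96),
  stringsAsFactors = FALSE
)

# Assumption: dates are dd/mm/yy strings; parse them so sorting is chronological.
test$date <- as.Date(test$date, format = "%d/%m/%y")

# Sort by the key variables first, as Proc Sort would.
sorted <- test[order(test$date, test$var2), ]

# duplicated() on the two key columns flags every repeat of a date/var2 pair;
# negating it keeps the first row of each pair.
dedup <- sorted[!duplicated(sorted[, c("date", "var2")]), ]
print(dedup)
```

If the original row order should be kept instead of the sorted order, the
same !duplicated() call can be applied to test directly.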