Remove duplicated rows
5 messages · chrisli1223, PIKAL Petr, Gustaf Rydevik +1 more
Hi

r-help-bounces at r-project.org wrote on 23.04.2010 04:05:00:
Hi all,

I have a dataset similar to the following

Name  Date       Value
A     1/01/2000  4
A     2/01/2000  4
A     3/01/2000  5
A     4/01/2000  4
A     5/01/2000  1
B     6/01/2000  2
B     7/01/2000  1
B     8/01/2000  1

I would like R to remove duplicates based on column 1 and 3 only. In addition, I would like R to remove duplicates based on the underlying and overlying row only. For example, for A, I would like to remove row 2 only and keep row 1, 3 and 4.
Hm. Strange. You want to keep lines 1, 3 and 4 for A. What about line 5?
Why do you want to keep lines 1 and 4, which both have A and 4 in those columns?
test=read.table("clipboard", header=T)
test[!duplicated(paste(test[,1], test[,3])),]
Name Date Value
1 A 1/01/2000 4
3 A 3/01/2000 5
5 A 5/01/2000 1
6 B 6/01/2000 2
7 B 7/01/2000 1
Gives you unique values, however I am not sure if it is what you want.
Regards
Petr
I have tried: unique() and replicated(), but I do not have much success. I have also tried: dataset<-c(1,diff(dataset)!=0), but I don't know how to apply it to this multi-column situation.

Any help would be greatly appreciated.

Thanks in advance,
Chris

--
View this message in context: http://r.789695.n4.nabble.com/Remove-duplicated-rows-tp2023065p2023065.html
Sent from the R help mailing list archive at Nabble.com.

[[alternative HTML version deleted]]
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hi,
This code is a bit ugly, but it works. Hope it helps.
/Gustaf
library(zoo)
test<-read.table("clipboard",header=T)
test$code<-paste(test$Name,test$Value,sep="")
drop.ndx<-rollapply(zoo(test$code),3,function(x)(x[2]%in%c(x[1],x[3])))
drop.ndx<-c(FALSE,drop.ndx,FALSE)
test[!drop.ndx,]
Gustaf Rydevik, M.Sci. tel: +46(0)703 051 451 address:Essingetorget 40,112 66 Stockholm, SE skype:gustaf_rydevik
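For comparison, here is a minimal base R sketch of the "compare each row with the row directly above it" idea, which extends the c(1, diff(dataset) != 0) attempt from the original question to more than one column. It is only an illustration under stated assumptions: the data frame is retyped from the original post, and the names key, same_as_previous and adjacent_dedup are chosen here and do not come from any of the replies. Unlike the rollapply version above, it keeps the first row of each run of repeats rather than the last.

```r
# Example data retyped from the original post
test <- data.frame(
  Name  = c("A", "A", "A", "A", "A", "B", "B", "B"),
  Date  = c("1/01/2000", "2/01/2000", "3/01/2000", "4/01/2000",
            "5/01/2000", "6/01/2000", "7/01/2000", "8/01/2000"),
  Value = c(4, 4, 5, 4, 1, 2, 1, 1),
  stringsAsFactors = FALSE
)

# One key per row, built from the columns that define a duplicate
# (columns 1 and 3; Date is ignored)
key <- paste(test$Name, test$Value, sep = "|")

# TRUE where a row has the same key as the row directly above it;
# the first row has no predecessor, so it is never flagged
same_as_previous <- c(FALSE, key[-1] == key[-length(key)])

# Drop only rows that repeat their immediate predecessor
adjacent_dedup <- test[!same_as_previous, ]
adjacent_dedup
```

With this data, rows 2 and 8 are dropped. Row 4 (A, 4) is kept because the row directly above it is (A, 5), which is the behaviour the original question describes for A.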
Try this:

DF[!duplicated(DF[-2]),]
3 days later
Thank you Petr, Gustaf and Gabor. Your help is much appreciated. I have tried: dataset[!duplicated(dataset[,-2]),] and it solves my problem. Thanks, Chris
View this message in context: http://r.789695.n4.nabble.com/Remove-duplicated-rows-tp2023065p2065997.html Sent from the R help mailing list archive at Nabble.com.
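A short sketch of what the accepted one-liner does on the posted data may be useful to later readers. It assumes the data frame is as retyped below from the original post; global_dedup is a name chosen here for illustration. Note that duplicated() flags every later repeat of a (Name, Value) pair anywhere in the data, not only repeats on adjacent rows.

```r
# Example data retyped from the original post
dataset <- data.frame(
  Name  = c("A", "A", "A", "A", "A", "B", "B", "B"),
  Date  = c("1/01/2000", "2/01/2000", "3/01/2000", "4/01/2000",
            "5/01/2000", "6/01/2000", "7/01/2000", "8/01/2000"),
  Value = c(4, 4, 5, 4, 1, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Drop the Date column (column 2) before checking for duplicates,
# then keep only the first occurrence of each Name/Value pair
global_dedup <- dataset[!duplicated(dataset[, -2]), ]
global_dedup
```

On this data the result keeps rows 1, 3, 5, 6 and 7, the same rows as in Petr's output earlier in the thread. Row 4 is removed as well as row 2, since (A, 4) already appears in row 1.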