Remove duplicated rows

Hi
r-help-bounces at r-project.org wrote on 23.04.2010 04:05:00:
Hi all,

I have a dataset similar to the following

Name   Date   Value
A   1/01/2000   4
A   2/01/2000   4
A   3/01/2000   5
A   4/01/2000   4
A   5/01/2000   1
B   6/01/2000   2
B   7/01/2000   1
B   8/01/2000   1

I would like R to remove duplicates based on columns 1 and 3 only. In
addition, I would like R to compare each row only with the rows directly
above and below it. For example, for A, I would like to remove row 2 only
and keep rows 1, 3 and 4.
Hm. Strange. You want to keep lines 1, 3 and 4 for A. What about line 5?
Why do you want to keep lines 1 and 4, which both have A and 4 in these columns?

test <- read.table("clipboard", header = TRUE)  # paste the table via the clipboard
test[!duplicated(paste(test[, 1], test[, 3])), ]  # first row of each Name/Value pair
  Name      Date Value
1    A 1/01/2000     4
3    A 3/01/2000     5
5    A 5/01/2000     1
6    B 6/01/2000     2
7    B 7/01/2000     1

This gives you the unique Name/Value combinations; however, I am not sure it is what you want.
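Reading from "clipboard" depends on the platform, so the sketch below builds the sample data from the question inline instead (Value stored as numbers). It reruns the paste() key approach and lists which original rows it removes; the object names petr and dropped are illustrative only.

```r
# Sample data from the question, built inline instead of read from the clipboard
test <- data.frame(
  Name  = rep(c("A", "B"), times = c(5, 3)),
  Date  = paste0(1:8, "/01/2000"),
  Value = c(4, 4, 5, 4, 1, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Key on columns 1 and 3 (Name and Value); keep the first row of each key
petr <- test[!duplicated(paste(test[, 1], test[, 3])), ]
print(petr)

# Original row numbers removed by this global (not neighbour-based) check
dropped <- setdiff(seq_len(nrow(test)), as.integer(rownames(petr)))
print(dropped)
```

On this data the removed rows are 2, 4 and 8. Row 4 goes because (A, 4) already occurred in row 1, even though the two rows are not adjacent, which is what the question about rows 1 and 4 is getting at.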

Regards
Petr
I have tried unique() and duplicated(), but without much success. I
have also tried dataset<-c(1,diff(dataset)!=0), but I don't know how to
apply it to this multi-column situation.
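The c(1, diff(dataset) != 0) idea can be carried over to several key columns by comparing each row with the row directly above it; diff() does not work on a character column such as Name, so the sketch uses != instead. It assumes the rows are already in the intended order and that a row is dropped only when both Name and Value equal those of the previous row, which is one reading that matches "remove row 2 only" for A.

```r
# Sample data from the question
dataset <- data.frame(
  Name  = rep(c("A", "B"), times = c(5, 3)),
  Date  = paste0(1:8, "/01/2000"),
  Value = c(4, 4, 5, 4, 1, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Multi-column analogue of c(1, diff(x) != 0): TRUE for the first row,
# then TRUE wherever Name or Value differs from the row immediately above
n <- nrow(dataset)
changed <- c(TRUE,
             dataset$Name[-1]  != dataset$Name[-n] |
             dataset$Value[-1] != dataset$Value[-n])

# Keep only the first row of each run of consecutive Name/Value duplicates
dedup <- dataset[changed, ]
print(dedup)
```

On the sample data this keeps rows 1, 3, 4, 5, 6 and 7: it drops row 2 (same as row 1) and row 8 (same as row 7), but keeps row 4 because the row above it has a different Value.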

Any help would be greatly appreciated.

Thanks in advance,
Chris
-- 
View this message in context: http://r.789695.n4.nabble.com/Remove-duplicated-rows-tp2023065p2023065.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
On Fri, Apr 23, 2010 at 4:05 AM, chrisli1223
Hi,

This code is a bit ugly, but it works. Hope it helps.
/Gustaf

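library(zoo)
test <- read.table("clipboard", header = TRUE)
# Combine the two key columns (Name, Value) into one code per row
test$code <- paste(test$Name, test$Value, sep = "")

# Slide a centred window of 3 rows; flag the middle row if its code
# equals the code of the row above or below
drop.ndx <- rollapply(zoo(test$code), 3, function(x) x[2] %in% c(x[1], x[3]))

# The first and last rows have no full window, so never drop them
drop.ndx <- c(FALSE, coredata(drop.ndx), FALSE)
test[!drop.ndx, ]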
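The rollapply() call above flags the middle row of each three-row window when its code matches either neighbour. A base-R sketch of the same rule, assuming the inline sample data rather than the clipboard; neighbour_dup is a hypothetical helper name, not part of zoo or base R.

```r
# Sample data from the question
test <- data.frame(
  Name  = rep(c("A", "B"), times = c(5, 3)),
  Date  = paste0(1:8, "/01/2000"),
  Value = c(4, 4, 5, 4, 1, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for an interior row whose key equals the key of
# the row directly above or below it; the first and last rows are always FALSE
neighbour_dup <- function(key) {
  n <- length(key)
  if (n < 3) return(rep(FALSE, n))
  i <- 2:(n - 1)
  c(FALSE, key[i] == key[i - 1] | key[i] == key[i + 1], FALSE)
}

# Same Name/Value code as in the rollapply() version
test$code <- paste(test$Name, test$Value, sep = "")
kept <- test[!neighbour_dup(test$code), ]
print(kept)
```

On the sample data this drops rows 2 and 7 and keeps 1, 3, 4, 5, 6 and 8. One consequence of the rule is that an identical pair of rows in the interior of the data is removed completely, because each row of the pair matches its neighbour: neighbour_dup(c("x", "y", "y", "z")) flags both "y" rows.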
Gustaf Rydevik, M.Sci.
Try this:

DF[!duplicated(DF[-2]),]
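Here DF[-2] drops the second column (Date), and duplicated() applied to a data frame marks each row whose Name/Value combination already appeared in an earlier row, wherever that row is. A sketch on the sample data, checking that it agrees with the paste() key used earlier in the thread:

```r
# Sample data from the question, named DF as in the reply above
DF <- data.frame(
  Name  = rep(c("A", "B"), times = c(5, 3)),
  Date  = paste0(1:8, "/01/2000"),
  Value = c(4, 4, 5, 4, 1, 2, 1, 1),
  stringsAsFactors = FALSE
)

# DF[-2] keeps the Name and Value columns; duplicated() on a data frame
# marks every row that repeats an earlier row, adjacent or not
res <- DF[!duplicated(DF[-2]), ]
print(res)

# Compare with keying on paste(Name, Value)
same_as_paste <- identical(res, DF[!duplicated(paste(DF$Name, DF$Value)), ])
print(same_as_paste)
```

Both forms keep rows 1, 3, 5, 6 and 7. Because this is a global check rather than a neighbour check, row 4 is dropped as well; it is this form, written as dataset[, -2], that is reported below as solving the problem.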

On Thu, Apr 22, 2010 at 10:05 PM, chrisli1223

Thank you Petr, Gustaf and Gabor. Your help is much appreciated.

I have tried:

dataset[!duplicated(dataset[,-2]),]

and it solves my problem.

Thanks,
Chris
View this message in context: http://r.789695.n4.nabble.com/Remove-duplicated-rows-tp2023065p2065997.html
Sent from the R help mailing list archive at Nabble.com.