Simple question on finding duplicates

7 messages · Jeff, David L Carlson, arun +2 more

#
I'm trying to find duplicate values in a column of a data frame. For
   example, dataframe (a) below has two 3's. I would like to mark the value in
   each row as either not being a duplicate of the one before (0), or as a
   duplicate (1), for example as in dataframe (b). In SPSS, I would simply
   compare each value to its "lagged" value, but I can't figure out how to do
   this with R.
   Can someone point me in the right direction?
   Thanks
   a <- data.frame( col1 = c(1,2,3,3,4))
   b <- data.frame( col1 = c(1,2,3,3,4), duplicate = c(0,0,0,1,0))
#
duplicate <- ifelse(c(0, a$col[-length(a$col)])==c(a$col), 1, 0)

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
#
Minor correction:

duplicate <- ifelse(c(0, a$col[-length(a$col)])==a$col, 1, 0)

-------
David
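
A note on the lagged comparison above: padding the lagged copy with 0 would flag the first row as a duplicate whenever the column itself starts with 0. The sketch below is one possible variant, not the poster's code. It pads with NA instead and spells out the column name col1 rather than relying on partial matching of a$col. It assumes the column itself contains no NA values. The names lagged and z are made up for illustration.

```r
# Same data as in the original post
a <- data.frame(col1 = c(1, 2, 3, 3, 4))

# Lagged copy: NA in the first position, then every value except the last
lagged <- c(NA, head(a$col1, -1))

# 1 where a value equals the one before it, 0 otherwise.
# The !is.na() guard turns the NA comparison in row 1 into FALSE.
a$duplicate <- as.integer(!is.na(lagged) & lagged == a$col1)

# Illustrative vector that starts with 0 (an assumption, not from the thread)
z <- c(0, 0, 1)
zero_padded <- as.integer(c(0, z[-length(z)]) == z)   # first element flagged
z_lagged <- c(NA, head(z, -1))
na_padded <- as.integer(!is.na(z_lagged) & z_lagged == z)
```

With the data from the post, a$duplicate comes out as 0 0 0 1 0, matching dataframe (b).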
#
Hi,
Try this:


a <- data.frame( col1 = c(1,2,3,3,4))
a<-within(a, duplicate<-c(0,ifelse(diff(a$col1)==0,1,0)))
a
  col1 duplicate
1    1         0
2    2         0
3    3         0
4    3         1
5    4         0
A.K.



----- Original Message -----
From: Jeff <r at jp.pair.com>
To: r-help at r-project.org
Cc: 
Sent: Wednesday, July 25, 2012 4:05 PM
Subject: [R] Simple question on finding duplicates


#
ummm...
?duplicates

-- Bert
On Wed, Jul 25, 2012 at 1:22 PM, David L Carlson <dcarlson at tamu.edu> wrote:

  
    
#
Sorry...
?duplicated

-- Bert
On Wed, Jul 25, 2012 at 1:28 PM, Bert Gunter <bgunter at gene.com> wrote:
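
One point that may be worth keeping in mind with duplicated(): it marks a value that has appeared anywhere earlier in the vector, whereas the original question asks only about the value immediately before. For the data in the post the two agree, but they can differ. A small sketch, using a made-up vector x that is not from the thread:

```r
# Hypothetical vector: the final 3 repeats an earlier value
# but is not adjacent to it
x <- c(1, 2, 3, 3, 4, 3)

# duplicated() flags any value seen earlier in the vector
any_earlier <- as.integer(duplicated(x))

# Comparing neighbours flags only a repeat of the previous value
adjacent <- as.integer(c(FALSE, diff(x) == 0))
```

Here any_earlier is 0 0 0 1 0 1 and adjacent is 0 0 0 1 0 0, so which one is wanted depends on whether non-adjacent repeats should count.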

  
    
#
duplicate <- c(0, diff(a[,"col1"]) == 0)

Peter Ehlers
On 2012-07-25 13:05, Jeff wrote:
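
The diff() versions above rely on the column being numeric. If the column might hold character values, comparing each element with its predecessor directly is one alternative. The helper below is only a sketch under that assumption. The name flag_repeat is invented here and is not a base R function.

```r
# Hypothetical helper: 1 where an element equals the one before it, else 0
flag_repeat <- function(x) {
  n <- length(x)
  # Nothing to compare against for empty or single-element input
  if (n < 2) return(rep(0L, n))
  # Drop the first element on one side and the last on the other,
  # so each value is lined up with its predecessor
  c(0L, as.integer(x[-1] == x[-n]))
}

num_flags <- flag_repeat(c(1, 2, 3, 3, 4))
chr_flags <- flag_repeat(c("a", "a", "b", "a"))
```

For the numeric data from the post this gives 0 0 0 1 0, and for the character example 0 1 0 0.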