I'm trying to find duplicate values in a column of a data frame. For example, data frame (a) below has two 3's. I would like to mark the value in each row as either not a duplicate of the one before (0) or a duplicate of it (1), as in data frame (b). In SPSS, I would simply compare each value to its "lagged" value, but I can't figure out how to do this in R. Can someone point me in the right direction?

Thanks

a <- data.frame(col1 = c(1, 2, 3, 3, 4))
b <- data.frame(col1 = c(1, 2, 3, 3, 4), duplicate = c(0, 0, 0, 1, 0))
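An SPSS-style "lagged" comparison can be written in base R with index shifting. Drop the first element of the vector and compare it against the vector with its last element dropped, so that each value lines up with the one before it. The sketch below assumes a numeric column named `col1` as in the example. `flag_repeats` is a hypothetical helper name, not an existing function.

```r
# Hypothetical helper: flag values equal to the immediately preceding one.
flag_repeats <- function(x) {
  n <- length(x)
  if (n == 0) return(integer(0))
  # x[-1] drops the first element and x[-n] drops the last, so position i
  # of the comparison checks x[i + 1] against x[i].
  same_as_prev <- x[-1] == x[-n]
  # The first row has no predecessor, so it is never marked as a repeat.
  as.integer(c(FALSE, same_as_prev))
}

a <- data.frame(col1 = c(1, 2, 3, 3, 4))
a$duplicate <- flag_repeats(a$col1)
a$duplicate  # 0 0 0 1 0
```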
Simple question on finding duplicates
7 messages · Jeff, David L Carlson, arun +2 more
duplicate <- ifelse(c(0, a$col[-length(a$col)])==c(a$col), 1, 0)
----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Jeff
Sent: Wednesday, July 25, 2012 3:06 PM
To: r-help at r-project.org
Subject: [R] Simple question on finding duplicates
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Minor correction:

duplicate <- ifelse(c(0, a$col[-length(a$col)])==a$col, 1, 0)

-------
David
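The 0-padded lag above has one edge case worth checking. The padding value is itself compared with the first row, so if `col1` starts with 0, the first row is flagged as a duplicate. The sketch below assumes the same `col1` column, with made-up data whose first value is 0. It pads with `NA` instead and treats that first comparison as "not a duplicate".

```r
# Made-up example data whose first value happens to be 0.
a <- data.frame(col1 = c(0, 2, 3, 3, 4))

# Padding the lagged copy with 0: row 1 compares 0 with 0 and is
# wrongly flagged as a duplicate.
lag_zero <- c(0, a$col1[-nrow(a)])
dup_zero <- ifelse(lag_zero == a$col1, 1, 0)
dup_zero  # 1 0 0 1 0

# Padding with NA instead: the first comparison is NA, and the !is.na()
# guard turns it into FALSE. (A row whose own col1 value is NA may still
# come out as NA.)
lag_na <- c(NA, a$col1[-nrow(a)])
dup <- ifelse(!is.na(lag_na) & lag_na == a$col1, 1, 0)
dup  # 0 0 0 1 0
```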
Hi,

Try this:

a <- data.frame(col1 = c(1,2,3,3,4))
a <- within(a, duplicate <- c(0, ifelse(diff(a$col1)==0, 1, 0)))
a
  col1 duplicate
1    1         0
2    2         0
3    3         0
4    3         1
5    4         0

A.K.
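The `diff()` approach relies on subtraction, so it assumes a numeric column. For a character column, one option is to compare the column with a shifted copy of itself instead. The letter values below are made-up example data.

```r
a <- data.frame(col1 = c("x", "y", "z", "z", "w"), stringsAsFactors = FALSE)

# diff() subtracts adjacent elements, which is not defined for character
# vectors, so it raises an error here.
res <- try(diff(a$col1), silent = TRUE)
inherits(res, "try-error")  # TRUE

# Comparing the column with a shifted copy works for any type that
# supports ==, including character.
a <- within(a, duplicate <- c(0, ifelse(col1[-1] == col1[-length(col1)], 1, 0)))
a$duplicate  # 0 0 0 1 0
```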
ummm... ?duplicates -- Bert
Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
Sorry... ?duplicated -- Bert
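`duplicated()` answers a slightly different question from the lag comparison. It flags a value that has appeared anywhere earlier in the vector, not only in the row immediately before. For the example data both methods give 0 0 0 1 0, but they diverge when a value recurs non-adjacently. The small sketch below uses made-up data to show the difference.

```r
# Made-up data in which 1 recurs, but not in adjacent rows.
x <- c(1, 2, 3, 3, 1)

# duplicated() flags any value seen earlier anywhere in the vector.
any_earlier <- as.numeric(duplicated(x))
any_earlier  # 0 0 0 1 1

# The lag comparison only looks at the immediately preceding row.
prev_row <- c(0, x[-1] == x[-length(x)])
prev_row  # 0 0 0 1 0

# For the data in the original question the two agree.
as.numeric(duplicated(c(1, 2, 3, 3, 4)))  # 0 0 0 1 0
```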
duplicate <- c(0, diff(a[,"col1"]) == 0)

Peter Ehlers
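Because `c()` combines the numeric 0 with a logical vector, `TRUE`/`FALSE` are coerced to 1/0, so this version needs no `ifelse()`. The sketch below uses the `a` and `b` data frames from the original question and checks the result against the target `b`.

```r
a <- data.frame(col1 = c(1, 2, 3, 3, 4))
b <- data.frame(col1 = c(1, 2, 3, 3, 4), duplicate = c(0, 0, 0, 1, 0))

# c(0, <logical>) coerces the logical comparison to numeric 0/1.
a$duplicate <- c(0, diff(a[, "col1"]) == 0)

all.equal(a, b)  # TRUE
```

The `== 0` test is exact equality. Values produced by floating-point arithmetic may need a tolerance instead, for example comparing `abs(diff(x))` against a small threshold of your choosing.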