which rows are duplicates?
At 05:07 30/03/2009, Aaron M. Swoboda wrote:
I would like to know which rows are duplicates of each other, not simply that a row is a duplicate of another row. In the following example rows 1 and 3 are duplicates.
x <- c(1,3,1)
y <- c(2,4,2)
z <- c(3,4,3)
data <- data.frame(x,y,z)

  x y z
1 1 2 3
2 3 4 4
3 1 2 3
Does this do what you want?

> x <- c(1,3,1)
> y <- c(2,4,2)
> z <- c(3,4,3)
> data <- data.frame(x,y,z)
> data.u <- unique(data)
> data.u
  x y z
1 1 2 3
2 3 4 4
> data.u <- cbind(data.u, set = 1:nrow(data.u))
> merge(data, data.u)
  x y z set
1 1 2 3   1
2 1 2 3   1
3 3 4 4   2

You need to do a bit more work to get them back into the original row order if that is essential.
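One possible way to do that extra work, as a rough sketch rather than the only approach: record each row's original position in a helper column before merging, then sort on it afterwards. The column name "row" and the objects data.r and m are made up here for illustration.

```r
x <- c(1, 3, 1)
y <- c(2, 4, 2)
z <- c(3, 4, 3)
data <- data.frame(x, y, z)

# Label each distinct row with a set number, as in the session above.
data.u <- unique(data)
data.u <- cbind(data.u, set = 1:nrow(data.u))

# Record the original row position before merging.
# 'row' is a hypothetical helper column, not part of the original data.
data.r <- cbind(data, row = 1:nrow(data))

# merge() joins on the shared columns x, y, z and sorts on them,
# so sort on 'row' afterwards to restore the input order.
m <- merge(data.r, data.u)
m <- m[order(m$row), ]
m
```

Rows that share a value in the set column are duplicates of each other; for the example data the set column should read 1, 2, 1 in original row order.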
I can't figure out how to get R to tell me that observations 1 and 3 are the same. It seems like the "duplicated" and "unique" functions should be able to help me out, but I am stumped. For instance, if I use "duplicated" ...
duplicated(data)
[1] FALSE FALSE  TRUE

it tells me that row 3 is a duplicate, but not which row it matches. How do I figure out WHICH row it matches? And if I use "unique"...
unique(data)
  x y z
1 1 2 3
2 3 4 4

I see that rows 1 and 2 are unique, leaving me to infer that row 3 was a duplicate, but again it doesn't tell me which row it was a duplicate of (as far as I can tell). Am I missing something? How can I determine that row 3 is a duplicate OF ROW 1?

Thanks,
Aaron
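To report directly which earlier row each duplicate matches, one option is to collapse every row into a single string and use match(), which returns the position of the first occurrence. This is only a sketch: the names key, first, dup and res are illustrative, and it assumes the separator character never appears in the data and that the values survive conversion to character unchanged.

```r
data <- data.frame(x = c(1, 3, 1), y = c(2, 4, 2), z = c(3, 4, 3))

# Collapse each row into one string key.
# Assumption: the separator "\r" never occurs inside the data values.
key <- do.call(paste, c(data, sep = "\r"))

# match() gives, for every row, the index of the first row with the same key.
first <- match(key, key)

# Keep only the rows that duplicate an earlier row,
# and show which row each one is a duplicate of.
dup <- which(duplicated(data))
res <- data.frame(row = dup, duplicate.of = first[dup])
res
```

For the example data this should report that row 3 is a duplicate of row 1.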
Michael Dewey http://www.aghmed.fsnet.co.uk