duplicates() function
On Fri, Apr 8, 2011 at 9:59 AM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
I need a function which is similar to duplicated(), but instead of returning TRUE/FALSE, returns indices of which element was duplicated. That is,
x <- c(9,7,9,3,7)
duplicated(x)
[1] FALSE FALSE  TRUE FALSE  TRUE
duplicates(x)
[1] NA NA  1 NA  2

(so that I know that element 3 is a duplicate of element 1, and element 5 is a duplicate of element 2, whereas the others were not duplicated according to our definition.)

Is there a simple way to write this function? I have an ugly implementation in R that loops over all the values; it would make more sense to redo it in C, if there isn't a simple implementation I missed.
I'd think of making it a lookup table. The basic idea is split(seq_along(x), x) but there are probably much faster ways of doing it, depending on what you need. But for efficiency, you probably need a hashtable somewhere.

Hadley
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
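
For readers following the thread, here is a minimal sketch of the lookup-table idea, assuming the behaviour asked for in the question (NA for first occurrences, otherwise the index of the first occurrence). The function names duplicates_split() and duplicates_match() are hypothetical and are not part of base R. The second variant swaps in match(), which hashes internally, as one possible way to get the hashtable without writing C; it is not necessarily what either poster had in mind.

```r
# Hypothetical helper: builds the lookup table with split(seq_along(x), x).
# Each list element holds the positions at which one distinct value occurs.
# Caveat: split() groups via a factor, so NA values in x are dropped from
# the table and simply stay NA in the result.
duplicates_split <- function(x) {
  groups <- split(seq_along(x), x)
  out <- rep(NA_integer_, length(x))
  for (idx in groups) {
    # idx[1] is the first occurrence; idx[-1] are its later duplicates
    out[idx[-1]] <- idx[1]
  }
  out
}

# Hypothetical helper: same result using match(), which returns the
# position of the first occurrence of each element.
duplicates_match <- function(x) {
  first <- match(x, x)
  first[!duplicated(x)] <- NA   # first occurrences are not duplicates
  first
}

x <- c(9, 7, 9, 3, 7)
duplicates_split(x)
# [1] NA NA  1 NA  2
duplicates_match(x)
# [1] NA NA  1 NA  2
```

The split() version loops over distinct values in R, so it is likely to be slow when there are many of them; the match() version is fully vectorised.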