Message-ID: <CAAxdm-5TUXz5XXd5q=U7VV3aTvPU3qqwxkZx0tMnc2h8McnVqg@mail.gmail.com>
Date: 2011-10-02T22:26:36Z
From: jim holtman
Subject: Keep ALL duplicate records
In-Reply-To: <1317579523767-3865573.post@n4.nabble.com>
Here is a function I use to find all duplicate records:
> allDup <- function (value)
{
    # TRUE for every element that occurs more than once, first occurrence included
    duplicated(value) | duplicated(value, fromLast = TRUE)
}
> x
     ID    OS  time
1 userA   Win 12:22
2 userB   OSX 23:22
3 userA   Win 04:44
4 userC Win64 12:28
> x[allDup(x$ID),]
     ID  OS  time
1 userA Win 12:22
3 userA Win 04:44
>
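
For anyone who wants to paste this into a fresh session, here is a self-contained sketch of the same idea. The data frame is rebuilt from the original question; the names fwd, bwd, keep, dups, ids and rowdup are only illustrative, and the ids vector is made-up data to show the case of more than one repeated ID.

```r
# Rebuild the example data from the original question.
x <- data.frame(ID   = c("userA", "userB", "userA", "userC"),
                OS   = c("Win", "OSX", "Win", "Win64"),
                time = c("12:22", "23:22", "04:44", "12:28"))

# Same function as above: flag every element that has a twin.
allDup <- function(value) {
  duplicated(value) | duplicated(value, fromLast = TRUE)
}

# The two passes that get combined:
fwd <- duplicated(x$ID)                   # FALSE FALSE  TRUE FALSE
bwd <- duplicated(x$ID, fromLast = TRUE)  #  TRUE FALSE FALSE FALSE

keep <- allDup(x$ID)                      #  TRUE FALSE  TRUE FALSE
dups <- x[keep, ]                         # rows 1 and 3 (both userA)

# Hypothetical vector with two different repeated IDs.
ids <- c("a", "b", "b", "a", "c")
allDup(ids)                               #  TRUE  TRUE  TRUE  TRUE FALSE

# duplicated() also has a data frame method, so whole rows
# (or a subset of columns) can be checked the same way.
rowdup <- allDup(x[c("ID", "OS")])        #  TRUE FALSE  TRUE FALSE
```

The forward pass misses the first occurrence and the backward pass misses the last one, so OR-ing them catches every member of each duplicated group.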
On Sun, Oct 2, 2011 at 2:18 PM, Pete Brecknock <Peter.Brecknock at bp.com> wrote:
>
> Erik Svensson wrote:
>>
>> Hello,
>> In a data frame I want to identify ALL records with a duplicated ID, as in
>> the example below, so that I can examine "OS" and "time".
>>
>> (df<-data.frame(ID=c("userA", "userB", "userA", "userC"),
>>   OS=c("Win","OSX","Win", "Win64"),
>>   time=c("12:22","23:22","04:44","12:28")))
>>
>>      ID    OS  time
>> 1 userA   Win 12:22
>> 2 userB   OSX 23:22
>> 3 userA   Win 04:44
>> 4 userC Win64 12:28
>>
>> My desired output is that ALL records with the same IDs are found:
>>
>> userA   Win 12:22
>> userA   Win 04:44
>>
>> preferably by returning logical values (TRUE FALSE TRUE FALSE)
>>
>> Is there a simple way to do that?
>>
>> [-- With duplicated(df$ID) the output will be
>> [1] FALSE FALSE  TRUE FALSE
>> i.e. not all user A records are found
>>
>> With unique(df$ID)
>> [1] userA userB userC
>> Levels: userA userB userC
>> i.e. one of each ID is found --]
>>
>> Erik Svensson
>>
>
>
> How about ...
>
> # All records (%in% rather than ==, so more than one duplicated ID is handled)
> ALL_RECORDS <- df[df$ID %in% df$ID[duplicated(df$ID)],]
> print(ALL_RECORDS)
>
> # Logical Records
> TRUE_FALSE <- df$ID %in% df$ID[duplicated(df$ID)]
> print(TRUE_FALSE)
>
> HTH
>
> Pete
>
--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?