Message-ID: <971536df1001161506x25782cd8w998cd2bed2d5e60c@mail.gmail.com>
Date: 2010-01-16T23:06:09Z
From: Gabor Grothendieck
Subject: Extracting only Unique Rows based on only 1 Column
In-Reply-To: <20100116140435.88192zuxrh079r0g@webmail.cecs.pdx.edu>

Try this where DF is your data frame:

subset(DF, !duplicated(ID))

or equivalently:

DF[!duplicated(DF$ID), ]
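
As a rough illustration, here is how that looks on the small example from the
quoted message. The data frame is rebuilt by hand from the values posted
there; the names first_rows and last_rows are only placeholders chosen for
this sketch. Note that duplicated() keeps the first row seen for each ID,
which is an assumption about which row is wanted, since the question does not
say.

```r
# Rebuild the example from the quoted message (values copied from it).
traveltimes <- data.frame(
  ID    = c(1, 1, 2, 2, 3, 3),
  Time1 = c(3, 4, 3, 5, 4, 2),
  Time2 = c(4, 7, 5, 6, 5, 8)
)

# duplicated() flags every repeat of an ID after its first occurrence,
# so negating it keeps the first row seen for each ID.
first_rows <- traveltimes[!duplicated(traveltimes$ID), ]
print(first_rows)

# fromLast = TRUE keeps the last row for each ID instead.
last_rows <- traveltimes[!duplicated(traveltimes$ID, fromLast = TRUE), ]
print(last_rows)
```

Either call returns one row per ID with all other columns carried along, and
the test is made on the ID column alone, whatever the Time1 and Time2 values
are.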


On Sat, Jan 16, 2010 at 5:04 PM, Bryan M Hangartner
<hangartb at cecs.pdx.edu> wrote:
> To Whomever is Interested,
>
> I have spent several days searching the web, help files, the R wiki and the
> archives of this mailing list for a solution to this problem, but
> nonetheless I apologize in advance if I have missed something obvious.
>
> The problem is this: I have a 5-column data frame with about 4.2 million
> rows, and want to create a new (and hopefully much smaller) data frame that
> contains only the rows which have a unique value in the first column only.
> In other words, I do not care about the uniqueness of the values in the
> other four columns, only the uniqueness of the entries in the first column. The
> "unique" command does not seem to have this option available, at least based
> on what I've read in the help file.
>
> A simplified example matrix (designated as "traveltimes"):
>
> ID Time1 Time2
>  1     3     4
>  1     4     7
>  2     3     5
>  2     5     6
>  3     4     5
>  3     2     8
>
> When I use a command such as
>
> matches <- unique(traveltimes, incomparables = FALSE, fromLast = FALSE)
>
> I will end up with a 6-row matrix, exactly what I already have. What I would
> like to do is to remove the duplicate values in the column labeled "ID" and
> their associated Time1 and Time2 entries. This will give me a 3x3 matrix
> which contains only one instance of each "ID" variable. For the purposes of
> this particular problem, the uniqueness of the Time1 and Time2 columns is not
> relevant.
>
> If this question is not clear enough please let me know. Thank you for your
> time.
>
>
> --
> Bryan Hangartner
> hangartb at cecs.pdx.edu
>