To Whomever is Interested,
I have spent several days searching the web, help files, the R wiki and the
archives of this mailing list for a solution to this problem, but
nonetheless I apologize in advance if I have missed something obvious.
The problem is this: I have a 5-column data frame with about 4.2 million
rows, and want to create a new (and hopefully much smaller) data frame that
contains only one row for each distinct value in the first column.
In other words, I do not care about the uniqueness of the values in the
other four columns, only the uniqueness of the entries in the first column. The
"unique" command does not seem to have this option available, at least based
on what I've read in the help file.
A simplified example matrix (designated as "traveltimes"):
ID  Time1  Time2
1   3      4
1   4      7
2   3      5
2   5      6
3   4      5
3   2      8
When I use a command such as
matches <- unique(traveltimes, incomparables = FALSE, fromLast = FALSE)
I end up with a 6-row matrix, exactly what I already have. What I would
like to do is to remove the rows whose value in the column labeled "ID" has
already appeared, along with their associated Time1 and Time2 entries. This
will give me a 3x3 matrix which contains only one instance of each "ID"
value. For the purposes of this particular problem, the uniqueness of the
Time1 and Time2 columns is not relevant.
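
To make the desired result concrete, here is a minimal sketch on the example
above. It assumes that keeping the first row encountered for each ID is
acceptable (the description does not say which row should survive), and it
uses base R's duplicated() on the ID column only. It has not been timed on
the full 4.2-million-row data frame.

```r
# Toy data matching the example above
traveltimes <- data.frame(
  ID    = c(1, 1, 2, 2, 3, 3),
  Time1 = c(3, 4, 3, 5, 4, 2),
  Time2 = c(4, 7, 5, 6, 5, 8)
)

# duplicated() flags every repeat of an ID after its first occurrence;
# negating it keeps only the first row seen for each ID.
# Assumption: "first occurrence wins" is the wanted rule.
matches <- traveltimes[!duplicated(traveltimes$ID), ]
```

On the example this keeps the rows (1, 3, 4), (2, 3, 5) and (3, 4, 5), that
is, one row per ID with its original Time1 and Time2 values.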
If this question is not clear enough, please let me know. Thank you for your
time.
--
Bryan Hangartner
hangartb at cecs.pdx.edu