Conditional
3 messages · Michael Pearmain, Peter Alspach, Jim Holtman
Michael

test[!duplicated(paste(test$timestamp, test$user_id)),]

should remove the second (and subsequent) occurrences of duplicates. Your example suggests you don't always want to keep the first occurrence, but the rule which determines which occurrence you want to keep is not obvious to me.

HTH ....

Peter Alspach
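A reproducible sketch of Peter's one-liner, using the sample rows from Michael's message read with read.table(text = ...). It drops 75376 and 75370, i.e. always the later copy, whereas Michael marked 75369 rather than 75370 for the second pair. The second half tries one guess at his rule, "keep the row with the larger Source_type, and the earlier row on a tie". It happens to reproduce both of his marks, but it is inferred from only two examples and is not stated anywhere in the thread. The object names (dedup, dedup2, drop, kept) are illustrative.

```r
# Sample data from the question; the unlabelled first column becomes row names
test <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "Source_type timestamp user_id
75381 0 07-07-2008-21:03:55 848307909687
75379 1 07-07-2008-19:52:55 848307838407
75380 2 07-07-2008-19:54:14 848307838407
75378 1 07-07-2008-15:24:01 848285633277
75374 1 07-07-2008-13:39:17 848273633667
75377 2 07-07-2008-13:39:55 848273633667
75376 2 07-07-2008-13:39:55 848273633667
75375 2 07-07-2008-13:56:05 848273633667
75373 1 07-07-2008-17:11:00 848272661427
75371 1 07-07-2008-13:19:00 848270431847
75372 2 07-07-2008-13:19:14 848270431847
75369 1 07-07-2008-12:49:16 848269676907
75370 2 07-07-2008-12:49:16 848269676907
75366 1 07-07-2008-13:29:15 848263484847
75368 2 07-07-2008-13:29:44 848263484847")

# Peter's idiom: one key per row built from timestamp and user_id; keep the
# first occurrence of each key and drop any later repeats
dedup <- test[!duplicated(paste(test$timestamp, test$user_id)), ]

# Same result without a pasted key: duplicated() on a data frame compares
# whole rows of the selected columns
dedup2 <- test[!duplicated(test[, c("timestamp", "user_id")]), ]

print(setdiff(rownames(test), rownames(dedup)))   # "75376" "75370"

# HYPOTHETICAL rule, guessed from Michael's two marks (not stated in the
# thread): within each timestamp/user_id pair keep the largest Source_type.
# order() leaves remaining ties in their original order, so on a tie the
# earlier row is the one kept.
o <- order(test$user_id, test$timestamp, -test$Source_type)
drop <- logical(nrow(test))
drop[o] <- duplicated(test[o, c("timestamp", "user_id")])

print(rownames(test)[drop])   # "75376" "75369"
kept <- test[!drop, ]
```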
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Michael Pearmain
Sent: Wednesday, 24 September 2008 8:44 a.m.
To: r-help at r-project.org
Subject: [R] Contional

Hi All,

I'm having trouble selecting rows to delete, and I can't seem to overcome it. Below is some sample data. I am trying to dedup the data based on each user and, simultaneously, the timestamp (at the side I have marked the rows I expect to be removed).

I've looked at the lag function but can't seem to make it work. My logic ran along the lines of an ifelse statement and then removing the flagged rows after that, but it doesn't seem to work. Any help appreciated.

Let's call the data test:

test$lag <- ifelse(test$user_id==lag(test$user_id) & test$timestamp==lag(test$timestamp),1,0)

Can anyone help on this?

Mike

Source_type timestamp user_id
75381 0 07-07-2008-21:03:55 848307909687
75379 1 07-07-2008-19:52:55 848307838407
75380 2 07-07-2008-19:54:14 848307838407
75378 1 07-07-2008-15:24:01 848285633277
75374 1 07-07-2008-13:39:17 848273633667
75377 2 07-07-2008-13:39:55 848273633667
75376 2 07-07-2008-13:39:55 848273633667 Remove
75375 2 07-07-2008-13:56:05 848273633667
75373 1 07-07-2008-17:11:00 848272661427
75371 1 07-07-2008-13:19:00 848270431847
75372 2 07-07-2008-13:19:14 848270431847
75369 1 07-07-2008-12:49:16 848269676907 Remove
75370 2 07-07-2008-12:49:16 848269676907
75366 1 07-07-2008-13:29:15 848263484847
75368 2 07-07-2008-13:29:44 848263484847

Thanks in advance
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
The contents of this e-mail are privileged and/or confidential to the named recipient and are not to be used by any other person and/or organisation. If you have received this e-mail in error, please notify the sender and delete all material pertaining to this e-mail.
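Michael's ifelse() attempt probably fails because stats::lag() is meant for time series: on an ordinary vector it returns the same values with a shifted time base, so it does not line a row up with the one above it. A plain positional shift does the job. The helper prev() is hypothetical, written for this sketch, and like any lag-based check it only catches duplicates on adjacent rows, so it assumes the rows are already grouped by user as in the sample.

```r
# Sample data from the question; the unlabelled first column becomes row names
test <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "Source_type timestamp user_id
75381 0 07-07-2008-21:03:55 848307909687
75379 1 07-07-2008-19:52:55 848307838407
75380 2 07-07-2008-19:54:14 848307838407
75378 1 07-07-2008-15:24:01 848285633277
75374 1 07-07-2008-13:39:17 848273633667
75377 2 07-07-2008-13:39:55 848273633667
75376 2 07-07-2008-13:39:55 848273633667
75375 2 07-07-2008-13:56:05 848273633667
75373 1 07-07-2008-17:11:00 848272661427
75371 1 07-07-2008-13:19:00 848270431847
75372 2 07-07-2008-13:19:14 848270431847
75369 1 07-07-2008-12:49:16 848269676907
75370 2 07-07-2008-12:49:16 848269676907
75366 1 07-07-2008-13:29:15 848263484847
75368 2 07-07-2008-13:29:44 848263484847")

# Hypothetical helper: each element's predecessor, NA for the first element
prev <- function(v) c(NA, head(v, -1))

same <- test$user_id == prev(test$user_id) &
        test$timestamp == prev(test$timestamp)

# The first row is compared with NA, so treat NA as "not a duplicate"
test$lag <- ifelse(!is.na(same) & same, 1, 0)

print(rownames(test)[test$lag == 1])   # "75376" "75370"
deduped <- test[test$lag == 0, ]
```

The flag keeps Michael's original column name, lag, so the rest of his workflow (filter on the flag afterwards) carries over unchanged.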
Is this what you want? TRUE marks the rows to be removed:
mark <- (head(x$timestamp, -1) == tail(x$timestamp, -1)) &
        (head(x$user_id, -1) == tail(x$user_id, -1))
x$flag <- c(FALSE, mark)
x
      Source_type           timestamp      user_id  flag
75381           0 07-07-2008-21:03:55 848307909687 FALSE
75379           1 07-07-2008-19:52:55 848307838407 FALSE
75380           2 07-07-2008-19:54:14 848307838407 FALSE
75378           1 07-07-2008-15:24:01 848285633277 FALSE
75374           1 07-07-2008-13:39:17 848273633667 FALSE
75377           2 07-07-2008-13:39:55 848273633667 FALSE
75376           2 07-07-2008-13:39:55 848273633667  TRUE
75375           2 07-07-2008-13:56:05 848273633667 FALSE
75373           1 07-07-2008-17:11:00 848272661427 FALSE
75371           1 07-07-2008-13:19:00 848270431847 FALSE
75372           2 07-07-2008-13:19:14 848270431847 FALSE
75369           1 07-07-2008-12:49:16 848269676907 FALSE
75370           2 07-07-2008-12:49:16 848269676907  TRUE
75366           1 07-07-2008-13:29:15 848263484847 FALSE
75368           2 07-07-2008-13:29:44 848263484847 FALSE
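Jim's code stops at flagging; a final subset removes the flagged rows. The sketch below reruns it on the sample rows (x follows Jim's naming; x_clean and first_copy are illustrative names). Padding mark with FALSE at the front flags the later row of each adjacent pair; padding at the end flags the earlier row instead, which is the choice Michael made for the 75369/75370 pair but not for 75377/75376.

```r
# Sample data from the question; the unlabelled first column becomes row names
x <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "Source_type timestamp user_id
75381 0 07-07-2008-21:03:55 848307909687
75379 1 07-07-2008-19:52:55 848307838407
75380 2 07-07-2008-19:54:14 848307838407
75378 1 07-07-2008-15:24:01 848285633277
75374 1 07-07-2008-13:39:17 848273633667
75377 2 07-07-2008-13:39:55 848273633667
75376 2 07-07-2008-13:39:55 848273633667
75375 2 07-07-2008-13:56:05 848273633667
75373 1 07-07-2008-17:11:00 848272661427
75371 1 07-07-2008-13:19:00 848270431847
75372 2 07-07-2008-13:19:14 848270431847
75369 1 07-07-2008-12:49:16 848269676907
75370 2 07-07-2008-12:49:16 848269676907
75366 1 07-07-2008-13:29:15 848263484847
75368 2 07-07-2008-13:29:44 848263484847")

# mark[i] is TRUE when row i and row i + 1 share both timestamp and user_id
mark <- (head(x$timestamp, -1) == tail(x$timestamp, -1)) &
        (head(x$user_id, -1) == tail(x$user_id, -1))

x$flag <- c(FALSE, mark)        # first row has nothing above it
x_clean <- x[!x$flag, ]         # drop the later row of each pair
print(rownames(x)[x$flag])      # "75376" "75370"

# Padding at the other end flags the earlier row of each pair instead
first_copy <- c(mark, FALSE)
print(rownames(x)[first_copy])  # "75377" "75369"
```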
On Tue, Sep 23, 2008 at 4:44 PM, Michael Pearmain <mpearmain at google.com> wrote:
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?