Hello R help,
I have a question similar to one posted here before. The difference is
that instead of the last assessment, I want to choose the last two.
I have a data set with several time assessments for each participant,
and I want to select the last two assessments for each participant. My
dataset looks like this:
ID week outcome
1 2 14
1 4 28
1 6 42
4 2 14
4 6 46
4 9 64
4 9 71
4 12 85
9 2 14
9 4 28
9 6 51
9 9 66
9 12 84
Here is one solution for choosing the last assessment:
do.call("rbind",
        by(df, INDICES=df$ID, FUN=function(DF) DF[which.max(DF$week), ]))
  ID week outcome
1  1    6      42
4  4   12      85
9  9   12      84
Selecting n observation
4 messages · bibek sharma, Peter Ehlers, David Winsemius +1 more
With the plyr package:
library(plyr)
ddply(df, .(ID), function(x) tail(x, 2))
or, slightly simpler:
ddply(df, .(ID), tail, 2)
Peter Ehlers
On Oct 11, 2012, at 12:48 PM, bibek sharma wrote:
Here is one solution for choosing last assessment
do.call("rbind",
by(df, INDICES=df$ID, FUN=function(DF) DF[which.max(DF$week), ]))
Why wouldn't the solution be something along the lines of:
do.call("rbind",
by(df, INDICES=df$ID, FUN=function(DF) tail(DF, 2) ))
David Winsemius, MD Alameda, CA, USA
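One caveat with the tail() approach: it keeps the last two rows as they are stored, so it returns the last two assessments only if each participant's rows are already in week order. Below is a minimal sketch, assuming the posted table is what the thread calls df. The read.table() call and the ordering step are additions for illustration, not part of the original posts.

```r
# Rebuild the posted example data (assumed to be the contents of `df`)
df <- read.table(header = TRUE, text = "
ID week outcome
1 2 14
1 4 28
1 6 42
4 2 14
4 6 46
4 9 64
4 9 71
4 12 85
9 2 14
9 4 28
9 6 51
9 9 66
9 12 84
")

# Sort by participant and week so that tail() sees the latest rows last
df <- df[order(df$ID, df$week), ]

# Keep the last two rows per participant and stack the pieces back together
last2 <- do.call("rbind",
                 by(df, INDICES = df$ID, FUN = function(DF) tail(DF, 2)))
last2
```

Participants with only one row would contribute that single row, since tail() returns whatever is available.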
Another way to approach this sort of problem is to use ave() to
assign a within-group sequence number to each row and then
select the rows with the sequence numbers you want. You can
also use ave() to make a column giving the size of the group that
each item is in. Hence you can select things like "the last 2 items
in each category that had at least 3 items".
E.g., here is a function to generate data on visits of patients to
a clinic, where the visits are listed in time order.
makeData <- function(nVisits, Doctors=paste("Dr.", LETTERS[1:2]),
                     Patients=101:104, seed=1)
{
    if (!is.null(seed)) set.seed(seed)
    data.frame(Doctor=sample(Doctors, replace=TRUE, nVisits),
               Patient=sample(Patients, replace=TRUE, nVisits),
               Date=as.Date("2004-01-01") + sort(sample(2000, replace=TRUE, nVisits)))
}
# Make a 12-row dataset
d <- makeData(12)
# Add columns describing the visits between each doctor/patient pair
d1 <- within(d, {
    N <- ave(integer(length(Date)), Doctor, Patient, FUN=length)
    Seq <- ave(integer(length(Date)), Doctor, Patient, FUN=seq_along)
})
d1
#    Doctor Patient       Date Seq N
# 1   Dr. A     103 2004-01-28   1 3
# 2   Dr. A     102 2005-01-08   1 1
# 3   Dr. B     104 2005-06-19   1 4
# 4   Dr. B     102 2005-11-12   1 2
# 5   Dr. A     103 2006-02-04   2 3
# 6   Dr. B     104 2006-02-12   2 4
# 7   Dr. B     102 2006-08-23   2 2
# 8   Dr. B     104 2006-09-15   3 4
# 9   Dr. B     104 2007-04-15   4 4
# 10  Dr. A     101 2007-08-30   1 2
# 11  Dr. A     103 2008-07-13   3 3
# 12  Dr. A     101 2008-10-06   2 2
# Show the last visit in each doctor/patient group
d[d1$Seq==d1$N, ]
#    Doctor Patient       Date
# 2   Dr. A     102 2005-01-08
# 7   Dr. B     102 2006-08-23
# 9   Dr. B     104 2007-04-15
# 11  Dr. A     103 2008-07-13
# 12  Dr. A     101 2008-10-06
# Show last 2 visits, but only if there were at least 2 visits
d[d1$Seq>d1$N-2 & d1$N>=2, ]
#    Doctor Patient       Date
# 4   Dr. B     102 2005-11-12
# 5   Dr. A     103 2006-02-04
# 7   Dr. B     102 2006-08-23
# 8   Dr. B     104 2006-09-15
# 9   Dr. B     104 2007-04-15
# 10  Dr. A     101 2007-08-30
# 11  Dr. A     103 2008-07-13
# 12  Dr. A     101 2008-10-06
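# Show the amount of time between the last two visits in a group (if there were at least 2 visits).
# Sort by group first: the two selections are subtracted position by position,
# so they must list the groups in the same order.
ds <- d1[order(d1$Doctor, d1$Patient, d1$Seq), ]
ds[ds$Seq==ds$N & ds$N>=2, "Date"] - ds[ds$Seq==ds$N-1 & ds$N>=2, "Date"]
# Time differences in days
# [1] 403 890 284 212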
I find it easier to formulate the queries with this method. For large
datasets, selecting rows according to a criterion can be a lot
faster than splitting a data.frame into many parts, processing
them with tail, and combining them again.
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
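Applied to the data in the original question, the ave() idea might look like the sketch below. The data frame df is rebuilt here from the posted table and is assumed to be sorted by week within each ID. The vector names Seq and N simply mirror the clinic example, and the read.table() call is an addition for illustration.

```r
# Rebuild the posted example data (assumed to be the contents of `df`)
df <- read.table(header = TRUE, text = "
ID week outcome
1 2 14
1 4 28
1 6 42
4 2 14
4 6 46
4 9 64
4 9 71
4 12 85
9 2 14
9 4 28
9 6 51
9 9 66
9 12 84
")

# Position of each row within its participant, and the participant's row count
Seq <- ave(seq_along(df$ID), df$ID, FUN = seq_along)
N   <- ave(seq_along(df$ID), df$ID, FUN = length)

# Last two assessments per participant, selected without splitting the data
last2 <- df[Seq > N - 2, ]
last2
```

Changing the condition to `Seq > N - 2 & N >= 3` would restrict the result to participants with at least three assessments.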