From: Berton Gunter <gunter.berton at gene.com>
To: "'Sean Davis'" <sdavis2 at mail.nih.gov>, <sms13+ at pitt.edu>
CC: "'rhelp'" <r-help at stat.math.ethz.ch>
Subject: RE: [R] obtaining first and last record for rows with same
identifier
Date: Tue, 24 May 2005 12:17:58 -0700
I think by() is simpler:
by(yourframe, factor(yourframe$patid), function(x) x[c(1, nrow(x)), ])
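A minimal sketch of the by() call on a made-up data frame. The name labs, its values, and the names pieces and first_last are assumptions for illustration, not the poster's data. Two additions go beyond the one-liner: unique() keeps a patient with a single record from being returned twice, and do.call(rbind, ...) stacks the per-patient pieces back into one data frame.

```r
# Hypothetical example data; column names follow the original question.
labs <- data.frame(
  patid    = c(1, 1, 1, 2, 3, 3),
  labdate  = as.Date(c("2005-01-03", "2005-02-10", "2005-03-15",
                       "2005-01-20", "2005-02-01", "2005-04-30")),
  labvalue = c(5.1, 4.8, 5.6, 7.2, 6.0, 6.3)
)

# Sort by patient and date so "first" and "last" are chronological.
labs <- labs[order(labs$patid, labs$labdate), ]

# First and last row per patient; unique() avoids duplicating the row
# of a patient who has only one record.
pieces <- by(labs, factor(labs$patid),
             function(x) x[unique(c(1, nrow(x))), ])

# Stack the per-patient pieces into a single data frame.
first_last <- do.call(rbind, pieces)
```

Without do.call(rbind, ...), by() returns a list-like object with one element per patient, which prints fine but is less convenient for further processing.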
-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
"The business of the statistician is to catalyze the scientific learning
process." - George E. P. Box
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Sean Davis
Sent: Tuesday, May 24, 2005 11:38 AM
To: sms13+ at pitt.edu
Cc: rhelp
Subject: Re: [R] obtaining first and last record for rows
with same identifier
If you have your data.frame ordered by the patid, you can use the
function rle in combination with cumsum. As a vector example:
> a <- rep(c('a','b','c'),10)
> a
 [1] "a" "b" "c" "a" "b" "c" "a" "b" "c" "a" "b" "c" "a" "b" "c" "a" "b" "c" "a"
[20] "b" "c" "a" "b" "c" "a" "b" "c" "a" "b" "c"
> b <- sort(a)
> b
 [1] "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "b" "b" "b" "b" "b" "b" "b" "b" "b"
[20] "b" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c"
> l <- rle(b)$lengths
> cbind(l,cumsum(l),cumsum(l)-l+1)
      l
[1,] 10 10  1
[2,] 10 20 11
[3,] 10 30 21
# Use the line below to get the length of each block of the data frame,
# then its start index, and then its end index.
> cbind(l,cumsum(l)-l+1,cumsum(l))
      l
[1,] 10  1 10
[2,] 10 11 20
[3,] 10 21 30
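The same idea carried over to a data frame might look like the sketch below. The data frame labs and its values are made up for illustration, and the names start, end, first_rows and last_rows are hypothetical. To get the chronologically first and last records, labdate would need to be added as a second key in order().

```r
# Hypothetical example data; column names follow the original question.
labs <- data.frame(
  patid    = c(3, 1, 1, 2, 3, 1),
  labvalue = c(6.0, 5.1, 4.8, 7.2, 6.3, 5.6)
)

# rle() only finds runs of adjacent equal values, so sort by patid first.
labs <- labs[order(labs$patid), ]

l     <- rle(labs$patid)$lengths  # number of rows per patient
end   <- cumsum(l)                # index of each patient's last row
start <- end - l + 1              # index of each patient's first row

first_rows <- labs[start, ]       # one row per patient: the first
last_rows  <- labs[end, ]         # one row per patient: the last
```

A patient with a single record simply has start equal to end, so the same row appears in both first_rows and last_rows.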
Sean
On May 24, 2005, at 2:27 PM, sms13+ at pitt.edu wrote:
I have a dataframe that contains fields such as patid, labdate, and
labvalue. The same patid may show up in multiple rows because of lab
measurements on multiple days. Is there a simple way to get
the first and last record for each patient, or do I need to write
code that does that?
Thanks,
Steven