the first and last observation for each subject

Hi

r-help-bounces at r-project.org wrote on 02.01.2009 10:20:23:
I have the following data

ID x  y  time
1  10 20 0
1  10 30 1
1  10 40 2
2  12 23 0
2  12 25 1
2  12 28 2
2  12 38 3
3  5  10 0
3  5  15 2
.....

x is time invariant, ID is the subject id number, y is changing over time.
I want to find out the difference between the first and last observed y
value for each subject and get a table like

ID x  y
1  10 20
2  12 15
3  5  5
......

Is there any easy way to generate the data set?

sapply(split(test$y, test$ID), function(x) tail(x, 1)-head(x,1))

I am leaving the formatting of the resulting table to you. Hint: aggregate

Best regards
Petr
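
For readers who want to see the sapply/split one-liner turned into the requested table, here is a minimal sketch. It assumes the data frame is called test (the name used in the one-liner), that the rows are already ordered by time within each ID, and it uses read.table(text = ...), which needs a reasonably recent R. The names d and out are made up for illustration.

```r
# The thread's example data; the name `test` follows the one-liner above.
test <- read.table(text = "ID x y time
1 10 20 0
1 10 30 1
1 10 40 2
2 12 23 0
2 12 25 1
2 12 28 2
2 12 38 3
3 5 10 0
3 5 15 2", header = TRUE)

# Last minus first y per subject (rows assumed ordered by time within ID).
d <- sapply(split(test$y, test$ID), function(x) tail(x, 1) - head(x, 1))

# One row per subject with its time-invariant x, then attach the differences
# by matching on the ID names that split() put on `d`.
out <- unique(test[c("ID", "x")])
out$y <- unname(d[as.character(out$ID)])
rownames(out) <- NULL
out
```

Matching by name rather than by position keeps the differences aligned with the right subject even if the IDs are not sorted.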

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hello,

First, order your data by ID and time.

The columns you want in your output dataframe are then

unique(ID),

tapply( x, ID, function( z ) z[ 1 ] )

and

tapply( y, ID, function( z ) z[ length( z ) ] - z[ 1 ] )
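
Put together, the ordering step and the three columns might look like the sketch below. It assumes the data frame is called DF; the rows are deliberately reversed first to show why the ordering step matters. The name out is made up for illustration.

```r
DF <- read.table(text = "ID x y time
1 10 20 0
1 10 30 1
1 10 40 2
2 12 23 0
2 12 25 1
2 12 28 2
2 12 38 3
3 5 10 0
3 5 15 2", header = TRUE)

# Reverse the rows on purpose so the data are no longer in time order.
DF <- DF[rev(seq_len(nrow(DF))), ]

# Step 1: order by ID and time.
DF <- DF[order(DF$ID, DF$time), ]

# Step 2: build the output columns; tapply() returns one value per ID,
# in the same (sorted) order as unique(ID) after the ordering step.
out <- with(DF, data.frame(
  ID = unique(ID),
  x  = as.vector(tapply(x, ID, function(z) z[1])),
  y  = as.vector(tapply(y, ID, function(z) z[length(z)] - z[1]))
))
out
```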

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com
Try this:
Lines <- "ID x y time
1  10 20 0
1  10 30 1
1 10 40 2
2 12 23 0
2 12 25 1
2 12 28 2
2 12 38 3
3 5 10 0
3 5 15 2"
DF <- read.table(textConnection(Lines), header = TRUE)
aggregate(DF[3], DF[1:2], function(x) tail(x, 1) - head(x, 1))
  ID  x  y
1  3  5  5
2  1 10 20
3  2 12 15
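
Note that the rows of the aggregate result above come out ordered by x rather than by ID. If the table should be sorted by subject, one possible follow-up is sketched here; it uses the formula interface of aggregate and read.table(text = ...), both of which assume a reasonably recent R, and the name res is made up for illustration.

```r
DF <- read.table(text = "ID x y time
1 10 20 0
1 10 30 1
1 10 40 2
2 12 23 0
2 12 25 1
2 12 28 2
2 12 38 3
3 5 10 0
3 5 15 2", header = TRUE)

# Formula interface: summarise y within each ID/x combination.
res <- aggregate(y ~ ID + x, data = DF,
                 FUN = function(v) tail(v, 1) - head(v, 1))

# Reorder the rows by subject ID and reset the row names.
res <- res[order(res$ID), ]
rownames(res) <- NULL
res
```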
One approach is to use the plyr package, as documented at
http://had.co.nz/plyr.  The basic idea is that your problem is easy to
solve if you have a subset for a single subject value:

one <- subset(DF, ID == 1)
with(one, y[length(y)] - y[1])

The difficulty is splitting up the original dataset into subjects,
applying the solution to each piece and then joining all the results
back together.  This is what the plyr package does for you:

library(plyr)

# ddply is for splitting up data frames and combining the results
# into a data frame.  .(ID) says to split up the data frame by the subject
# variable
ddply(DF, .(ID), function(one) with(one, y[length(y)] - y[1]))

# if you want a more informative variable name in the result
# return a named vector:
ddply(DF, .(ID), function(one) c(diff = with(one, y[length(y)] - y[1])))

# plyr takes care of labelling the result for you.

You don't say why you want to include x, or what to do if x is not
invariant, but here are a couple of options:

# Split up by ID and x
ddply(DF, .(ID, x), function(one) c(diff = with(one, y[length(y)] - y[1])))

# Return the first x value
ddply(DF, .(ID), function(one) {
  with(one, c(
    x = x[1],
    diff = y[length(y)] - y[1]
  ))
})

# Throw an error if x is not unique

ddply(DF, .(ID), function(one) {
  stopifnot(length(unique(one$x)) == 1)
  with(one, c(
    x = x[1],
    diff = y[length(y)] - y[1]
  ))
})
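
For comparison, the same split / apply / combine steps can be written out by hand in base R. This is only a rough sketch of the idea, not how plyr is implemented; the names pieces, results and out are made up for illustration, and the rows are assumed to be ordered by time within each ID.

```r
DF <- read.table(text = "ID x y time
1 10 20 0
1 10 30 1
1 10 40 2
2 12 23 0
2 12 25 1
2 12 28 2
2 12 38 3
3 5 10 0
3 5 15 2", header = TRUE)

# Split: one data frame per subject.
pieces <- split(DF, DF$ID)

# Apply: solve the single-subject problem on each piece.
results <- lapply(pieces, function(one) {
  stopifnot(length(unique(one$x)) == 1)   # fail loudly if x varies
  data.frame(ID = one$ID[1],
             x = one$x[1],
             diff = one$y[nrow(one)] - one$y[1])
})

# Combine: stack the one-row results back into a single data frame.
out <- do.call(rbind, results)
rownames(out) <- NULL
out
```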

Regards,

Hadley
http://had.co.nz/
Here is a fast approach using the Hmisc package's summarize function.

> library(Hmisc)
> g <- function(w) {
+   time <- w[,'time']; y <- w[,'y']
+   c(y[which.min(time)], y[which.max(time)])}
>
> with(DF, summarize(DF, ID, g, stat.name=c('first','last')))
   ID first last
1  1    20   40
2  2    23   38
3  3    10   15

summarize converts DF to a matrix for speed, as subscripting matrices is 
much faster than subscripting data frames.
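
The useful point in g is that picking y at the minimum and maximum time does not depend on the rows being sorted. A base R sketch of the same idea, without Hmisc and with the difference added as an extra column, could look like this. The name first_last is made up for illustration, and no claim is made about its speed relative to summarize.

```r
DF <- read.table(text = "ID x y time
1 10 20 0
1 10 30 1
1 10 40 2
2 12 23 0
2 12 25 1
2 12 28 2
2 12 38 3
3 5 10 0
3 5 15 2", header = TRUE)

# Reverse the rows on purpose: the result should not depend on row order.
DF <- DF[rev(seq_len(nrow(DF))), ]

# For each subject, take y at the earliest and at the latest time.
first_last <- do.call(rbind, lapply(split(DF, DF$ID), function(w) {
  data.frame(ID    = w$ID[1],
             first = w$y[which.min(w$time)],
             last  = w$y[which.max(w$time)])
}))
first_last$diff <- first_last$last - first_last$first
rownames(first_last) <- NULL
first_last
```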

Frank
On Fri, Jan 2, 2009 at 3:20 AM, gallon li <gallon.li at gmail.com> wrote:
I have the following data

ID x y time
1  10 20 0
1  10 30 1
1 10 40 2
2 12 23 0
2 12 25 1
2 12 28 2
2 12 38 3
3 5 10 0
3 5 15 2
.....

x is time invariant, ID is the subject id number, y is changing over time.

I want to find out the difference between the first and last observed y
value for each subject and get a table like

ID x y
1 10 20
2 12 15
3 5 5
......

Is there any easy way to generate the data set?
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University
I think there's a pretty simple solution here, though probably not the
most efficient:

t(sapply(split(a,a$ID),
    function(q) with(q,c(ID=unique(ID),x=unique(x),y=max(y)-min(y)))))

Using 'unique' instead of min or [[1]] has the advantage that if x is
in fact not time-invariant, this gives an error rather than silently
ignoring inconsistencies.

Trying to package up this idiom into a function leads to:

select <-
  function(df, groupby, selection)
   {
     pf <- parent.frame()
     fields <- substitute(selection)
     t(sapply(split(df,eval(substitute(groupby),df,enclos=pf)),
             function(q) eval(fields,q,enclos=pf)))  }

which I admit is rather ugly (and does no error-checking), but it does work:
select(a,ID,list(min(ID),unique(x),max(y)-min(y)))
    [,1] [,2] [,3]
  1 1    10   20
  2 2    12   15
  3 3    5    5

Perhaps some of the more experienced people on the list could show me
how to write this more cleanly.

           -s

Hello,

Is it truly

y=max(y)-min(y)

that you want in your solution?
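
To make the concern concrete: in the example data y only increases within each subject, so max(y)-min(y) happens to equal last minus first. The two differ as soon as y is not monotone. The subject below is hypothetical and does not appear in the original data.

```r
# Hypothetical subject whose y rises and then falls (not in the thread's data).
y <- c(20, 45, 30)

last_minus_first <- y[length(y)] - y[1]   # 30 - 20 = 10
range_width      <- max(y) - min(y)       # 45 - 20 = 25

c(last_minus_first = last_minus_first, range_width = range_width)
```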

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com