data prep question
On Sat, Jan 15, 2011 at 4:26 PM, Matthew Strother <rstrothe at gmail.com> wrote:
I have a data set with several thousand observations across time, grouped by subject (example format below)

ID      TIME    OBS
001     2200    23
001     2400    11
001     3200    10
001     4500    22
003     3900    45
003     5605    32
005     1800    56
005     1900    34
005     2300    23
...

I would like to identify the first time for each subject, and then subtract this value from each subsequent time. However, the number of observations per subject varies widely (from 1 to 20), and the intervals between times varies widely. Is there a package that can help do this, or a loop that can be set up to evaluate ID, then calculate the values? The outcome I would like is presented below.

ID      TIME    OBS
001     0       23
001     200     11
001     1000    10
001     2300    22
003     0       45
003     1705    32
005     0       56
005     100     34
005     500     23
Since the data frame appears to be already sorted by time within ID we can do this:
transform(DF, TIME = ave(TIME, ID, FUN = function(x) x - x[1]))

  ID TIME OBS
1  1    0  23
2  1  200  11
3  1 1000  10
4  1 2300  22
5  3    0  45
6  3 1705  32
7  5    0  56
8  5  100  34
9  5  500  23
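If the rows were not already sorted by time within ID, subtracting each subject's minimum time instead of the first element should give the same result without relying on row order. A minimal sketch, assuming the sample data from the question is rebuilt by hand as DF (with ID kept as character, which is my assumption, to preserve the leading zeros) and that TIME is numeric:

```r
# Sample data copied from the question; the construction of DF is an assumption,
# since the original post does not show how the data were read in.
DF <- data.frame(
  ID   = c("001", "001", "001", "001", "003", "003", "005", "005", "005"),
  TIME = c(2200, 2400, 3200, 4500, 3900, 5605, 1800, 1900, 2300),
  OBS  = c(23, 11, 10, 22, 45, 32, 56, 34, 23),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each ID group and returns a vector in the
# original row order; min(x) makes the result independent of sorting.
res <- transform(DF, TIME = ave(TIME, ID, FUN = function(x) x - min(x)))
print(res)
```

Subjects with a single observation simply get TIME 0, so the varying number of rows per subject needs no special handling.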
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com