Message-ID: <AANLkTimqB4wh9sNJhs1-MSsSvPCVB=c0j1-M8A7XxJQr@mail.gmail.com>
Date: 2011-01-16T15:04:07Z
From: Gabor Grothendieck
Subject: data prep question
In-Reply-To: <E67DC0C6-9E61-48B1-9410-36EFB0E5D146@gmail.com>

On Sat, Jan 15, 2011 at 4:26 PM, Matthew Strother <rstrothe at gmail.com> wrote:
> I have a data set with several thousand observations across time, grouped by subject (example format below)
>
> ID     TIME    OBS
> 001    2200    23
> 001    2400    11
> 001    3200    10
> 001    4500    22
> 003    3900    45
> 003    5605    32
> 005    1800    56
> 005    1900    34
> 005    2300    23
> ...
>
> I would like to identify the first time for each subject, and then subtract this value from each subsequent time. However, the number of observations per subject varies widely (from 1 to 20), and the intervals between times vary widely. Is there a package that can help do this, or a loop that can be set up to evaluate ID, then calculate the values? The outcome I would like is presented below.
> ID     TIME    OBS
> 001    0       23
> 001    200     11
> 001    1000    10
> 001    2300    22
> 003    0       45
> 003    1705    32
> 005    0       56
> 005    100     34
> 005    500     23

Since the data frame appears to be already sorted by time within ID we
can do this:

> transform(DF, TIME = ave(TIME, ID, FUN = function(x) x - x[1]))
  ID TIME OBS
1  1    0  23
2  1  200  11
3  1 1000  10
4  1 2300  22
5  3    0  45
6  3 1705  32
7  5    0  56
8  5  100  34
9  5  500  23
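
If the rows were not already sorted by time within ID, x[1] would not necessarily be the earliest time for a subject. A minimal sketch of one way around that, assuming DF is rebuilt as below from the sample rows in the question (with ID read as a number, so 001 becomes 1), is to subtract the per-subject minimum of TIME instead of the first element. The name DF2 is only used here for illustration:

```r
# Rebuild the sample data from the question (assumption: ID read as numeric)
DF <- data.frame(
  ID   = c(1, 1, 1, 1, 3, 3, 5, 5, 5),
  TIME = c(2200, 2400, 3200, 4500, 3900, 5605, 1800, 1900, 2300),
  OBS  = c(23, 11, 10, 22, 45, 32, 56, 34, 23)
)

# Subtract each subject's earliest TIME; min(x) does not depend on row order
DF2 <- transform(DF, TIME = ave(TIME, ID, FUN = function(x) x - min(x)))

DF2$TIME
# [1]    0  200 1000 2300    0 1705    0  100  500
```

For data that is already sorted, as it appears to be here, this gives the same result as using x[1].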

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com