long format - find age when another variable is first 'high'

Marc Schwartz · 2009-05-25T13:52:15Z

On May 25, 2009, at 7:45 AM, David Freedman wrote:

> Dear R,
>
> I've got a data frame with children examined multiple times and at
> various ages. I'm trying to find the first age at which another variable
> (LDL-Cholesterol) is >= 130 mg/dL; for some children, this may never
> happen. I can do this with transformBy and ddply, but with 10,000
> different children, these functions take some time on my PCs - is there a
> faster way to do this in R? My code on a small dataset foll

The first thing that I would do is to get rid of records that are not  
relevant to your question:

 > d
  id age ldlc high.ldlc
1  1   5  132         1
2  1  10  120         0
3  1  15  125         0
4  2   4  105         0
5  2   7  142         1
6  3  12  160         1


# Get records with high ldl
d.new <- subset(d, ldlc >= 130)


 > d.new
  id age ldlc high.ldlc
1  1   5  132         1
5  2   7  142         1
6  3  12  160         1


That reduces the total size of the dataset, perhaps substantially. It
also drops entire subjects that are not relevant (e.g., those who never
have LDL >= 130).

Then get the minimum age for each of the remaining subjects:

 > aggregate(d.new$age, list(id = d.new$id), min)
  id  x
1  1  5
2  2  7
3  3 12
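
For anyone who wants to paste and run this, here is a self-contained sketch of the two steps above. The data frame is rebuilt from the printout of d; the names first.high, all.ids and result are introduced here only for illustration. The last step is an assumption that goes beyond the steps above: it merges the result back onto the full list of ids, so that any child who never reaches 130 would show up with NA rather than being dropped.

```r
# Example data, copied from the printout of d above
d <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3),
  age  = c(5, 10, 15, 4, 7, 12),
  ldlc = c(132, 120, 125, 105, 142, 160)
)
d$high.ldlc <- as.integer(d$ldlc >= 130)

# Step 1: keep only the records with high LDL
d.new <- subset(d, ldlc >= 130)

# Step 2: minimum age per remaining subject
first.high <- aggregate(d.new$age, list(id = d.new$id), min)
names(first.high)[2] <- "first.high.age"

# Optional (assumption, not part of the two steps above): keep every id,
# with NA for children who never have LDL >= 130
all.ids <- data.frame(id = unique(d$id))
result <- merge(all.ids, first.high, by = "id", all.x = TRUE)
```

In this small example every child has at least one high value, so result has no NA rows; with real data the merge is what keeps the never-high children in the output.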


Try that to see what sort of time reduction you observe.
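
One rough way to check the timing is sketched below. The data are simulated, not the real dataset: 10,000 children with 5 visits each, and the age range and LDL distribution are made-up values chosen only to give a dataset of the right size. The names big, big.new and first.high.big are likewise just for illustration.

```r
# Simulated data (an assumption for illustration, not the real data):
# 10,000 children with 5 visits each
set.seed(1)
n.id <- 10000
n.visit <- 5
big <- data.frame(
  id   = rep(seq_len(n.id), each = n.visit),
  age  = sample(2:18, n.id * n.visit, replace = TRUE),
  ldlc = round(rnorm(n.id * n.visit, mean = 110, sd = 25))
)

# Time the subset-then-aggregate approach
timing <- system.time({
  big.new <- subset(big, ldlc >= 130)
  first.high.big <- aggregate(big.new$age, list(id = big.new$id), min)
})
print(timing)
```

Wrapping the existing transformBy or ddply code in system.time() the same way would give a like-for-like comparison on the real data.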

HTH,

Marc Schwartz
