[R] Processing dataframe by group - equivalent to SAS by-group processing with first. and retain statements
On 11/27/24 08:30, Sorkin, John wrote:
I am an old, long time SAS programmer. I need to produce R code that processes a dataframe in a manner that is equivalent to that produced by using a by statement in SAS and an if first.day statement and a retain statement:
I want to take data (olddata) that looks like this
ID Day
1 1
1 1
1 2
1 2
1 3
1 3
1 4
1 4
1 5
1 5
2 5
2 5
2 5
2 6
2 6
2 6
3 10
3 10
and make it look like this:
(Within each ID I am copying the first value of Day into a new variable, FirstDay, and propagating the FirstDay value through all rows that have the same ID.)
ID Day FirstDay
1 1 1
1 1 1
1 2 1
1 2 1
1 3 1
1 3 1
1 4 1
1 4 1
1 5 1
1 5 1
2 5 5
2 5 5
2 5 5
2 6 5
2 6 5
2 6 5
3 10 10
3 10 10
SAS code that can do this is:
proc sort data=olddata;
by ID Day;
run;
data newdata;
retain FirstDay;
set olddata;
by ID;
if first.ID then FirstDay=Day;
run;
I have NO idea how to do this in R (so I can't post test code), but below I have R code that creates olddata:
ID <- c(rep(1,10),rep(2,6),rep(3,2))
date <- c(rep(1,2),rep(2,2),rep(3,2),rep(4,2),rep(5,2),
rep(5,3),rep(6,3),rep(10,2))
date
olddata <- data.frame(ID=ID,date=date)
olddata
Any suggestions on how to do this would be appreciated. I have worked on this for more than 12 hours, and despite multiple web searches I have gotten nowhere.
There's a base R function named, wait for it, ... `by`

It returns a list-like object (class "by") holding the results of a function applied to the sub-dataframes indexed by whatever grouping variable you specify in the second argument. My memory told me that the grouping needed to be presented as a list, which was why I chose to use the `[` function rather than `$` or `[[`.

by(olddata, olddata["ID"], FUN = function(x) { rep(x$date[1], times = nrow(x)) })
#-------------------
ID: 1
 [1] 1 1 1 1 1 1 1 1 1 1
------------------------------------------------------------------------------------
ID: 2
[1] 5 5 5 5 5 5
------------------------------------------------------------------------------------
ID: 3
[1] 10 10

So all you need to do from there is unlist it and assign to the new named column:
#------------------
olddata$FirstDay <- unlist(by(olddata, olddata["ID"], FUN = function(x) { rep(x$date[1], times = nrow(x)) }))
olddata
#----------------------------
   ID date FirstDay
1   1    1        1
2   1    1        1
3   1    2        1
4   1    2        1
5   1    3        1
6   1    3        1
7   1    4        1
8   1    4        1
9   1    5        1
10  1    5        1
11  2    5        5
12  2    5        5
13  2    5        5
14  2    6        5
15  2    6        5
16  2    6        5
17  3   10       10
18  3   10       10
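One caveat with the unlist(by(...)) route: the unlisted values come back in group order, so they line up with the rows only when the data frame is already sorted by ID. A possible alternative, offered as a sketch rather than the one right way, is base R's ave(), which returns a vector aligned with the original rows. The sort step stands in for PROC SORT, and the first_ID column is a hypothetical name for a rough analogue of SAS's first.ID flag.

```r
# Rebuild the example data from the original post
ID   <- c(rep(1, 10), rep(2, 6), rep(3, 2))
date <- c(rep(1, 2), rep(2, 2), rep(3, 2), rep(4, 2), rep(5, 2),
          rep(5, 3), rep(6, 3), rep(10, 2))
olddata <- data.frame(ID = ID, date = date)

# Stand-in for: proc sort data=olddata; by ID Day;
olddata <- olddata[order(olddata$ID, olddata$date), ]

# ave() applies FUN within each ID group and returns a vector
# in the same row order as the input, so no unlist() is needed
olddata$FirstDay <- ave(olddata$date, olddata$ID, FUN = function(x) x[1])

# Rough analogue of SAS first.ID: TRUE on the first row of each ID
# (first_ID is a made-up column name for illustration)
olddata$first_ID <- !duplicated(olddata$ID)

olddata
```

Because ave() keeps row alignment, the assignment is still correct if the rows of one ID are not contiguous, although the sort is still needed for "first" to mean "earliest day".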
HTH
David.
Thanks,
John

John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;
Associate Director for Biostatistics and Informatics, Baltimore VA Medical Center Geriatrics Research, Education, and Clinical Center;
PI, Biostatistics and Informatics Core, University of Maryland School of Medicine Claude D. Pepper Older Americans Independence Center;
Senior Statistician, University of Maryland Center for Vascular Research;
Division of Gerontology and Palliative Care,
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
Cell phone 443-418-5382