Can R replicate this data manipulation in SAS?
I think this is kind of like asking "will your Land Rover make it up
my driveway?", but I'll assume the question was asked in all
seriousness.
Here is one solution:
## **** Read in test data;
dat <- read.table(text = "id drug start stop
1004 NRTI 07/24/95 01/05/99
1004 NRTI 11/20/95 12/10/95
1004 NRTI 01/10/96 01/05/99
1004 PI 05/09/96 11/16/97
1004 NRTI 06/01/96 02/01/97
1004 NRTI 07/01/96 03/01/97
9999 PI 01/02/03 NA
9999 NNRTI 04/05/06 07/08/09", header = TRUE)
dat$start <- as.Date(dat$start, format = "%m/%d/%y")
dat$stop <- as.Date(dat$stop, format = "%m/%d/%y")
## **** Reshape data into series with 1 date rather than separate starts and
## stops;
library(reshape)  # melt(), cast()
library(plyr)     # ddply(), summarize()
m.dat <- melt(dat, id.vars = c("id", "drug"))
m.dat <- m.dat[order(m.dat$id, m.dat$value), ]
## start = +1, stop = -1 (the SAS "change" variable)
m.dat$variable <- ifelse(m.dat$variable == "start", 1, -1)
## cast() aggregates the column named "value", so the +1/-1 change
## becomes "value" and the old value column becomes "date"
names(m.dat) <- c("id", "drug", "value", "date")
m.dat
## **** Get regimen information plus start and stop dates;
n.dat <- cast(m.dat, id + date ~ drug, fun.aggregate = sum,
              margins = "grand_col")
## running totals per drug class; they must restart for each id,
## otherwise one patient's counts carry over into the next
for (i in names(n.dat)[-c(1:2)]) {
  n.dat[[i]] <- ave(n.dat[[i]], n.dat$id, FUN = cumsum)
}
n.dat <- ddply(n.dat, .(id), transform,
               regimen = seq_along(id))
n.dat
## each regimen ends the day before the next change (SAS: date - 1)
ssd.dat <- ddply(n.dat, .(id), summarize,
                 id = id[-1],
                 regimen = regimen[-length(regimen)],
                 start_date = date[-length(date)],
                 stop_date = date[-1] - 1)
ssd.dat
## **** Merge data to create regimens dataset;
all.dat <- merge(n.dat[-2], ssd.dat)
## keep only periods on at least one drug (SAS: if alldrugs;)
all.dat <- all.dat[all.dat$X.all. > 0, ]
all.dat <- all.dat[order(all.dat$id, all.dat$regimen),
                   c("id", "start_date", "stop_date", "regimen",
                     "NRTI", "NNRTI", "PI", "X.all.")]
all.dat
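For anyone without the reshape and plyr packages, the same steps can also be sketched in base R. This is an illustrative alternative, not the code above: it assumes R >= 2.14 (for read.table(text = ...)), swaps melt/cast/ddply for rbind(), aggregate() and ave(), and adds the alldrugs and HAART columns by transcribing the HAART condition from Paul's SAS data step. The object names (changes, classes, daily, next_date, regimens) are made up for the example.

```r
## Illustrative base-R sketch; object names are hypothetical.
## Assumes R >= 2.14 for read.table(text = ...).
dat <- read.table(text = "id drug start stop
1004 NRTI 07/24/95 01/05/99
1004 NRTI 11/20/95 12/10/95
1004 NRTI 01/10/96 01/05/99
1004 PI 05/09/96 11/16/97
1004 NRTI 06/01/96 02/01/97
1004 NRTI 07/01/96 03/01/97
9999 PI 01/02/03 NA
9999 NNRTI 04/05/06 07/08/09", header = TRUE, stringsAsFactors = FALSE)
dat$start <- as.Date(dat$start, format = "%m/%d/%y")
dat$stop  <- as.Date(dat$stop,  format = "%m/%d/%y")

## One row per start (+1) or stop (-1); drop missing dates like the
## SAS where= option, then sort by id and date
changes <- rbind(
  data.frame(id = dat$id, drug = dat$drug, date = dat$start, change =  1),
  data.frame(id = dat$id, drug = dat$drug, date = dat$stop,  change = -1))
changes <- changes[!is.na(changes$date), ]
changes <- changes[order(changes$id, changes$date), ]

## Spread the +1/-1 into one column per drug class
classes <- c("NRTI", "NNRTI", "PI")
for (cl in classes) changes[[cl]] <- ifelse(changes$drug == cl, changes$change, 0)

## Collapse to one row per id and date (the SAS last.date step)
daily <- aggregate(changes[classes],
                   by = list(id = changes$id, date = changes$date), FUN = sum)
daily <- daily[order(daily$id, daily$date), ]

## Running totals restart for each id (the SAS retain + first.id reset)
for (cl in classes) daily[[cl]] <- ave(daily[[cl]], daily$id, FUN = cumsum)
daily$regimen <- ave(seq_along(daily$id), daily$id, FUN = seq_along)

## Each regimen stops the day before the next change within the same id
next_date <- ave(as.numeric(daily$date), daily$id,
                 FUN = function(d) c(d[-1], NA))
daily$stop_date <- as.Date(next_date - 1, origin = "1970-01-01")

## Drug count and HAART flag, transcribed from the SAS condition
daily$alldrugs <- daily$NRTI + daily$NNRTI + daily$PI
daily$HAART <- with(daily, as.integer(
  (NRTI >= 3 & NNRTI == 0 & PI == 0) |
  (NRTI >= 2 & (NNRTI >= 1 | PI >= 1)) |
  (NRTI == 1 & NNRTI >= 1 & PI >= 1)))

## Keep periods on at least one drug (SAS: if alldrugs;)
regimens <- daily[daily$alldrugs > 0,
                  c("id", "date", "stop_date", "regimen",
                    classes, "alldrugs", "HAART")]
names(regimens)[2] <- "start_date"
rownames(regimens) <- NULL
regimens
```

Printing regimens should reproduce the 13 rows of Paul's SAS regimens listing, including the missing stop date for the last 9999 regimen. The grouped ave() calls do the job of SAS's by-group retain, so running totals never spill from one id into the next.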
Best,
Ista
On Wed, Apr 20, 2011 at 2:59 PM, Ted Harding <ted.harding at wlandres.net> wrote:
[*** PLEASE NOTE: I am sending this message on behalf of
Paul Miller: Paul Miller <pjmiller_57 at yahoo.com>
(to whom this message has also been copied). He has been
trying to send it, but it has never got through. Please
do not reply to me, but either to the list and/or to Paul
at that address ***]
==========================================================
Hello Everyone,

I'm learning R and am trying to get a better sense of what it will and will not do. I'm hearing in some places that R may not be able to accomplish all of the data manipulation tasks that SAS can. In others, I'm hearing that R can do pretty much any data manipulation that SAS can but the way in which it does so is likely to be quite different.

Below is some SAS syntax that codes Highly Active Antiretroviral Therapy (HAART) regimens in HIV patients by retaining the values of variables. Interspersed between the bits of code are printouts of data sets that are created in the process of coding. I'm hoping this will come through clearly and that people will be able to see exactly what is being done. Basically, the code keeps track of how many drugs people are on and what types of drugs they are taking during specific periods of time and decides whether that constitutes HAART or not.

To me, this is a pretty tricky data manipulation in SAS. Is there any way to get the equivalent result in R?

Thanks,

Paul

**** SAS syntax for coding HAART in HIV patients;

**** Read in test data;
data haart;
  input id drug_class $ start_date :mmddyy. stop_date :mmddyy.;
  format start_date stop_date mmddyy8.;
  cards;
1004 NRTI  07/24/95 01/05/99
1004 NRTI  11/20/95 12/10/95
1004 NRTI  01/10/96 01/05/99
1004 PI    05/09/96 11/16/97
1004 NRTI  06/01/96 02/01/97
1004 NRTI  07/01/96 03/01/97
9999 PI    01/02/03 .
9999 NNRTI 04/05/06 07/08/09
;
run;

proc print data=haart;
run;

Obs    id  drug_class  start_date  stop_date
  1  1004  NRTI        07/24/95    01/05/99
  2  1004  NRTI        11/20/95    12/10/95
  3  1004  NRTI        01/10/96    01/05/99
  4  1004  PI          05/09/96    11/16/97
  5  1004  NRTI        06/01/96    02/01/97
  6  1004  NRTI        07/01/96    03/01/97
  7  9999  PI          01/02/03    .
  8  9999  NNRTI       04/05/06    07/08/09

**** Reshape data into series with 1 date rather than separate starts and stops;
data changes (drop=start_date stop_date where=(not missing(date)));
  set haart;
  date = start_date; change =  1; output;
  date = stop_date;  change = -1; output;
  format date mmddyy10.;
run;

proc sort data=changes;
  by id date;
run;

proc print data=changes;
run;

Obs    id  drug_class  date        change
  1  1004  NRTI        07/24/1995       1
  2  1004  NRTI        11/20/1995       1
  3  1004  NRTI        12/10/1995      -1
  4  1004  NRTI        01/10/1996       1
  5  1004  PI          05/09/1996       1
  6  1004  NRTI        06/01/1996       1
  7  1004  NRTI        07/01/1996       1
  8  1004  NRTI        02/01/1997      -1
  9  1004  NRTI        03/01/1997      -1
 10  1004  PI          11/16/1997      -1
 11  1004  NRTI        01/05/1999      -1
 12  1004  NRTI        01/05/1999      -1
 13  9999  PI          01/02/2003       1
 14  9999  NNRTI       04/05/2006       1
 15  9999  NNRTI       07/08/2009      -1

**** Get regimen information plus start and stop dates;
data cumulative(drop=drug_class change stop_date)
     stop_dates(keep=id regimen stop_date);
  set changes;
  by id date;
  if first.id then do;
    regimen = 0; NRTI = 0; NNRTI = 0; PI = 0;
  end;
  if drug_class = 'NNRTI' then NNRTI + change;
  else if drug_class = 'NRTI' then NRTI + change;
  else if drug_class = 'PI' then PI + change;
  if last.date then do;
    stop_date = date - 1;
    if regimen then output stop_dates;
    regimen + 1;
    alldrugs = NNRTI + NRTI + PI;
    HAART = (NRTI >= 3 AND NNRTI=0 AND PI=0) OR
            (NRTI >= 2 AND (NNRTI >= 1 OR PI >= 1)) OR
            (NRTI = 1 AND NNRTI >= 1 AND PI >= 1);
    output cumulative;
  end;
  format stop_date mmddyy10.;
run;

proc print data=cumulative;
run;

Obs    id  date        regimen  NRTI  NNRTI  PI  alldrugs  HAART
  1  1004  07/24/1995        1     1      0   0         1      0
  2  1004  11/20/1995        2     2      0   0         2      0
  3  1004  12/10/1995        3     1      0   0         1      0
  4  1004  01/10/1996        4     2      0   0         2      0
  5  1004  05/09/1996        5     2      0   1         3      1
  6  1004  06/01/1996        6     3      0   1         4      1
  7  1004  07/01/1996        7     4      0   1         5      1
  8  1004  02/01/1997        8     3      0   1         4      1
  9  1004  03/01/1997        9     2      0   1         3      1
 10  1004  11/16/1997       10     2      0   0         2      0
 11  1004  01/05/1999       11     0      0   0         0      0
 12  9999  01/02/2003        1     0      0   1         1      0
 13  9999  04/05/2006        2     0      1   1         2      0
 14  9999  07/08/2009        3     0      0   1         1      0

proc print data=stop_dates;
run;

Obs    id  regimen  stop_date
  1  1004        1  11/19/1995
  2  1004        2  12/09/1995
  3  1004        3  01/09/1996
  4  1004        4  05/08/1996
  5  1004        5  05/31/1996
  6  1004        6  06/30/1996
  7  1004        7  01/31/1997
  8  1004        8  02/28/1997
  9  1004        9  11/15/1997
 10  1004       10  01/04/1999
 11  9999        1  04/04/2006
 12  9999        2  07/07/2009

**** Merge data to create regimens dataset;
data regimens;
  retain id start_date stop_date;
  merge cumulative(rename=(date=start_date)) stop_dates;
  by id regimen;
  if alldrugs;
run;

proc print data=regimens;
run;

Obs    id  start_date  stop_date   regimen  NRTI  NNRTI  PI  alldrugs  HAART
  1  1004  07/24/1995  11/19/1995        1     1      0   0         1      0
  2  1004  11/20/1995  12/09/1995        2     2      0   0         2      0
  3  1004  12/10/1995  01/09/1996        3     1      0   0         1      0
  4  1004  01/10/1996  05/08/1996        4     2      0   0         2      0
  5  1004  05/09/1996  05/31/1996        5     2      0   1         3      1
  6  1004  06/01/1996  06/30/1996        6     3      0   1         4      1
  7  1004  07/01/1996  01/31/1997        7     4      0   1         5      1
  8  1004  02/01/1997  02/28/1997        8     3      0   1         4      1
  9  1004  03/01/1997  11/15/1997        9     2      0   1         3      1
 10  1004  11/16/1997  01/04/1999       10     2      0   0         2      0
 11  9999  01/02/2003  04/04/2006        1     0      0   1         1      0
 12  9999  04/05/2006  07/07/2009        2     0      1   1         2      0
 13  9999  07/08/2009  .                 3     0      0   1         1      0

==========================================================
Paul Miller <pjmiller_57 at yahoo.com>

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 20-Apr-11                                       Time: 19:59:21
------------------------------ XFMail ------------------------------
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org