Message-ID: <BANLkTimG=G8RexOHsqmaEGumAjauhresOA@mail.gmail.com>
Date: 2011-04-20T21:22:59Z
From: Ista Zahn
Subject: Can R replicate this data manipulation in SAS?
In-Reply-To: <XFMail.110420195928.ted.harding@wlandres.net>

I think this is kind of like asking "will your Land Rover make it up
my driveway?", but I'll assume the question was asked in all
seriousness.

Here is one solution:

## **** Read in test data;
dat <- read.table(textConnection("id    drug      start       stop
1004    NRTI     07/24/95    01/05/99
1004    NRTI     11/20/95 12/10/95
1004    NRTI     01/10/96    01/05/99
1004    PI       05/09/96    11/16/97
1004    NRTI     06/01/96    02/01/97
1004    NRTI     07/01/96    03/01/97
9999    PI       01/02/03    NA
9999    NNRTI    04/05/06    07/08/09"), header=TRUE)
closeAllConnections()

dat$start <- as.Date(dat$start, format = "%m/%d/%y")
dat$stop <- as.Date(dat$stop, format = "%m/%d/%y")
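
A quick check of how the two-digit years are read may be useful here. This is only a sketch on three made-up input strings; as far as I know, R's strptime rules map years 00 to 68 to 20xx and 69 to 99 to 19xx, and a missing stop date simply stays NA.

```r
# Two-digit years with %y: 95 is read as 1995, 03 as 2003; NA stays NA
x <- as.Date(c("07/24/95", "01/02/03", NA), format = "%m/%d/%y")
shown <- format(x, "%m/%d/%Y")
shown
```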

## **** Reshape data into series with 1 date rather than separate starts and
## stops;

library(reshape)
library(plyr)  ## ddply() and summarize() used below come from plyr

m.dat <- melt(dat, id = c("id", "drug"))
m.dat <- m.dat[order(m.dat$id, m.dat$value),]
m.dat$variable <- ifelse(m.dat$variable == "start", 1, -1)
names(m.dat) <-  c("id", "drug", "value", "date")
m.dat
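
The same reshaping can be sketched in base R without reshape. This is an illustration on three toy rows, not the code above; the names changes, starts and stops are hypothetical. Unlike the melt() call, it drops rows with a missing date, as the SAS step does with where=(not missing(date)).

```r
# Toy input: one row per drug spell, with a missing stop date in the last row
dat <- data.frame(
  id    = c(1, 1, 2),
  drug  = c("NRTI", "PI", "PI"),
  start = as.Date(c("1995-07-24", "1996-05-09", "2003-01-02")),
  stop  = as.Date(c("1999-01-05", "1997-11-16", NA))
)

# One row per event: +1 when a drug starts, -1 when it stops
starts  <- data.frame(id = dat$id, drug = dat$drug, date = dat$start, change = 1)
stops   <- data.frame(id = dat$id, drug = dat$drug, date = dat$stop,  change = -1)
changes <- rbind(starts, stops)

# Drop events without a date, then sort by id and date
changes <- changes[!is.na(changes$date), ]
changes <- changes[order(changes$id, changes$date), ]
changes
```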

## **** Get regimen information plus start and stop dates;

n.dat <- cast(m.dat, id + date ~ drug, fun.aggregate=sum, margins="grand_col")
for (i in names(n.dat)[-c(1:2)]) {
  n.dat[i] <- cumsum(n.dat[i])
}
n.dat <- ddply(n.dat, .(id), transform,
               regimen = 1:length(id))
n.dat
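
The running counts can also be sketched in base R. This is a rough illustration, assuming an event table already sorted by id and date with columns id, drug, date and change; the toy rows and the names changes, classes and cumulative are hypothetical. The counts are accumulated within each id, and only the last row of each id/date pair is kept, like last.date in the SAS step.

```r
# Toy event table, already sorted by id and date
changes <- data.frame(
  id     = c(1, 1, 1, 1),
  drug   = c("NRTI", "NRTI", "PI", "NRTI"),
  date   = as.Date(c("1995-07-24", "1995-11-20", "1996-05-09", "1996-05-09")),
  change = c(1, 1, 1, -1)
)

# Running count per drug class, restarted for each id
classes <- c("NRTI", "NNRTI", "PI")
for (cl in classes) {
  step <- ifelse(changes$drug == cl, changes$change, 0)
  changes[[cl]] <- ave(step, changes$id, FUN = cumsum)
}

# Keep the last row of each id/date pair, then number the regimens within id
keep <- !duplicated(changes[c("id", "date")], fromLast = TRUE)
cumulative <- changes[keep, c("id", "date", classes)]
cumulative$regimen  <- ave(seq_along(cumulative$id), cumulative$id, FUN = seq_along)
cumulative$alldrugs <- rowSums(cumulative[classes])
cumulative
```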

ssd.dat <- ddply(n.dat, .(id), summarize,
                 id = id[-1],
                 regimen = regimen[-length(regimen)],
                 start_date = date[-length(date)],
                 stop_date = date[-1])
ssd.dat
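
Note that the SAS code sets each stop date to the day before the next change (stop_date = date - 1). A minimal base-R sketch of that rule, using a few rows copied from the SAS printout; the data frame name regimens and the helper variable next_start are hypothetical:

```r
# A few regimen start dates taken from the quoted SAS output
regimens <- data.frame(
  id         = c(1004, 1004, 1004, 9999, 9999),
  regimen    = c(1, 2, 3, 1, 2),
  start_date = as.Date(c("1995-07-24", "1995-11-20", "1995-12-10",
                         "2003-01-02", "2006-04-05"))
)

# Next start date within each id (NA for the last regimen of an id)
next_start <- ave(as.numeric(regimens$start_date), regimens$id,
                  FUN = function(d) c(d[-1], NA))

# Stop date is the day before the next regimen starts
regimens$stop_date <- as.Date(next_start, origin = "1970-01-01") - 1
regimens
```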

## **** Merge data to create regimens dataset;
all.dat <- merge(n.dat[-2], ssd.dat)
all.dat <- all.dat[order(all.dat$id, all.dat$regimen), c("id",
"start_date", "stop_date", "regimen", "NRTI", "NNRTI", "PI",
"X.all.")]
all.dat
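
The SAS code also derives a HAART flag, which the steps above do not compute. Here is a minimal sketch that transcribes the SAS rule into base R, assuming the per-class counts sit in columns named NRTI, NNRTI and PI as in all.dat. The helper name haart_flag and the toy data frame counts are hypothetical; the four rows are copied from the SAS printout.

```r
# Hypothetical helper: HAART indicator from per-class drug counts,
# following the rule in the quoted SAS data step
haart_flag <- function(NRTI, NNRTI, PI) {
  as.integer((NRTI >= 3 & NNRTI == 0 & PI == 0) |
             (NRTI >= 2 & (NNRTI >= 1 | PI >= 1)) |
             (NRTI == 1 & NNRTI >= 1 & PI >= 1))
}

# Counts for regimens 4, 5 and 10 of id 1004 and regimen 2 of id 9999
counts <- data.frame(NRTI  = c(2, 2, 2, 0),
                     NNRTI = c(0, 0, 0, 1),
                     PI    = c(0, 1, 0, 1))
counts$HAART <- with(counts, haart_flag(NRTI, NNRTI, PI))
counts
```

With the real data the same call would be something like all.dat$HAART <- with(all.dat, haart_flag(NRTI, NNRTI, PI)).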


Best,
Ista



On Wed, Apr 20, 2011 at 2:59 PM, Ted Harding <ted.harding at wlandres.net> wrote:
> [*** PLEASE NOTE: I am sending this message on behalf of
>  Paul Miller:
>  Paul Miller <pjmiller_57 at yahoo.com>
>  (to whom this message has also been copied). He has been
>  trying to send it, but it has never got through. Please
>  do not reply to me, but either to the list and/or to Paul
>  at that address ***]
> ==========================================================
> Hello Everyone,
>
> I'm learning R and am trying to get a better sense of what it will and
> will not do. I'm hearing in some places that R may not be able to
> accomplish all of the data manipulation tasks that SAS can. In others,
> I'm hearing that R can do pretty much any data manipulation that SAS can
> but the way in which it does so is likely to be quite different.
>
> Below is some SAS syntax that codes Highly Active Antiretroviral Therapy
> (HAART) regimens in HIV patients by retaining the values of variables.
> Interspersed between the bits of code are printouts of data sets that are
> created in the process of coding. I'm hoping this will come through
> clearly and that people will be able to see exactly what is being done.
> Basically, the code keeps track of how many drugs people are on and what
> types of drugs they are taking during specific periods of time and
> decides whether that constitutes HAART or not.
>
> To me, this is a pretty tricky data manipulation in SAS. Is there any way
> to get the equivalent result in R?
>
> Thanks,
>
> Paul
>
>
> **** SAS syntax for coding HAART in HIV patients;
> **** Read in test data;
>
> data haart;
> input id drug_class $ start_date :mmddyy. stop_date :mmddyy.;
> format start_date stop_date mmddyy8.;
> cards;
> 1004 NRTI  07/24/95 01/05/99
> 1004 NRTI  11/20/95 12/10/95
> 1004 NRTI  01/10/96 01/05/99
> 1004 PI    05/09/96 11/16/97
> 1004 NRTI  06/01/96 02/01/97
> 1004 NRTI  07/01/96 03/01/97
> 9999 PI    01/02/03 .
> 9999 NNRTI 04/05/06 07/08/09
> ;
> run;
>
> proc print data=haart;
> run;
>
>                drug_      start_       stop_
> Obs     id     class        date        date
>   1   1004     NRTI     07/24/95    01/05/99
>   2   1004     NRTI     11/20/95    12/10/95
>   3   1004     NRTI     01/10/96    01/05/99
>   4   1004     PI       05/09/96    11/16/97
>   5   1004     NRTI     06/01/96    02/01/97
>   6   1004     NRTI     07/01/96    03/01/97
>   7   9999     PI       01/02/03           .
>   8   9999     NNRTI    04/05/06    07/08/09
>
> **** Reshape data into series with 1 date rather than separate starts and
> stops;
>
> data changes (drop=start_date stop_date where=(not missing(date)));
> set haart;
> date = start_date;
> change =  1;
> output;
> date = stop_date;
> change = -1;
> output;
> format date mmddyy10.;
> run;
>
> proc sort data=changes;
> by id date;
> run;
>
> proc print data=changes;
> run;
>
>                drug_
> Obs     id     class          date    change
>   1   1004     NRTI     07/24/1995       1
>   2   1004     NRTI     11/20/1995       1
>   3   1004     NRTI     12/10/1995      -1
>   4   1004     NRTI     01/10/1996       1
>   5   1004     PI       05/09/1996       1
>   6   1004     NRTI     06/01/1996       1
>   7   1004     NRTI     07/01/1996       1
>   8   1004     NRTI     02/01/1997      -1
>   9   1004     NRTI     03/01/1997      -1
>  10   1004     PI       11/16/1997      -1
>  11   1004     NRTI     01/05/1999      -1
>  12   1004     NRTI     01/05/1999      -1
>  13   9999     PI       01/02/2003       1
>  14   9999     NNRTI    04/05/2006       1
>  15   9999     NNRTI    07/08/2009      -1
>
> **** Get regimen information plus start and stop dates;
>
> data cumulative(drop=drug_class change stop_date)
>      stop_dates(keep=id regimen stop_date);
> set changes;
> by id date;
>
> if first.id then do;
>   regimen = 0;
>   NRTI = 0;
>   NNRTI = 0;
>   PI = 0;
> end;
>
> if drug_class = 'NNRTI' then NNRTI + change;
> else if drug_class = 'NRTI' then NRTI + change;
> else if drug_class = 'PI  ' then PI + change;
>
> if last.date then do;
>   stop_date = date - 1;
>   if regimen then output stop_dates;
>   regimen + 1;
>   alldrugs = NNRTI + NRTI + PI;
>   HAART = (NRTI >= 3 AND NNRTI=0 AND PI=0) OR
>     (NRTI >= 2 AND (NNRTI >= 1 OR PI >= 1)) OR
>     (NRTI = 1 AND NNRTI >= 1 AND PI >= 1);
>   output cumulative;
> end;
>
> format stop_date mmddyy10.;
> run;
>
> proc print data=cumulative;
> run;
>
> Obs     id          date    regimen    NRTI    NNRTI    PI    alldrugs    HAART
>   1   1004    07/24/1995       1         1       0       0        1         0
>   2   1004    11/20/1995       2         2       0       0        2         0
>   3   1004    12/10/1995       3         1       0       0        1         0
>   4   1004    01/10/1996       4         2       0       0        2         0
>   5   1004    05/09/1996       5         2       0       1        3         1
>   6   1004    06/01/1996       6         3       0       1        4         1
>   7   1004    07/01/1996       7         4       0       1        5         1
>   8   1004    02/01/1997       8         3       0       1        4         1
>   9   1004    03/01/1997       9         2       0       1        3         1
>  10   1004    11/16/1997      10         2       0       0        2         0
>  11   1004    01/05/1999      11         0       0       0        0         0
>  12   9999    01/02/2003       1         0       0       1        1         0
>  13   9999    04/05/2006       2         0       1       1        2         0
>  14   9999    07/08/2009       3         0       0       1        1         0
>
> proc print data=stop_dates;
> run;
>
> Obs     id    regimen     stop_date
>   1   1004       1       11/19/1995
>   2   1004       2       12/09/1995
>   3   1004       3       01/09/1996
>   4   1004       4       05/08/1996
>   5   1004       5       05/31/1996
>   6   1004       6       06/30/1996
>   7   1004       7       01/31/1997
>   8   1004       8       02/28/1997
>   9   1004       9       11/15/1997
>  10   1004      10       01/04/1999
>  11   9999       1       04/04/2006
>  12   9999       2       07/07/2009
>
> **** Merge data to create regimens dataset;
>
> data regimens;
> retain id start_date stop_date;
> merge cumulative(rename=(date=start_date)) stop_dates;
> by id regimen;
> if alldrugs;
> run;
>
> proc print data=regimens;
> run;
>
> Obs     id    start_date     stop_date    regimen    NRTI    NNRTI    PI    alldrugs    HAART
>   1   1004    07/24/1995    11/19/1995       1         1       0       0        1         0
>   2   1004    11/20/1995    12/09/1995       2         2       0       0        2         0
>   3   1004    12/10/1995    01/09/1996       3         1       0       0        1         0
>   4   1004    01/10/1996    05/08/1996       4         2       0       0        2         0
>   5   1004    05/09/1996    05/31/1996       5         2       0       1        3         1
>   6   1004    06/01/1996    06/30/1996       6         3       0       1        4         1
>   7   1004    07/01/1996    01/31/1997       7         4       0       1        5         1
>   8   1004    02/01/1997    02/28/1997       8         3       0       1        4         1
>   9   1004    03/01/1997    11/15/1997       9         2       0       1        3         1
>  10   1004    11/16/1997    01/04/1999      10         2       0       0        2         0
>  11   9999    01/02/2003    04/04/2006       1         0       0       1        1         0
>  12   9999    04/05/2006    07/07/2009       2         0       1       1        2         0
>  13   9999    07/08/2009             .       3         0       0       1        1         0
>
> ==========================================================
>
> Paul Miller
> Paul Miller <pjmiller_57 at yahoo.com>
>
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <ted.harding at wlandres.net>
> Fax-to-email: +44 (0)870 094 0861
> Date: 20-Apr-11                                       Time: 19:59:21
> ------------------------------ XFMail ------------------------------
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org