
regression on data subsets in datafile

3 messages · marcel, Dennis Murphy, Gabor Grothendieck

#
I have data of the form

tC <- textConnection("
Subject	Date	parameter1
bob	3/2/99	10
bob	4/2/99	10
bob	5/5/99	10
bob	6/27/99	NA
bob	8/35/01	10
bob	3/2/02	10
steve	1/2/99	4
steve	2/2/00	7
steve	3/2/01	10
steve	4/2/02	NA
steve	5/2/03	16
kevin	6/5/04	24
")
data <- read.table(header=TRUE, tC)
close.connection(tC)
rm(tC)

I am trying to calculate rate of change of parameter1 in units/day for each
person. I think I need something like:
"lapply(split(mydata, mydata$ppt), function(x) lm(parameter1 ~ day,
data=x))"

I am not sure how to handle the dates in order to have the first day for
each person be time = 0, and the remaining dates to be handled as days since
time 0. Also, is there a way to add the resulting slopes to the data set as
a new column? 

Thanks,
Marcel 

#
Hi:

Here's one approach:

# date typo fixed in record 5 - changed 35 to 5
tC <- textConnection("
Subject Date    parameter1
bob     3/2/99  10
bob     4/2/99  10
bob     5/5/99  10
bob     6/27/99 NA
bob     8/5/01 10
bob     3/2/02  10
steve   1/2/99  4
steve   2/2/00  7
steve   3/2/01  10
steve   4/2/02  NA
steve   5/2/03  16
kevin   6/5/04  24
")
dat <- read.table(tC, header=TRUE, stringsAsFactors = FALSE)
close.connection(tC)
rm(tC)
# Convert Date to an object of class Date
dat <- transform(dat, date = as.Date(Date, format = '%m/%d/%y'))

# You could do this with transform() and the by() function, but
# here is another way to use the min date per person as time 0
# using package plyr; mutate is a faster alternative to transform
# and can be used for groupwise operations inside of ddply():
library('plyr')
dat <- ddply(dat, .(Subject), mutate, days = as.numeric(date - min(date)))
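
For readers without plyr, the same "days since each person's first visit" column can probably be built in base R with ave(). This is only a sketch on a small made-up data frame that mirrors the layout above; the rows and the helper variable day_num are assumptions for illustration, not part of the original post.

```r
# Hypothetical toy data in the same layout as the thread's data
dat <- data.frame(
  Subject = c("bob", "bob", "steve", "steve", "kevin"),
  Date    = c("3/2/99", "4/2/99", "1/2/99", "2/2/00", "6/5/04"),
  stringsAsFactors = FALSE
)
dat$date <- as.Date(dat$Date, format = "%m/%d/%y")

# ave() applies FUN within each Subject and returns a vector aligned
# with the original rows. Working on the numeric day counts keeps the
# result plain numeric rather than Date.
day_num <- as.numeric(dat$date)
dat$days <- day_num - ave(day_num, dat$Subject, FUN = min)
```

Each subject's earliest date gets days = 0, and later rows hold the number of days elapsed since then.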

# Kevin has only one record, so return NA for his coefficients.
# The function f returns NA if a subgroup has fewer than three rows
# (rows with a missing parameter1 still count toward that total); you
# can change 3 to 2 if you like. Otherwise, it returns the coefficients
# of the least-squares line as a data frame.

f <- function(d) {
   if (nrow(d) < 3) {
      return(data.frame(intercept = NA, slope = NA))
   }
   p <- coef(lm(parameter1 ~ days, data = d))
   data.frame(intercept = p[1], slope = p[2])
}
# Apply the function to each person's sub-data frame
ddply(dat, .(Subject), f)
  Subject intercept       slope
1     bob 10.000000 0.000000000
2   kevin        NA          NA
3   steve  3.998485 0.007591638
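
The original question also asked how to add the slopes to the data set as a new column. One possible way, sketched here in base R on a small made-up data frame, is to compute a named vector of slopes and look up each row's subject in it. The helper slope_of and the toy values are assumptions for illustration; unlike f above, it counts only non-missing observations and uses a threshold of two.

```r
# Hypothetical toy data with a days column already computed
dat <- data.frame(
  Subject    = c("bob", "bob", "bob", "steve", "steve", "steve", "kevin"),
  days       = c(0, 31, 64, 0, 396, 790, 0),
  parameter1 = c(10, 10, 10, 4, 7, 10, 24),
  stringsAsFactors = FALSE
)

# Slope for one subject; NA when fewer than two usable observations
slope_of <- function(d) {
  d <- d[!is.na(d$parameter1), ]
  if (nrow(d) < 2) return(NA_real_)
  unname(coef(lm(parameter1 ~ days, data = d))[2])
}

# Named numeric vector: one slope per subject
slopes <- sapply(split(dat, dat$Subject), slope_of)

# Look up each row's subject in the named vector to fill the new column
dat$slope <- unname(slopes[dat$Subject])
```

Every row for a given subject then carries that subject's slope in units per day. A merge() of the per-subject results back onto the data on Subject should give the same column.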

Another option is to use the lmList() function in the nlme package.
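
A rough sketch of the lmList() route follows, on made-up data. It assumes rows with a missing response and subjects with a single record have already been dropped, since lmList(), as far as I recall, defaults to na.fail for missing values.

```r
library(nlme)

# Hypothetical toy data: complete rows only, at least two per subject
dat <- data.frame(
  Subject    = factor(c("bob", "bob", "bob", "steve", "steve", "steve")),
  days       = c(0, 31, 64, 0, 396, 790),
  parameter1 = c(10, 10, 10, 4, 7, 10)
)

# One lm() fit per Subject; the grouping factor follows the "|"
fits <- lmList(parameter1 ~ days | Subject, data = dat)

# coef() returns one row per subject, with intercept and slope columns
cf <- coef(fits)
cf
```

The days column of cf holds the per-subject slopes.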

HTH,
Dennis
On Mon, Sep 12, 2011 at 12:42 AM, marcel <marcelcurlin at gmail.com> wrote:
#
On Mon, Sep 12, 2011 at 3:42 AM, marcel <marcelcurlin at gmail.com> wrote:
Try this:

data$Date <- as.Date(data$Date, "%m/%d/%y")
fm <- lm(parameter1 ~ Subject / Date - 1, data)
coef(fm)
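
To unpack the formula above: Subject / Date expands to Subject + Subject:Date, and dropping the common intercept with - 1 gives each subject its own intercept and its own slope in a single fit. The sketch below shows how the slopes might be pulled out of the coefficient vector. It uses made-up data and an explicit numeric day count (the assumed column day_num) instead of the Date column itself.

```r
# Hypothetical toy data in the thread's layout
dat <- data.frame(
  Subject    = c("bob", "bob", "bob", "steve", "steve", "steve"),
  Date       = c("3/2/99", "4/2/99", "5/5/99", "1/2/99", "2/2/00", "3/2/01"),
  parameter1 = c(10, 10, 10, 4, 7, 10),
  stringsAsFactors = FALSE
)

# Numeric day count (days since 1970-01-01), so slopes are in units/day
dat$day_num <- as.numeric(as.Date(dat$Date, format = "%m/%d/%y"))

# Separate intercept and slope per subject in one model
fm <- lm(parameter1 ~ Subject / day_num - 1, data = dat)
cf <- coef(fm)

# The interaction terms ("Subject<name>:day_num") are the per-subject slopes
slopes <- cf[grep(":day_num$", names(cf))]
slopes
```

The slopes match those from separate per-subject regressions. The intercepts, however, are extrapolated to day 0 of the date scale (1970-01-01) rather than to each person's first visit. Regressing on days since first visit instead would make them baseline values.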