Noobie question, regression across levels

6 messages · RichardLang, AllenL, Ben Bolker

#
You can check out the lmList function in the nlme package, or more crudely:

lmfun <- function(d) { lm(y~x,data=d) }
myLmList <- lapply(split(mydata,splitfactor),lmfun)

even more compactly/confusingly:

myLmList <- lapply(split(mydata,splitfactor),lm,formula=y~x)
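
As a quick illustration of the split/lapply approach, here is a minimal sketch on made-up data. The names `grp`, `x`, `y` and the toy values are assumptions for the example only; substitute your own columns.

```r
## Toy data: three groups with known slopes 1, 2, 3 (no noise, for clarity)
mydata <- data.frame(
  grp = rep(c("a", "b", "c"), each = 5),
  x   = rep(1:5, times = 3)
)
slopes   <- c(a = 1, b = 2, c = 3)
mydata$y <- 10 + unname(slopes[as.character(mydata$grp)]) * mydata$x

## Fit one lm per level of grp
lmfun    <- function(d) lm(y ~ x, data = d)
myLmList <- lapply(split(mydata, mydata$grp), lmfun)

names(myLmList)          # one fitted model per level of grp
coef(myLmList[["b"]])    # intercept and slope for group "b"
```

The result is an ordinary named list, so each fit can be pulled out by level name or by position.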

  good luck
   Ben Bolker
1 day later
#
Many thanks! This helped a lot. Another quick one:
In using the lmList function in the nlme package, is it possible to subset
my data according to the number of observations in each level? (i.e. I
obviously want to include only those levels that have enough observations
for a regression). What is the best way to exclude levels with too few
observations? Can I do it inside the lmList function? I've read the
requisite help files etc. and two hours later am still confused.
Thanks in advance,
Allen
#
Don't know if you can do it directly in lmList, but:

splitdat <- split(mydata,splitfactor)
lengths <- sapply(splitdat,nrow)  ## NOT sapply(splitdat,length)
splitdat <- splitdat[lengths>=minlength]  ## keep levels with at least minlength rows
lmfun <- function(d) { lm(y~x,data=d) }
myLmList <- lapply(splitdat,lmfun) 

OR

lengths <- sapply(split(mydata,splitfactor),nrow)
badlevels <- names(lengths)[lengths<minlength]  ## names() also works if splitfactor is not a factor
subdata <- subset(mydata,!splitfactor %in% badlevels)

  and then proceed with lmList

  of course, I didn't test any of this ...
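
For what it's worth, here is a small self-contained sketch of both approaches on made-up data. The column names, the group sizes, and the threshold `minlength <- 10` are assumptions chosen for the example, and `n_obs` is used instead of `lengths` only to avoid masking the base function of that name.

```r
## Toy data: group "a" has 12 rows, "b" has 3, "c" has 10
mydata <- data.frame(
  grp = rep(c("a", "b", "c"), times = c(12, 3, 10)),
  x   = seq_len(25)
)
mydata$y  <- 2 * mydata$x + 1
minlength <- 10                      # hypothetical minimum group size

## Approach 1: filter the list of per-group data frames
splitdat <- split(mydata, mydata$grp)
n_obs    <- sapply(splitdat, nrow)   # rows per group (length() would count columns)
keep     <- splitdat[n_obs >= minlength]
myLmList <- lapply(keep, function(d) lm(y ~ x, data = d))
names(myLmList)                      # levels that survived the filter

## Approach 2: filter the rows of the original data frame
badlevels <- names(n_obs)[n_obs < minlength]
subdata   <- subset(mydata, !(grp %in% badlevels))
table(subdata$grp)                   # remaining rows per group
```

If the grouping column is a factor, `droplevels(subdata)` removes the now-empty levels before the data is passed on to another function.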

 Ben Bolker
#
NB: No reply needed (Ben was extremely helpful!)

I've just started using R last week and am still scratching my head.

I have a data set and want to run a separate regression across each level of
a factor (treating each one separately). The data right now is arranged such
that the value of the factor along which I want to "split" my data is one
column among many.
Best way to do this?

Thanks!
#
This is what I ended up using:

Data.subset<-data.frame(PlotFinal,YearFinal,BioFinal,ritFinal)   ### Subset of main dataset
Data.length<-sapply(split(Data.subset,PlotFinal),nrow)           ### Number of data points in each plot

Data.sub<-split(Data.subset,PlotFinal)      ### Split Data.subset by PlotFinal
Good.levels<-Data.sub[Data.length>10]       ### Keep only plots with >10 years of observation

lmfun<-function(Good.levels) {lm(ritFinal~BioFinal+YearFinal,data=Good.levels)}   ### Regression function
lmList<-lapply(Good.levels,lmfun)           ### Apply the function to all "good levels"

coef.List<-lapply(lmList,coef)              ### List of the coefficients of EACH regression above

This seems to have worked beautifully.
So, my current problem is extracting the coefficients from "coef.List" (and
I think this is getting to the root of my general confusion). coef.List
seems to be a list of a list of lm objects, right? How come, then, I cannot
call them by their position directly? That is, coef.List[1] gives me the
first lm object, but I cannot seem to call the first coefficient in this
object (the intercept).
My goal is to have a bunch of new lists, each one listing a subset of
coefficients from each regression (i.e. one will be a list of all the
intercepts, another the mean of all the "YearFinal" effects, etc.).

Thanks again in advance for help with all these noobie questions!
#
Potentially slightly confusing to reuse the name Good.levels for the
argument of the lmfun function ... I probably would have written this as

lmfun<-function(dat) {lm(ritFinal~BioFinal+YearFinal,data=dat)}  

but the results would be identical.
lmList (not coef.List) should be a list of lm objects; coef.List is a list
of numeric vectors.
I actually think your confusion is simpler/more fundamental. Single
brackets on a list return another (shorter) list, while double brackets
extract the element itself. So if you want the first element within the
first element of coef.List you actually need coef.List[[1]][1],
not coef.List[1][1] ...
You should also consider sapply instead of lapply above; that will give you
a matrix of coefficients (which may be easier to deal with).
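
To make the `[` versus `[[` distinction and the sapply suggestion concrete, here is a minimal sketch on made-up data. The data frame `dat` and its columns `plot`, `x1`, `x2`, `y` are hypothetical stand-ins for PlotFinal, BioFinal, YearFinal and ritFinal, and `fits` plays the role of the list called lmList in the code above.

```r
## Toy data: two plots with known coefficients (exact fits, no noise)
dat <- data.frame(
  plot = rep(c("p1", "p2"), each = 6),
  x1   = c(1, 2, 3, 4, 5, 6, 2, 4, 6, 8, 10, 12),
  x2   = c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8)
)
dat$y <- ifelse(dat$plot == "p1",
                1 + 2 * dat$x1 + 3 * dat$x2,
                5 - 1 * dat$x1 + 0.5 * dat$x2)

fits      <- lapply(split(dat, dat$plot), function(d) lm(y ~ x1 + x2, data = d))
coef.List <- lapply(fits, coef)

class(coef.List[1])              # "list": single brackets return a sub-list
class(coef.List[[1]])            # "numeric": double brackets return the vector itself
coef.List[[1]][1]                # first coefficient (the intercept) of the first fit
coef.List[[1]]["(Intercept)"]    # the same value, selected by name

## sapply simplifies the result to a matrix: one row per coefficient,
## one column per plot
coef.mat   <- sapply(fits, coef)
intercepts <- coef.mat["(Intercept)", ]   # all intercepts, named by plot
mean.x1    <- mean(coef.mat["x1", ])      # mean of the x1 effects across plots
```

Indexing the matrix by row name gives each "list of all the intercepts" style summary in one step, without looping over the individual fits.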