lmer formula syntax?
On Tue, 2011-02-01 at 14:14 -0800, Colin Wahl wrote:
Dear r-sig-mixed-models List,

I've been digging through the r-help archive and various books in search of an answer to my query, so far without success. I am not new to R, but have primarily focused on multivariate nonparametric analyses in the past; my experience with lmer has been confined to the last few weeks. The model formulas I have written work, and I have some idea of which are accurate/best based on AIC scores, but I would very much appreciate an informed opinion on my syntax.

I am performing an ecological study of stream health using %EPT (an aggregate relative abundance variable for invertebrates) as my response variable. I am interested in how invertebrate counts vary between different watershed types (forested, cultivated, developed, grassland) and reach types (forested, non-forested). There are a total of 12 streams, 12 watersheds and 24 reaches. The wshed factor is unbalanced.
Hi Colin,

I have very little to add to the discussion thus far (from which I have learned a lot by the way!), except to question the use of family = gaussian for this response? %EPT will be bounded at zero (there will be samples for locations that are so polluted as to not have any of these critters living there), and potentially bounded at 100% (theoretically at least, although I find it hard to see how one would get 100% of this group of inverts in a kick sample). The errors may depart from the assumptions of the LMM.

If you know the total counts then perhaps a binomial GLMM would be appropriate. If not, beta regression via the betareg package might be an alternative - the sandwich package can be used to fix up the covariance matrix to account for the clustering in the data.

Just my tuppence,

G
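For anyone reading this in the archive, here is a rough sketch of what the binomial GLMM suggestion could look like with lme4's glmer(). It runs on simulated data, because the posted data frame only holds the proportion. The columns ept_n (EPT individuals), total_n (all individuals in the sample) and reach (the stream-by-riparia combination) are assumptions made up for illustration, and the random-effects structure is one plausible reading of the design rather than a recommendation.

```r
library(lme4)  # provides glmer()

set.seed(42)

# Simulated stand-in for the design: 12 streams x 2 reach types x 3 samples.
# (Balanced here for simplicity; the real wshed factor is unbalanced.)
d <- expand.grid(rep = 1:3, riparia = c("F", "N"),
                 stream = paste0("S", 1:12))
d$wshed <- factor(rep(c("c", "d", "f", "g"), each = 3)[as.integer(d$stream)])
d$reach <- interaction(d$stream, d$riparia, drop = TRUE)  # 24 reaches

# Hypothetical counts: total_n individuals per sample, ept_n of them EPT.
u_stream <- rnorm(12, sd = 0.5)   # stream-level deviations (logit scale)
u_reach  <- rnorm(24, sd = 0.3)   # reach-level deviations (logit scale)
eta <- -1 + 0.5 * (d$riparia == "F") +
  u_stream[as.integer(d$stream)] + u_reach[as.integer(d$reach)]
d$total_n <- 50 + rpois(nrow(d), 150)
d$ept_n   <- rbinom(nrow(d), size = d$total_n, prob = plogis(eta))

# Binomial GLMM: successes and failures on the left-hand side, logit link.
fit <- glmer(cbind(ept_n, total_n - ept_n) ~ wshed * riparia +
               (1 | stream) + (1 | reach),
             data = d, family = binomial)

print(summary(fit))
print(range(fitted(fit)))  # fitted proportions stay inside (0, 1)
```

Unlike the Gaussian fit, the fitted proportions cannot fall below 0 or above 1, and samples with larger totals carry more weight.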
Data structure:
str(ept)
'data.frame':   72 obs. of  4 variables:
 $ wshed  : Factor w/ 4 levels "c","d","f","g": 1 1 1 1 1 1 1 1 1 1 ...
 $ stream : Factor w/ 12 levels "BA","D1","D2",..: 4 4 4 6 6 6 10 10 10 4 ...
 $ riparia: Factor w/ 2 levels "F","N": 1 1 1 1 1 1 1 1 1 2 ...
 $ ept    : num  0.318 0.156 0.194 0.146 0.158 ...

stream is the only random variable and it is nested (not implicitly) in wshed. My standard statistical model is as follows:

ept_ijkl = μ + wshed_i + stream_(i)j + riparia_k + (wshed*riparia)_ik + (stream*riparia)_(i)jk + ε_(ijk)l

The best option I have come up with thus far is:
model <- lmer(ept ~ wshed + riparia + wshed:riparia + (1 | wshed/stream) + (1 | stream:riparia),
              data = ept, family = "gaussian")

or alternatively:

model <- lmer(ept ~ wshed*riparia + (1 | wshed/stream) + (1 | stream:riparia),
              data = ept, family = "gaussian")

Question: Am I specifying the random terms and nesting structure correctly?

Secondarily: I have read a few times that "wshed/stream" is interchangeable with "wshed:stream", which is a meaningless interaction. Also, I've seen that random effects are specified as (a|b), where a is a covariate and b is a grouping factor. Does having 1 as a covariate simply specify an intercept of 1? What is the purpose of placing a factor in the place of 1 as a covariate? Or is there a nice complete summary tutorial that I've missed?

I'm looking forward to hearing any comments.

Thank you,
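As a small aside on the formula questions, the operators can be inspected in base R without fitting anything. The sketch below uses terms() and model.matrix() from the stats package; the helper name expand_terms and the toy data frame are made up for illustration.

```r
# Base R only: inspect how formula operators expand (no model is fitted).
# expand_terms is a hypothetical helper name used for this illustration.
expand_terms <- function(f) attr(terms(f), "term.labels")

# a/b is shorthand for a + a:b, so wshed/stream is not the same as
# wshed:stream; it contains wshed as well as wshed:stream.
nested <- expand_terms(~ wshed/stream)
print(nested)

# a*b is shorthand for a + b + a:b, so the two fixed-effect
# specifications in the post describe the same model.
long_form  <- expand_terms(~ wshed + riparia + wshed:riparia)
short_form <- expand_terms(~ wshed * riparia)
print(identical(long_form, short_form))

# The 1 in a formula stands for the intercept: its design column is all
# ones. Adding a covariate x adds a second column, i.e. a slope.
toy <- data.frame(x = c(0.5, 1.5, 2.5))
X_intercept <- model.matrix(~ 1, data = toy)
X_slope     <- model.matrix(~ 1 + x, data = toy)
print(X_intercept)
print(X_slope)
```

Read this way, (1 | wshed/stream) is shorthand for (1 | wshed) + (1 | wshed:stream), which means wshed enters the model as a grouping factor in addition to being a fixed effect. In (1 | b) each level of b gets its own intercept deviation, while (x | b) would add a per-level slope for x as well.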
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%