lmer formula syntax?
On Tue, Feb 1, 2011 at 4:14 PM, Colin Wahl <biowahl at gmail.com> wrote:
Dear r-sig-mixed-models List, I've been digging through the r-help archive and various books in search of an answer to my query. I have been thus far unsuccessful. I am not new to R, but have primarily focused on multivariate non parametric analyses in the past; my experience with lmer has been confined to the last few weeks. The model formulas I have written work and I have some idea of which are accurate/best based on AIC scores, but I would very much appreciate an informed opinion on my syntax. I am performing an ecological study of stream health using %EPT (An aggregate relative abundance variable for invertebrates) as my response variable. I am interested in how invertebrate counts vary between different watershed types (forested, cultivated, developed, grassland) and reach types (forested, non forested). There are a total of 12 streams, 12 watersheds and 24 reaches. The wshed factor is unbalanced. Data structure:
str(ept)
'data.frame': ? ?72 obs. of ?4 variables: ?$ wshed ?: Factor w/ 4 levels "c","d","f","g": 1 1 1 1 1 1 1 1 1 1 ... ?$ stream : Factor w/ 12 levels "BA","D1","D2",..: 4 4 4 6 6 6 10 10 10 4 ... ?$ riparia: Factor w/ 2 levels "F","N": 1 1 1 1 1 1 1 1 1 2 ... ?$ ept ? ?: num ?0.318 0.156 0.194 0.146 0.158 ...
stream is the only random variable and it is nested (not implicitly) in wshed. My standard statistical model is as follows: ept(ijkl) = ? + wshedi ?+ stream(i)j + ripariak + wshed*ripariaik + stream*riparia(i)jk + ?(ijk)l
The best option I have come up with thus far is:
model<-lmer(ept ~ wshed + riparia + wshed:riparia + (1| wshed/stream) + (1|
stream:riparia), data=ept, family="gaussian") or alternatively:
model<-lmer(ept ~ wshed*riparia + (1| wshed/stream) + (1| stream:riparia),
data=ept, family="gaussian")
Thanks to Toby Marthews and George Wang for their answers. I'm sorry to say that there is no good comprehensive tutorial on lme4 yet - still working on it but right now I am more immersed in the code. In your case (1|stream) and (1|wshed:stream) are the same (because your stream factor is not implicitly nester). However (1|wshed/stream) is different from (1|wshed:stream). (1|wshed/stream) expands to (1|wshed) + (1|wshed:stream) or, in your case, (1|wshed) + (1|stream). You don't want that because you already have a fixed effect term for wshed. The best way to read a term like (1|wshed/stream) is "wshed and stream within wshed". I tend to avoid that notation because I prefer to be explicit about the terms. So we would get to a formula like eps ~ wshed*riparia + (1|stream) before considering the stream:riparia interaction. When you have two factors, one with fixed levels like riparia and one with random levels like stream you can write their interaction in two ways: a vector-valued random effects term like (riparia|stream) where there are two, possibly correlated, random effects for each stream; and as two nested scalar random effects, (1|stream) + (1|riparia:stream). For either to make sense, you must have several streams with both "F" and "N" levels of riparia. I generally prefer the second form because it models only the combinations that occur in the datal, although it looks like you have balanced data (3 obs on each stream under each riparia condition) so that wouldn't be an issue for you. The first formulation involves estimating three variance components (two variances and one covariance) whereas the second involves estimating only two variances. The way that you wrote the model in "subscript fest" notation would correspond to the second form. Finally, it is best not to include family="gaussian" in a call to lmer. Technically, lmer only fits linear mixed models in which all distributions are Gaussian. At one point I added code that would detect a family argument and quietly call glmer instead. Then I showed examples of this, which was a bad idea because it is now common practice to call lmer in place of glmer. I believe the code checks for the case of family="gaussian" (with the default link) and goes ahead with the call to lmer but if I get that wrong then you end up calling glmer, which is slower, instead of lmer. I hope this helps.
Question: Am I specifying the random terms and nesting structure correctly? Secondarily: I have read a few times that "wshed/stream" is interchangeable with ?"wshed:stream" which is a meaningless interaction. Also I've seen that random effects are specified as (a|b) where a is a covariate and b is a grouping factor. Does having 1 as a covariate simply specifying an intercept of 1? What is the purpose of placing a factor in the place of 1 as a covariate? OR is there a nice complete summary tutorial that I've missed. I'm looking forward to hearing any comments. Thank you, -- Colin Wahl Department of Biology Western Washington University Bellingham, WA 98225 ph: 360-391-9881 ? ? ? ?[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models