An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20130611/392c70b9/attachment.pl>
Heteroscedasticity, lme4 and nlme
2 messages · Jennifer Bufford, Ben Bolker
Jennifer Bufford <jbufford at ...> writes:
[snip]
I am working on my dissertation and using mixed models for quite a bit of my analysis. Here are some details about my experiment and data:
I grew seedlings of 21 species in two sites, 10 blocks per site, and 2
plots per block either with or without competition ("Weeds"). I am also
including phylogenetic family (3 families) and invasiveness (3
categories). I have site and family as fixed effects because I understand
there aren't enough categories to reliably estimate them as random
effects. I have several potential continuous covariates as well. My model
includes all possible two-way interactions, fixed and random, and I am
reducing the model using the likelihood ratio test. My goal in reducing
the model is to improve clarity and avoid over-fitting. I have both
continuous (transformed) and binomial response variables.
Here's a full (unreduced) model without covariates:
RGR.simple <- lmer(RGR ~ Inv + Weeds + Site + Fam + Inv:Weeds + Inv:Site +
Weeds:Site + Weeds:Fam + Site:Fam + (1|Blk) + (1|Plot) + (1|Sp) + (0 +
Site|Sp) + (0+Weeds|Sp), fdatRGR, family="gaussian")
You don't need family="gaussian" here, and you might be able to write the fixed-effect part of your formula more compactly as RGR ~ (Inv + Weeds + Site + Fam)^2 + ... I assume that the plots are uniquely labeled.
My data are extremely unbalanced. I have heteroscedasticity in the residuals of my full and reduced models, as seen both by plotting and using leveneTest (package car).
That's unfortunate. I know your data are already growth relative to size, but does log-transforming your data help?
I have several questions, which center largely around the issue of heteroscedasticity: I have both crossed random effects and serious heteroscedasticity. I'd like to implement a non-homogeneous variance structure, but as far as I can tell, that can only be done in lme (from nlme), not in lmer (from lme4). Is that right? I have heard that there are some work-arounds to get crossed random effects in lme.
Yes, see pp 163ff of Pinheiro and Bates 2000 (this is referenced in http://glmm.wikidot.com/faq)
Would that be the best solution in this situation? If I specified an alternate variance structure by species, should residuals by species be homogenous (ie non-significant Levene's Test) or not?
I don't think they would need to be
I've also gotten a bit confused about random interactions. lme4 allows much more complex random effects - would that in some way help account for heteroscedasticity? If so, it hasn't worked for my data, but I want to understand what the random effects are doing. In other words, what is the difference between specifying a variance structure (e.g. varIDent(form = ~1|Species) and including vector-valued random effects (e.g. (Competition|Species))? My model statistics (ie LRT and AIC) tell me that many of the models are better with these complex random effects, but that also restricts my movement from lme4 to nlme.
varIdent(form=~1|Species) says that the residual variance differs among species. (Competition|Species) says that the effect of competition varies among species.
There is a weights function in lmer, but I know it is not the same as the weights function in lme. If I use the weights term in lmer to specify weighting by the scaled reciprocal of species variance, that would give species with smaller variances a greater "say" in the model - is that right? Weighting this way does fix my heteroscedasticity problems, but I'm not sure if it is a valid approach.
I'm not sure if this works: also, there's likely to be some correlation between the different parts of the variance structure (e.g. between estimates of differences in residual variances among species and estimates of the random-effects variances) Everything you've said so far seems reasonable, but I'm still a bit worried that you're working with too complex a model. Can you think of plausible ways to simplify that you would be comfortable with? Another option, if you really want to do complex variance models like this, is to build your own in WinBUGS/AD Model Builder -- then you really know exactly what's going on. If you can, it would be worth simulating some data that are as complex as the models you're trying to fit, and see if you can get reasonable answers ...