specifying/interpreting random effects with near-zero variance in glmer()
On Fri, Jan 13, 2012 at 6:49 PM, Ben Bolker <bbolker at gmail.com> wrote:
Margaret Metz <mrmetz at ...> writes: [snip]
I am using glmer() and a logit link for the survival model,
including fixed factors of 3 topographic models ("topo1", "topo2",
and "topo3" for simplicity) and starting height ("ht"). I have 130+
species ("sp") found at 200 census stations ("station"). Not all
species are found at all stations, and the sample size per species
ranges from 10 - 1200 individuals (and I could restrict these
further to ones with a sample size greater than some threshold).
The topo variables are continuous, right? You probably don't need to restrict the species by sample size -- this is one of the strengths of the mixed modeling approach.
I would like to know whether the topographic variables are significant predictors of mortality while including the random factors of census station to account for non-independence of seedlings at the same location (which have the same topo measurements) and species to allow for variation in species' responses. I expect that both the slope and intercept of species' responses to each variable could be quite different. To allow for different slopes/intercepts among species, I have centered the continuous variables and specified the model as:
glmer(survival ~ topo1 + topo2 + topo3 + ht + (0 + topo1 | sp) + (0 + topo2 | sp) + (0 + topo3 | sp) + (1 | sp) + (1 | station), data=seedlingdata, family=binomial)
This looks reasonable; you might want to check for overdispersion.
Questions: When I do this, there is a random intercept for station, a random intercept for species, and then random slopes among species for the relationship with the topographic variables as follows in the model output. I believe this is allowing for the variation among species that I intend, but would like confirmation of this specification vs. something like (topo1 | sp) or (1 + topo1 | sp) as someone else has suggested to me.
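One common rough check for overdispersion is the sum of squared Pearson residuals divided by the residual degrees of freedom; values well above 1 suggest overdispersion. The sketch below is only an illustration: `overdisp_ratio` is a hypothetical helper name, and the data are simulated binomial counts fitted with an ordinary glm() so that the example runs without lme4. As far as I know, residuals(..., type = "pearson") and df.residual() also work on glmer() fits, so the same helper should apply there.

```r
# Rough overdispersion check: sum of squared Pearson residuals / residual df.
# 'overdisp_ratio' is a hypothetical helper, not part of lme4.
overdisp_ratio <- function(model) {
  rp <- residuals(model, type = "pearson")
  sum(rp^2) / df.residual(model)
}

# Simulated binomial counts (20 trials per row), for illustration only
set.seed(42)
x <- rnorm(200)
y <- rbinom(200, size = 20, prob = plogis(0.5 * x))
fit <- glm(cbind(y, 20 - y) ~ x, family = binomial)

ratio <- overdisp_ratio(fit)
ratio  # should be close to 1 for data simulated without overdispersion
```

With a strictly 0/1 response per seedling this ratio is generally not very informative, so the check is more useful if the data are aggregated (for example, survivors out of total per species and station).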
(topo1 | sp) is equivalent to (1 | topo1 | sp) (as (0 + topo1 | sp) is equivalent to (topo1 - 1 | sp))
To forestall future confusion, I think you meant that (topo1 | sp) is equivalent to (1 + topo1 | sp)
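The equivalences can be checked with base R, on the assumption (as I understand lme4) that the left-hand side of the bar is expanded like an ordinary one-sided formula. The data frame `d` is made up for illustration.

```r
# Hypothetical centered covariate, for illustration only
set.seed(1)
d <- data.frame(topo1 = rnorm(5))

# Implicit vs. explicit intercept: the same two design columns
X_implicit <- model.matrix(~ topo1, d)
X_explicit <- model.matrix(~ 1 + topo1, d)
colnames(X_implicit)   # "(Intercept)" "topo1"
colnames(X_explicit)   # "(Intercept)" "topo1"

# Two spellings of "slope only, no intercept": a single column
X_zero  <- model.matrix(~ 0 + topo1, d)
X_minus <- model.matrix(~ topo1 - 1, d)
colnames(X_zero)       # "topo1"
colnames(X_minus)      # "topo1"
```

So (topo1 | sp) and (1 + topo1 | sp) both give a random intercept and a random slope (with their correlation), while (0 + topo1 | sp) and (topo1 - 1 | sp) both give a random slope only.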
If you have enough data you could try (topo1 + topo2 + topo3 | sp ) which allows for correlation among the effects of the topographic variables -- although you can run out of data pretty quickly in some cases, and it sounds from stuff below as though you're running low on signal anyway. (This model has (n+1)*(n+2)/2 = 10 parameters -- 4 variances (topo[1-3] plus intercept) and 6 covariances -- as opposed to the 4 variances of the model you are using.) (I'm not counting the station variable in these totals.)
Any version of these models that I have run results in significant fixed factors and zero or near-zero variances for the random effects. I interpret this to mean that the topographic variables are important predictors of seedling mortality, but that the relationship does not vary among species groups nor census locations. Is this your interpretation too or need I worry about model specification or the sample size or variance structure of my variables?
This is a reasonable interpretation. However, be aware that this is signal-to-noise / sample-size dependent. There could be (is, by definition, in an ecological system) some among-species and among-station variance that you just can't detect with this data set. (In a classical model with a balanced, nested, etc. design you would probably just find a small (non-significant) variance in this case, rather than a practically-zero one -- on the other hand, there are other classical models where you would actually estimate a *negative* variance.)
A suggestion was made to confirm a lack of spatial autocorrelation in the residuals of this model, but I am not sure that is appropriate given the inclusion of the random effect of census station and the fixed effects of topography, which are shared by seedlings at the same station. Can anyone suggest an appropriate reference to support or refute this suggestion?
I don't have a reference, but I would suggest that checking for spatial autocorrelation might be worthwhile. Spatial autocorrelation would detect the effects of _unmeasured_ covariates that were more similar among nearby stations.
Finally, if the response to topography DID significantly vary among species, where in this model would I see it? In a large variance for the species slopes or intercept?
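The parameter count can be verified with a few lines of base R. `n_vcov_params` is a hypothetical helper written for this illustration; the formula object only shows how the correlated-slopes model might be written (keeping the station intercept from the original model) and is not fitted here.

```r
# Variance-covariance parameters for one grouping factor with a random
# intercept plus n correlated random slopes: a symmetric (n+1) x (n+1) matrix.
n_vcov_params <- function(n_slopes) {
  k <- n_slopes + 1                      # intercept plus slopes
  c(variances   = k,
    covariances = k * (k - 1) / 2,
    total       = k * (k + 1) / 2)       # same as (n+1)*(n+2)/2
}

counts <- n_vcov_params(3)
counts

# How the correlated model might be specified (formula only, not fitted)
f_corr <- survival ~ topo1 + topo2 + topo3 + ht +
  (topo1 + topo2 + topo3 | sp) + (1 | station)
```

For n = 3 this gives 4 variances and 6 covariances, 10 in total, against the 4 variances of the uncorrelated specification.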
Exactly (variance among species in responses to topo1, topo2, topo3).
Or would I need to include species as a fixed factor crossed with the topographic variables?
(topo1 | sp) is effectively crossing topo with species. I would consider looking (at least graphically) for evidence of nonlinearity in the responses to the continuous variables ... you could fit a GAM without *too* much extra effort, and with this size dataset it might produce interesting results.
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models