Testing for non-linearity and heterogeneity
2 messages · Nick Isaac, Reinhold Kliegl
It appears that you want to model nonlinearity with a polynomial function, that is, to test whether the regression of Q on M is better described by a quadratic than by a linear relation. In addition, you want to test whether there are reliable differences between units in the linear, and possibly also the quadratic, trends. Such an analysis can be nicely illustrated with the "sleepstudy" data.
fm1 <- lmer(Reaction ~ poly(Days,2) + (1 | Subject), sleepstudy)
fm2 <- lmer(Reaction ~ poly(Days,2) + (poly(Days,1) | Subject), sleepstudy)
fm3 <- lmer(Reaction ~ poly(Days,2) + (poly(Days,2) | Subject), sleepstudy)
Use poly() to specify the degree of the polynomial. fm1 allows only varying intercepts, fm2 allows varying intercepts and varying linear slopes, and fm3 allows varying intercepts, and varying linear and quadratic trends. You can use anova() to check whether addition of varying linear trends and varying quadratic trends leads to significant improvement in goodness of fit:
anova(fm1,fm2,fm3)
Data: sleepstudy
Models:
fm1: Reaction ~ poly(Days, 2) + (1 | Subject)
fm2: Reaction ~ poly(Days, 2) + (poly(Days, 1) | Subject)
fm3: Reaction ~ poly(Days, 2) + (poly(Days, 2) | Subject)
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
fm1 5 1802.96 1818.92 -896.48
fm2 7 1764.32 1786.67 -875.16 42.642 2 5.5e-10 ***
fm3 10 1757.68 1789.61 -868.84 12.639 3 0.005487 **
Obviously, it does: AIC and BIC decline, logLik grows significantly.
Then, you inspect the CMs (conditional means for LMM; conditional
modes for GLMM and NLMM) with
dotplot(ranef(fm3, postVar=TRUE))
This will show you that Subject 332 is a bit of an outlier for the quadratic CMs. Once you remove this subject, there is no reliable between-subject variance for the quadratic trend.
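If you prefer a numeric check to eyeballing the dotplot, you can sort the conditional means directly. This is only a sketch against fm3 above; the column index assumes the default naming of ranef() output for this model.

```r
## Conditional means per subject for fm3; column 3 holds the
## quadratic term ("poly(Days, 2)2") under the default ordering.
re <- ranef(fm3)$Subject
re[order(abs(re[, 3]), decreasing = TRUE), ]
```

The subject at the top of this ordering is the one most extreme on the quadratic trend, which in the sleepstudy data is Subject 332.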
fm1.2 <- lmer(Reaction ~ poly(Days,2) + (1 | Subject), sleepstudy, subset=Subject != 332)
fm2.2 <- lmer(Reaction ~ poly(Days,2) + (poly(Days,1) | Subject), sleepstudy, subset=Subject != 332)
fm3.2 <- lmer(Reaction ~ poly(Days,2) + (poly(Days,2) | Subject), sleepstudy, subset=Subject != 332)
anova(fm1.2, fm2.2, fm3.2)
      Df    AIC    BIC logLik   Chisq Chi Df Pr(>Chisq)
fm1.2  5 1676.6 1692.3 -833.3
fm2.2  7 1618.4 1640.3 -802.2 62.2002      2  3.115e-14 ***
fm3.2 10 1622.2 1653.6 -801.1  2.1971      3     0.5325

Interestingly, the fixed-effect quadratic trend is not significant for the original data set, but after removal of Subject 332 it is consistently so. Thus, if there were an independent reason justifying the exclusion of Subject 332, Reaction appears to follow a quadratic trend over days, but only the mean (intercept) and the linear trend across days vary reliably between subjects. (This analysis is only meant as an illustration; I did not check carefully whether this conclusion holds up under close scrutiny.)

Please note that with the above model specification the number of random effects grows very quickly. You should not hold lmer responsible if it fails to converge for your data. Douglas Bates provided some alternative specifications in an earlier post to keep the number of random effects limited.

Generally, I think that any question that appears to require a follow-up analysis of CMs can be rephrased in such a way that it is appropriately represented in the model from the outset.

Reinhold Kliegl
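One common way to keep the number of random-effect parameters limited is to drop the correlations between random terms. This is only a sketch of that idea using the sleepstudy models above; I have not re-read the earlier post, so the exact specification Bates suggested may differ.

```r
## Varying intercepts and varying linear trends, but with the
## intercept-slope correlation constrained to zero: two variance
## parameters instead of the three in fm2.
fm2b <- lmer(Reaction ~ poly(Days, 2) +
               (1 | Subject) + (0 + poly(Days, 1) | Subject),
             sleepstudy)
```

Comparing fm2b against the full (correlated) specification fm2 with anova() then tests whether the correlation parameter is worth its cost.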
On Mon, Jan 19, 2009 at 1:36 PM, Nick Isaac <njbisaac at googlemail.com> wrote:
Dear list members,
I would appreciate some advice on how to model nonlinearity in my dataset.
The dataset contains >1000 observations on variables Q and M. The
observations (actually species) are partitioned into around 50 groups. Each
group spans only a small part of the total range of M. The principal
research focus is on the relationship between Q and M, and whether this
relationship displays heterogeneity among groups. This is relatively easy to
explore using:
m1 <- lmer( Q ~ M + (1|g) )
m2 <- lmer( Q ~ M + (M|g) )
So far so good. However, I also want to test some recent theories that Q~M
should vary with M, particularly that the relationship between slope and M
might be negative curvilinear.
Until recently, I thought that it would be appropriate to extract the
conditional modes from the model and seek correlations separately. However,
recent posts by Reinhold Kliegl on this list have made it clear that this
would not be appropriate.
So I've been wondering how to model this in lmer. I was thinking of fitting
2nd order polynomials to model the curvilinearity:
m3 <- lmer( Q ~ M + I((M+20)^2) + (M|g) ) # M is centered on its mean, so I need to add 20 to make all values positive
m4 <- lmer( Q ~ M + I((M+20)^2) + (M + I((M+20)^2)|g) )
m5 <- lmer( Q ~ M + I((M+20)^2) + (M|g) + (I((M+20)^2)|g) )
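As in the reply above, the centering-and-shifting hack in m3 through m5 can be avoided with orthogonal polynomials via poly(). This is only a sketch: the model names m3b/m4b and the data frame `mydata` are hypothetical, with Q, M, and g as in the original models.

```r
library(lme4)

## Orthogonal quadratic in M; no need to shift M to positive values,
## and the linear and quadratic terms are uncorrelated by construction.
m3b <- lmer(Q ~ poly(M, 2) + (M | g), data = mydata)
m4b <- lmer(Q ~ poly(M, 2) + (poly(M, 2) | g), data = mydata)

anova(m3b, m4b)  # does the quadratic trend vary between groups?
```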
I'm not sure how I would interpret differences in the fit of these models,
and I'm sure that other (probably better) solutions exist.
It has also been proposed that variance in the Q~M relationship should
decline with increasing M. Is this a question about heteroscedasticity that
I need to model explicitly (the residuals of m2 are quite well-behaved)?
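If the residuals did show variance declining with M, lmer itself does not fit heteroscedastic residual variances, but nlme::lme accepts a variance function. A hedged sketch, with a hypothetical data frame `d` and Q, M, g as above:

```r
library(nlme)

## Residual variance modelled as a power of M:
## Var(e) = sigma^2 * |M|^(2*delta).
## A negative estimate of delta would indicate variance
## declining with increasing M.
m6 <- lme(Q ~ M, random = ~ M | g, data = d,
          weights = varPower(form = ~ M))
summary(m6)
```

Whether this is needed at all depends on the residual diagnostics; if the residuals of m2 are well-behaved, the simpler constant-variance model may suffice.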
I would be extremely grateful for any advice on this topic.
Best wishes, Nick
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models