Testing for non-linearity and heterogeneity
2 messages · Nick Isaac, Reinhold Kliegl
It appears that you want to model nonlinearity with a polynomial function, that is, to test whether the regression of Q on M is better described by a quadratic than by a linear relation. In addition, you want to test whether there are reliable differences between units in the linear, and possibly also the quadratic, trends. Such an analysis can be nicely illustrated with the "sleepstudy" data.
fm1 <- lmer(Reaction ~ poly(Days,2) + (1 | Subject), sleepstudy)
fm2 <- lmer(Reaction ~ poly(Days,2) + (poly(Days,1) | Subject), sleepstudy)
fm3 <- lmer(Reaction ~ poly(Days,2) + (poly(Days,2) | Subject), sleepstudy)
Use poly() to specify the degree of the polynomial. fm1 allows only varying intercepts, fm2 allows varying intercepts and varying linear slopes, and fm3 allows varying intercepts, and varying linear and quadratic trends. You can use anova() to check whether addition of varying linear trends and varying quadratic trends leads to significant improvement in goodness of fit:
anova(fm1,fm2,fm3)
Data: sleepstudy
Models:
fm1: Reaction ~ poly(Days, 2) + (1 | Subject)
fm2: Reaction ~ poly(Days, 2) + (poly(Days, 1) | Subject)
fm3: Reaction ~ poly(Days, 2) + (poly(Days, 2) | Subject)
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
fm1 5 1802.96 1818.92 -896.48
fm2 7 1764.32 1786.67 -875.16 42.642 2 5.5e-10 ***
fm3 10 1757.68 1789.61 -868.84 12.639 3 0.005487 **
Obviously, it does: AIC and BIC decline, logLik grows significantly.
Then, you inspect the CMs (conditional means for LMM; conditional
modes for GLMM and NLMM) with
dotplot(ranef(fm3, postVar=TRUE))
This will show you that Subject 332 is a bit of an outlier for the quadratic CMs. Once you remove this subject, there is no reliable between-subject variance for the quadratic trend.
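If you prefer a numeric check to eyeballing the dotplot, you can sort the conditional means directly. This is only a sketch against fm3 above; the column index assumes the default naming of ranef() output for this model.

```r
## Conditional means per subject for fm3; column 3 holds the
## quadratic term ("poly(Days, 2)2") under the default ordering.
re <- ranef(fm3)$Subject
re[order(abs(re[, 3]), decreasing = TRUE), ]
```

The subject at the top of this ordering is the one most extreme on the quadratic trend, which in the sleepstudy data is Subject 332.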
fm1.2 <- lmer(Reaction ~ poly(Days,2) + (1 | Subject), sleepstudy, subset=Subject != 332)
fm2.2 <- lmer(Reaction ~ poly(Days,2) + (poly(Days,1) | Subject), sleepstudy, subset=Subject != 332)
fm3.2 <- lmer(Reaction ~ poly(Days,2) + (poly(Days,2) | Subject), sleepstudy, subset=Subject != 332)
anova(fm1.2, fm2.2, fm3.2)
      Df    AIC    BIC logLik   Chisq Chi Df Pr(>Chisq)
fm1.2  5 1676.6 1692.3 -833.3
fm2.2  7 1618.4 1640.3 -802.2 62.2002      2  3.115e-14 ***
fm3.2 10 1622.2 1653.6 -801.1  2.1971      3     0.5325

Interestingly, the fixed-effect quadratic trend is not significant for the original data set, but after removal of Subject 332 it is consistently so. Thus, if there were an independent reason justifying the exclusion of Subject 332, Reaction appears to follow a quadratic trend over days, but only the mean (intercept) and the linear trend across days vary reliably between subjects. (This analysis is only meant as an illustration; I did not check carefully whether this conclusion holds up under close scrutiny.)

Please note that with the above model specification the number of random effects grows very quickly. You should not hold lmer responsible if it fails to converge for your data. Douglas Bates provided some alternative specifications in an earlier post to keep the number of random effects limited.

Generally, I think that any question that appears to require a follow-up analysis of CMs can be rephrased in such a way that it is appropriately represented in the model from the outset.

Reinhold Kliegl
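One common way to keep the number of random-effect parameters limited is to drop the correlations between random terms. This is only a sketch of that idea using the sleepstudy models above; I have not re-read the earlier post, so the exact specification Bates suggested may differ.

```r
## Varying intercepts and varying linear trends, but with the
## intercept-slope correlation constrained to zero: two variance
## parameters instead of the three in fm2.
fm2b <- lmer(Reaction ~ poly(Days, 2) +
               (1 | Subject) + (0 + poly(Days, 1) | Subject),
             sleepstudy)
```

Comparing fm2b against the full (correlated) specification fm2 with anova() then tests whether the correlation parameter is worth its cost.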
On Mon, Jan 19, 2009 at 1:36 PM, Nick Isaac <njbisaac at googlemail.com> wrote:
Dear list members,
I would appreciate some advice on how to model nonlinearity in my dataset.
The dataset contains >1000 observations on variables Q and M. The
observations (actually species) are partitioned into around 50 groups. Each
group spans only a small part of the total range of M. The principal
research focus is on the relationship between Q and M, and whether this
relationship displays heterogeneity among groups. This is relatively easy to
explore using:
m1 <- lmer( Q ~ M + (1|g) )
m2 <- lmer( Q ~ M + (M|g) )
So far so good. However, I also want to test some recent theories that Q~M
should vary with M, particularly that the relationship between slope and M
might be negative curvilinear.
Until recently, I thought that it would be appropriate to extract the
conditional modes from the model and seek correlations separately. However,
recent posts by Reinhold Kliegl on this list have made it clear that this
would not be appropriate.
So I've been wondering how to model this in lmer. I was thinking of fitting
2nd order polynomials to model the curvilinearity:
m3 <- lmer( Q ~ M + I((M+20)^2) + (M|g) ) # M is centered on its mean, so I need to add 20 to make all values positive
m4 <- lmer( Q ~ M + I((M+20)^2) + (M + I((M+20)^2)|g) )
m5 <- lmer( Q ~ M + I((M+20)^2) + (M|g) + (I((M+20)^2)|g) )
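As in the reply above, the centering-and-shifting hack in m3 through m5 can be avoided with orthogonal polynomials via poly(). This is only a sketch: the model names m3b/m4b and the data frame `mydata` are hypothetical, with Q, M, and g as in the original models.

```r
library(lme4)

## Orthogonal quadratic in M; no need to shift M to positive values,
## and the linear and quadratic terms are uncorrelated by construction.
m3b <- lmer(Q ~ poly(M, 2) + (M | g), data = mydata)
m4b <- lmer(Q ~ poly(M, 2) + (poly(M, 2) | g), data = mydata)

anova(m3b, m4b)  # does the quadratic trend vary between groups?
```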
I'm not sure how I would interpret differences in the fit of these models,
and I'm sure that other (probably better) solutions exist.
It has also been proposed that variance in the Q~M relationship should
decline with increasing M. Is this a question about heteroscedasticity that
I need to model explicitly (the residuals of m2 are quite well-behaved)?
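If the residuals did show variance declining with M, lmer itself does not fit heteroscedastic residual variances, but nlme::lme accepts a variance function. A hedged sketch, with a hypothetical data frame `d` and Q, M, g as above:

```r
library(nlme)

## Residual variance modelled as a power of M:
## Var(e) = sigma^2 * |M|^(2*delta).
## A negative estimate of delta would indicate variance
## declining with increasing M.
m6 <- lme(Q ~ M, random = ~ M | g, data = d,
          weights = varPower(form = ~ M))
summary(m6)
```

Whether this is needed at all depends on the residual diagnostics; if the residuals of m2 are well-behaved, the simpler constant-variance model may suffice.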
I would be extremely grateful for any advice on this topic.
Best wishes, Nick
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models