Significance and lmer

5 messages · Ben Bolker, Adam D. I. Kramer, David Duffy

#
Dear colleagues,

Please consider this series of commands:

a <- lmer(log(stddiff+.1539) ~ pred + m*v + option + (option|studyID),
data=r1, subset=option>1, REML=FALSE)

b <- update(a, . ~ . - pred)

anova(a,b)

...am I mistaken in thinking that the latter command will produce a test of
whether "pred" is a significant predictor of log(stddiff+.1539)? I am
concerned because of the results:
Estimate   Std. Error    t value
(Intercept) -0.6608993664 0.1591862808 -4.1517357
pred         0.0879255592 0.1715599954  0.5125062
ml           0.0656916428 0.1173308419  0.5598838
vl          -0.0980204413 0.1276648229 -0.7677952
option       0.0003197903 0.0008134259  0.3931400
ml:vl       -0.1890574941 0.1710443092 -1.1053130

...note the t-value of 0.51 for pred...very small! ...but anova(a,b) produces this:

Models:
b: log(stddiff + 0.1539) ~ m + v + option + (option | studyID) +
b:     m:v
a: log(stddiff + 0.1539) ~ pred + m * v + option + (option | studyID)
   Df    AIC    BIC  logLik  Chisq Chi Df Pr(>Chisq)
b  9 3969.2 4019.1 -1975.6
a 10 3955.9 4011.2 -1967.9 15.345      1  8.954e-05 ***
---

...a significant result completely unrelated to the t-value. My
interpretation of this would be that we have no good evidence that the
estimate for 'pred' is nonzero, but including pred in the model improves
prediction.

I think I must be missing something here--I would appreciate anyone's input
on what that "something" is.

Cordially,
--
Adam D. I. Kramer
Ph.D. Candidate, Social Psychology
University of Oregon
adik-rhelp at ilovebacon.org
#
Adam D. I. Kramer <adik at ...> writes:
[snip]
It is possible for Wald tests (as provided by summary()) to
disagree radically with likelihood ratio tests (look up "Hauck-Donner
effect"), but my guess is that's not what's going
on here (it definitely can apply in binomial models; I don't think
it should apply to LMMs, but ?).

  I have seen some wonky stuff happen with update() [sorry, can't
provide any reproducible details], I would definitely try fitting
b by spelling out the full model rather than using update() and
see if that makes a difference.
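
  A minimal sketch of that suggestion, using the formula and data names
from the original post (the object name b2 is just for illustration):

```r
## Fit the reduced model by spelling out the full call, rather than
## letting update() re-evaluate an altered version of a's call:
b2 <- lmer(log(stddiff + .1539) ~ m * v + option + (option | studyID),
           data = r1, subset = option > 1, REML = FALSE)
anova(a, b2)
```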

  Other than that, nothing springs to mind.

  (Where does the log(x+0.1539) transformation come from???)
#
On Sat, 27 Mar 2010, Ben Bolker wrote:

There are no Wald tests produced by the summary()...my understanding from
reading this list is that the t-values are provided because they are t-like
(effect / se), but that it is difficult (and perhaps foolish) to estimate
degrees of freedom for t. So my concern is based on the fact that t is very
small.
Spelling out the full model for b (rather than using update()) produces no
difference in b's estimates or the anova() statistics.
(That said, I originally was fitting [implicitly] with REML=TRUE, which did
make a difference, but not a big one.)
Well, thanks for the reply. Are you, then, of the opinion that the above
interpretation is reasonable?
x is power-law distributed with a bunch of zeroes (but not ordinal, or I'd
use family=poisson), and .1539 is the 25th percentile. This normalizes it
pretty well. Good question, though! And thanks for the response!
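
A sketch of how that shift might be computed, assuming .1539 is the 25th
percentile of stddiff itself (the post doesn't say exactly which quantile
call was used):

```r
## Assumed reconstruction: take the shift from the data, then eyeball
## the transformed distribution for approximate normality.
shift <- quantile(r1$stddiff, 0.25, na.rm = TRUE)  # reportedly 0.1539 here
hist(log(r1$stddiff + shift))
```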

--Adam
#
On Sat, 27 Mar 2010, Adam D. I. Kramer wrote:
The two models both have the same number of observations, one hopes?  How 
many observations per studyID and how many studyIDs?
I would be a bit nervous.  My interpretation would be that the model is 
inappropriate for the data (as the Wald and LR tests should roughly agree 
for a LMM, as Ben pointed out), and would look at diagnostic plots of 
residuals etc.  The bunch of zeroes you mention may still be stuffing 
things up ;)  Is a left-censored model plausible?

Just my 2c, David Duffy.
#
The problem turned out to be, indeed, differing numbers of observations.
This is likely due to me relying too much on update() to work as I
expected: when pred was removed from the formula, the rows that lmer had
dropped for missing pred were no longer dropped, so b was fit to more
observations than a. The help page for update makes it very clear that it
just re-evaluates an altered call, so this is my fault. Ben's comment about
update() being wonky should have given me a hint.

Preselecting cases using complete.cases() for both models brought the t
values and chi-square values much closer together--when t=.51 for the
coefficient, the chisq of the likelihood ratio test for removing the
variable from the model was chisq=.25, leading to a reasonable p=.62.
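
A sketch of that preselection, assuming the column names from the original
post (the vars vector and r1c are illustrative names, not from the thread):

```r
## Keep only rows complete for every variable in the *larger* model,
## so both fits see exactly the same observations:
vars <- c("stddiff", "pred", "m", "v", "option", "studyID")
r1c  <- r1[complete.cases(r1[, vars]), ]

a <- lmer(log(stddiff + .1539) ~ pred + m * v + option + (option | studyID),
          data = r1c, subset = option > 1, REML = FALSE)
b <- update(a, . ~ . - pred)  # now safe: dropping pred can't change the rows
anova(a, b)
```

With identical row sets, the nested-model likelihood ratio test in anova()
and the Wald-style t statistic should roughly agree, as noted above.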

Thanks very much to you and Ben Bolker!

--Adam
On Sun, 28 Mar 2010, David Duffy wrote: