
lmer: No significant coefficients, but significant improvement of model fit?

5 messages · Ben Bolker, Gjalt-Jorn Peters, Jarrod Hadfield +1 more

#
<r-sig-mixed-models at ...> writes:
thanks.  I will answer the short version and see how far I get
with the long version.
This is not really an lmer question, but a more general modeling
question.  There are a few things you could mean here, but I don't
think any of them have to do with the "p-value issue", which is
more one of how to deal with the unknown distribution of the test
statistic under the null hypothesis for not-large data sets
(see http://glmm.wikidot.com/faq for more links on the p-value
issue, among other topics).

  * you could be asking about the difference between the results
of summary() [which uses Wald tests based on local curvature]
and anova() [which does a more precise test based on model comparison];
anova() is not perfect, but it's more accurate than (and hence
sometimes different from) summary()
  * you could be asking about multiple predictors, none of which
is individually significant at p<0.05, but their combined effects
(i.e. comparing a model with all predictors vs. none) are significant 
at p<0.05.  This is not really surprising, because the joint effect
of the predictors can be stronger than any one individually.  (Also,
if you're not working with a balanced, nested LMM, the effects of
the predictors can interact.)
A colon refers to an interaction, a slash refers to nesting (so
~a/b is equivalent to ~a+a:b, or "b nested within a"): there's more
on this at the wikidot FAQ as well.
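The nesting shorthand can be checked directly from the formula machinery;
a minimal base-R sketch (the factor names a and b are placeholders):

```r
## ~a/b expands to the main effect of a plus the a:b interaction,
## i.e. "b nested within a"
attr(terms(~ a/b), "term.labels")
## -> c("a", "a:b")

## identical to writing the expansion out explicitly
attr(terms(~ a + a:b), "term.labels")
## -> c("a", "a:b")
```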
* cannabisUse_bi: a factor with two levels, "0" and "1". "0"
  indicates no cannabis use in the past week; "1" indicates cannabis
  use in the past week. This is the dependent variable (i.e. the
  criterion).
* moment: a factor with two levels, 'before' and 'after'
* id.factor: a factor with 444 levels, the identifier of each
  participant (note that there are quite a lot of missing values;
  only about 276 cases have no missing values)
* school: a factor with 8 levels, each representing the school that
  the participant attends
* cannabisShow: a factor with 2 levels, 'control' and
  'intervention' - this reflects whether or not a participant
  received the 'intervention', aimed at decreasing cannabis
  use. Participants in five schools received the intervention;
  participants in three other schools didn't.
Although others on this list disagree, I don't find "nesting" to be
very useful in the context of fixed effects, because the levels of
fixed effects almost always have identical meanings across different
levels of the random effect (i.e., "before" means the same for me as
for you).

 I would say the simplest sensible model would be

glmer(cannabisUse_bi ~ cannabisShow*moment + (1|school/id.factor), 
    family=binomial, data=dat.long)

which if your individuals are uniquely identified should be the same
as using (1|school) + (1|id.factor) as the random effects.
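That equivalence is easy to check empirically. A hedged sketch, assuming
lme4 is installed, dat.long is the data frame described above, and
id.factor values are not reused across schools (untested here, since the
data are not available):

```r
library(lme4)

## nested shorthand
m_nested <- glmer(cannabisUse_bi ~ cannabisShow*moment +
                    (1|school/id.factor),
                  family = binomial, data = dat.long)

## explicit specification with uniquely coded individuals
m_flat <- glmer(cannabisUse_bi ~ cannabisShow*moment +
                  (1|school) + (1|id.factor),
                family = binomial, data = dat.long)

## if the IDs really are unique across schools, these should agree
logLik(m_nested)
logLik(m_flat)
```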

But I agree that you may very well want to try to take into account
whether the effects of the fixed effects differ among schools: you
might _like_ to see whether they differ among individuals as well, but
it is somewhere between impossible and very difficult to extract this
from binary data per individual (I'm sure you can't identify the
effects of cannabisShow, because each individual only gets one
intervention, and I'm pretty sure that you can't identify the effects
of before/after either, because all you have is binary data -- if you
had continuous data you *might* be able to detect variation in slope
among individuals, if it weren't confounded with residual error).

So I would try

glmer(cannabisUse_bi ~ cannabisShow*moment +
   (cannabisShow*moment|school) + (1|id.factor), family=binomial,
   data=dat.long)

(assuming that id.factor is unique across schools)
if I include that, I get:
Maybe the before/after variation among schools (moment|school) is
  doing a lot?  Also, see my comment above about Wald tests.
Comparing glmer fits with anova() gives likelihood ratio tests, which
are good when the sample size is large.  If you didn't have the school
level I would say not to worry about it, but 8 schools is not a large
number ...
The second leaves out the school effect, as specified above.
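For the likelihood-ratio-test route mentioned above, the usual idiom is
anova() on nested fits. A sketch assuming the two models from this thread
have been fitted and stored as m1 (the simple model) and m2 (the one with
school-level slopes) - these object names are illustrative:

```r
## likelihood ratio test of the extra school-level variance terms;
## the test is asymptotic, so with only 8 schools treat the
## p-value cautiously
anova(m1, m2)

## Wald z-tests on the individual coefficients, for comparison
summary(m2)
```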
#
Dear Ben,

thank you very, very much for your extensive answer!

I have run both models you suggest, and they fit equally well (Chi^2 = 
.99). In addition, the effect of each predictor is the same in both 
models (which makes sense of course).

I will just look at each effect's significance level, as per your 
worries regarding the number of schools (i.e. 8 is not many schools).

Before venturing further into interpretation, though, I will first read 
up on mixed models using the valuable suggestions given earlier :-)

Again, thank you very, very much for your answer, kind regards,

Gjalt-Jorn

*Gjalt-Jorn Peters* | http://behaviorchange.eu

Behavior change research | Health psychology
Intervention development | Applied social psychology
GG <http://greatergood.eu> | OU <http://ou.nl> | UM 
<http://maastrichtuniversity.nl>
On 07-11-2012 16:00, Ben Bolker wrote:
#
Hi,

I've uploaded a new version of MCMCglmm to CRAN. Main additions are I)  
more flexible methods for multi-membership and related models II) bug  
fix for the predict function for certain types of random-effect  
marginalisation.  III) sir models reinstated  IV) proposal  
distribution for MH steps returned.

Cheers,

Jarrod



I) The addition of a linking.function that links different random  
effects together. For example, imagine two random terms, mother and  
grandmother, for which some mothers also appear as grandmothers.  
Denoting the associated random effects for the mothers (m) and  
grandmothers (g), we could:

a) fit the simple model ~mother+grandmother which estimates separate  
variances (VAR(m) and VAR(g)) and sets the covariance to zero  
(COV(m,g)=0)

b) use the linking function "str" to fit the model  
~str(mother+grandmother) which estimates separate variances (VAR(m)  
and VAR(g)) but also estimates the covariance (COV(m,g))

c) use the linking function "mm" to fit a multimembership model  
~mm(mother+grandmother) which forces the variances to be equal  
(VAR(m)= VAR(g)) and forces the correlation to be one  
(COV(m,g)=VAR(m)= VAR(g)). Multi-membership models can still be fit  
using idv(mult.memb(~mother+grandmother))

Terms within mm or str can be linked to a ginverse if the ginverse  
list name corresponds to the first term in the linking.function (i.e.  
ginverse=list(mother=A) in the models above). They can also be  
interacted with variance functions (i.e.  
us(sex):str(mother+grandmother) is possible)
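The three options (a)-(c) can be written as model-call sketches; these
assume a data frame ped.dat with a response y and factors mother and
grandmother (all names here are placeholders, and the calls are untested):

```r
library(MCMCglmm)

## (a) separate variances VAR(m), VAR(g), covariance fixed at zero
m_a <- MCMCglmm(y ~ 1, random = ~ mother + grandmother, data = ped.dat)

## (b) separate variances plus an estimated covariance COV(m,g)
m_b <- MCMCglmm(y ~ 1, random = ~ str(mother + grandmother), data = ped.dat)

## (c) multi-membership: VAR(m) = VAR(g), correlation forced to one
m_c <- MCMCglmm(y ~ 1, random = ~ mm(mother + grandmother), data = ped.dat)
```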

II) The predict function did not obtain the correct contribution to  
the variance from the marginalised random effects when us(function)  
defined the marginalised terms and the function was such that a single  
datum was associated with >1 term. For example, in a random regression  
us(1+x) we have the variance for datum i as  
V[1,1]+2*x[i]*V[1,2]+(x[i]^2)*V[2,2]  where V[1,1] is the variance in  
intercept, V[2,2] is the variance in slopes and V[1,2] the covariance  
between intercept and slope.  The term 2*x[i]*V[1,2] was omitted in  
the older versions.  This may affect confidence intervals and fitted  
values on the data scale for non-gaussian data, and prediction  
intervals more generally.
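The corrected variance contribution can be checked numerically; a small
base-R sketch with made-up numbers for V and x[i]:

```r
## 2x2 covariance matrix of (intercept, slope):
## V[1,1] = intercept variance, V[2,2] = slope variance,
## V[1,2] = their covariance
V <- matrix(c(1.0, 0.3,
              0.3, 0.5), nrow = 2)
x <- 1.5  # covariate value for datum i

## full contribution, including the cross term older versions dropped
v_full <- V[1,1] + 2*x*V[1,2] + x^2*V[2,2]

## equivalently, the quadratic form z' V z with z = (1, x)
z <- c(1, x)
v_quad <- drop(t(z) %*% V %*% z)

v_full                # 3.025
v_quad                # 3.025 (same)
v_full - 2*x*V[1,2]   # 2.125: what the old predict() would have used
```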

III) sir models have gone back to a dense specification. This means  
that big data sets may run out of memory when setting up the  
equations, but at least it will run if this is not the case.

IV) Tune element in output gives the proposal distribution for the  
latent variables that was used (after the adaptive phase).
#
[Posting to list as others might be interested...]

Jarrod--

Very cool to see the continued development of MCMCglmm.

My typical use of predict() functions (across various R regression-based 
commands) involves generating predictions on newdata -- typically to 
help interpret models involving non-linear terms and/or interactions. 
As far as I can tell, the predict function in v2.17 of MCMCglmm does not 
yet incorporate new data.

Any guess on when the newdata argument in predict.MCMCglmm might "come 
online"?

cheers, Dave
#
Hi Dave,

I did intend to make it part of the current version. The difficulty is  
that if the fixed predictors in newdata have fewer levels than those in  
data, then things like the intercept will have a different  
interpretation. If data and newdata could be guaranteed to have the  
same levels for fixed terms (and terms within a variance.function)  
then it would be more straightforward. I guess I could return an error  
if this was not the case, and allow predictions on newdata when these  
conditions were satisfied....
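The level-matching condition described here can also be checked by hand
before prediction; a sketch of such a check (this helper is purely
illustrative and not part of MCMCglmm):

```r
## hypothetical helper: TRUE only when every named factor carries
## exactly the same levels in both data frames
same_levels <- function(olddat, newdat, vars) {
  all(vapply(vars, function(v)
    identical(levels(factor(olddat[[v]])),
              levels(factor(newdat[[v]]))),
    logical(1)))
}

## e.g. same_levels(dat, newdat, c("sex", "treatment"))
```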

Cheers,

Jarrod


Quoting David Atkins <datkins at u.washington.edu> on Thu, 08 Nov 2012  
16:18:36 -0800: