<r-sig-mixed-models at ...> writes:
Hey all, this is my first post - but I assume that, as on other
lists, brevity is appreciated, so I have a short version and a long
version:
Thanks. I will answer the short version and see how far I get
with the long version.
SHORT VERSION, QUESTIONS ONLY:
1) How is it possible that, using lmer, none of the fixed effects has
significant coefficients, yet the model with those parameters fits
significantly better than a model without those parameters? Is this
an example of why lmer didn't use to report p-values for the
coefficients?
This is not really an lmer question, but a more general modeling
question. There are a few things you could mean here, but I don't
think any of them have to do with the "p-value issue", which is
more one of how to deal with the unknown distribution of the test
statistic under the null hypothesis for not-large data sets
(see http://glmm.wikidot.com/faq for more on the p-value issue,
among other topics)
* you could be asking about the difference between the results
of summary() [which uses Wald tests based on local curvature]
and anova() [which does a more precise test based on model comparison];
anova() is not perfect, but it's more accurate than (and hence
sometimes disagrees with) summary()
* you could be asking about multiple predictors, none of which
is individually significant at p<0.05, but their combined effects
(i.e. comparing a model with all predictors vs. none) are significant
at p<0.05. This is not really surprising, because the joint effect
of the predictors can be stronger than any one individually. (Also,
if you're not working with a balanced, nested LMM, the effects of
the predictors can interact.)
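A sketch of the second point, using simulated data (not the poster's dataset) and plain lm() to keep it self-contained: two nearly collinear predictors can each fail a per-coefficient Wald test while the joint model-comparison test is clearly significant.

```r
## Simulated illustration (not the poster's data): near-collinear
## predictors have inflated standard errors, so neither Wald t-test
## may reach p < 0.05, yet dropping both clearly hurts the fit.
set.seed(101)
n  <- 40
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # x2 is nearly collinear with x1
y  <- x1 + x2 + rnorm(n, sd = 2)
fit0 <- lm(y ~ 1)               # null model: intercept only
fit1 <- lm(y ~ x1 + x2)         # full model: both predictors
summary(fit1)$coefficients      # per-coefficient Wald tests
anova(fit0, fit1)               # joint test via model comparison
```

The same logic carries over to anova() comparisons of nested (g)lmer fits, where the comparison is a likelihood-ratio test rather than an F-test.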
2) what do the slash and the colon mean exactly when specifying lmer models?
A colon refers to an interaction, a slash refers to nesting (so
~a/b is equivalent to ~a+a:b, or "b nested within a"): there's more
on this at the wikidot FAQ as well.
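The expansion can be checked directly in base R with terms(), without fitting anything:

```r
## a/b is shorthand for a + a:b ("b nested within a"); terms() shows
## how the formula is expanded.
attr(terms(~ a/b), "term.labels")
## [1] "a"   "a:b"
```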
LONG VERSION WITH BACKGROUND: I am inexperienced with mixed models,
but I have a dataset with several levels that needs to be
analysed - and I 'always' wanted to learn multilevel analysis
anyway, so I decided this was a good occasion. However, there are
no courses at hand in the near future, so I'm trying to get there
with online resources and some books (such as "Discovering
Statistics Using R" by Andy Field, and, in a slightly different
category, the multilevel analysis books by Joop Hox and by
Snijders & Bosker). However, apparently, I lack what it takes to
learn this autodidactically :-/ So I apologise, but I decided to
draw on your wisdom. I'm also kind of hoping that doing multilevel
analyses is a good way of learning how to do them.
I must admit that I don't feel like I have mastered the lmer model
formulation, but I found a post by Harold Doran [1] where he
explains the lmer syntax. My data file is structured the same way as
the one he models in fm3, fm4 and fm5. I have the following variables
(of interest):
* cannabisUse_bi: a factor with two levels, "0" and "1". '0'
indicates no cannabis use in the past week; '1' indicates cannabis
use in the past week. This is the dependent variable (i.e. the
criterion).
* moment: a factor with two levels, 'before' and 'after'
* id.factor: a factor with 444 levels, identifying each
participant (note that there are quite a lot of missing values;
only about 276 cases are complete)
* school: a factor with 8 levels, each representing the school that
the participants attend
* cannabisShow: a factor with 2 levels, 'control' and
'intervention' - this reflects whether a participant received the
'intervention', aimed at decreasing cannabis use, or
not. Participants in five schools received the intervention;
participants in three other schools didn't.
Every person provided two datapoints (one before the intervention
took place, and one after); there are several persons in each school;
and there are several schools in each condition (level) of
cannabisShow.
As far as I understand, this translates to "Moment is nested within
person ('id.factor'), which is nested within school, which is
nested within cannabisShow" (not sure about that last bit).
Although others on this list disagree, I don't find "nesting" to be
very useful in the context of fixed effects, because the levels of
fixed effects almost always have identical meanings across different
levels of the random effect (i.e., "before" means the same for me as
for you).
I would say the simplest sensible model would be
glmer(cannabisUse_bi ~ cannabisShow*moment + (1|school/id.factor),
family=binomial, data=dat.long)
which if your individuals are uniquely identified should be the same
as using (1|school) + (1|id.factor) as the random effects.
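To see this equivalence without fitting anything, lme4's findbars() shows how the slash in a random-effects term is expanded (a sketch, assuming lme4 is installed; the formula below is just a skeleton, not the poster's data):

```r
library(lme4)

## The slash in a random-effects term is expanded before fitting:
## (1|school/id.factor) becomes (1|school) + (1|school:id.factor).
findbars(y ~ x + (1 | school/id.factor))

## If id.factor values are unique across schools, school:id.factor
## labels the same groups as id.factor, so (1|school) + (1|id.factor)
## specifies the same model.
```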
But I agree that you may very well want to try to take into account
whether the effects of the fixed effects differ among schools: you
might _like_ to see whether they differ among individuals as well, but
it is somewhere between impossible and very difficult to extract this
from binary data per individual (I'm sure you can't identify the
effects of cannabisShow, because each individual only gets one
intervention, and I'm pretty sure that you can't identify the effects
of before/after either, because all you have is binary data -- if you
had continuous data you *might* be able to detect variation in slope
among individuals, if it weren't confounded with residual error).
So I would try
glmer(cannabisUse_bi ~ cannabisShow*moment +
(cannabisShow*moment|school) + (1|id.factor), family=binomial,
data=dat.long)
(assuming that id.factor is unique across schools)
Now, this model doesn't include the effect of the intervention, and
if I include that, I get:
rep_measures.new.model <- glmer(usedCannabis_bi ~ 1 + moment *
    cannabisShow + (moment|school/id.factor),
    family = binomial(link = "logit"), data = dat.long)
If I compare these two models using anova(), the second one fits
better (logLik from -182.02 to -166.68, ChiSq = 30.681, Df = 2, p =
2.177e-07). However, when you look at rep_measures.new.model, none
of the fixed effects is significant. I may be completely wrong, but
doesn't this mean that neither the cannabisShow variable nor its
interaction with measurement moment (i.e. 'time') contributes to
explaining the dependent variable (i.e. cannabisUse_bi)?
Maybe the before/after variation among schools (moment|school) is
doing a lot? Also, see my comment above about Wald tests.
(In fact, I'm also a bit confused about the p-values that lmer
provides for the fixed effects. I thought that there were good
reasons not to provide them - and that lmer wasn't supposed to? [3]
(I don't understand the post - I'm sadly not a statistician - but I
thought I got the gist.) Apparently this changed . . . ?)
glmer provides likelihood ratio tests, which are good when the
sample size is large. If you didn't have the school level I would say
not to worry about it, but 8 schools is not a large number ...
And now that I'm mailing anyway: what is the difference between
these two models?
rep_measures.new.model.1 <- glmer(usedCannabis_bi ~ 1 + moment *
    cannabisShow + (moment|school/id.factor),
    family = binomial(link = "logit"), data = dat.long)
rep_measures.new.model.2 <- glmer(usedCannabis_bi ~ 1 + moment *
    cannabisShow + (moment|id.factor:school),
    family = binomial(link = "logit"), data = dat.long)
R gives slightly (but only slightly) different coefficient
estimates; but in the first one, R seems to understand that school
is a level (with 8 values), whereas for the second one this is
apparently not specified . . . What's the difference between the
slash and the colon for indicating levels (the levels apparently
have to be 'the other way around')?
The second leaves out the school effect, as specified above.
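A quick way to see that difference without fitting anything, again using lme4's findbars() on skeleton formulas (assumes lme4 is installed):

```r
library(lme4)

## The slash produces TWO random-effects terms, one per nesting level:
findbars(y ~ (moment | school/id.factor))
## -> (moment | school) and (moment | school:id.factor)

## The colon produces a single term for the combined grouping factor,
## with no separate school-level variance component:
findbars(y ~ (moment | id.factor:school))
## -> (moment | id.factor:school) only
```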
I'm sorry to bother the list with such basic questions. I've been
looking for a tutorial or explanation, but I've only been able to
find little bits of information that I pieced together into my
current (lack of) understanding . . .
Thank you in advance!
Gjalt-Jorn Peters