
how reliable are inferences drawn from binomial models for small datasets fitted with lme4?

2 messages · Luca Borger, Roger Levy

#
Hello,

but for m01 I think the model output:

####
> print(m01 <- lmer(Response ~ Treatment + (Treatment - 1 | F1) + (1 | F2), dat, family=binomial))
Generalized linear mixed model fit by the Laplace approximation
Formula: Response ~ Treatment + (Treatment - 1 | F1) + (1 | F2)
    Data: dat
    AIC   BIC logLik deviance
  87.58 107.1 -37.79    75.58
Random effects:
  Groups Name        Variance   Std.Dev.   Corr
  F2     (Intercept) 1.6467e-12 1.2832e-06
  F1     Treatment1  9.0240e+00 3.0040e+00
         Treatment2  3.4832e+00 1.8663e+00 -1.000
Number of obs: 190, groups: F2, 24; F1, 16
###


indicates that there is not enough information to estimate that model (Corr = -1.000) and that the random-effects part should be simplified, notwithstanding the logLik value. Is that right? Sorry for adding a question; I hope this is relevant.
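
For concreteness, a minimal sketch of the comparison I have in mind,
assuming dat holds the Response, Treatment, F1 and F2 columns used
above (with current lme4 the binomial fits would go through glmer()
rather than lmer(..., family = binomial)):

library(lme4)

## richer model, as printed above
m01 <- glmer(Response ~ Treatment + (Treatment - 1 | F1) + (1 | F2),
             data = dat, family = binomial)
## simplified random-effects structure: intercept only for F1
m1  <- glmer(Response ~ Treatment + (1 | F1) + (1 | F2),
             data = dat, family = binomial)

anova(m1, m01)   # likelihood ratio test on 2 df (extra variance plus correlation)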




Cheers,

Luca





----- Original Message -----
From: "Roger Levy" <rlevy at ling.ucsd.edu>
To: "Jarrod Hadfield" <j.hadfield at ed.ac.uk>
Cc: r-sig-mixed-models at r-project.org
Sent: Sunday, July 5, 2009 20:06:56 GMT -05:00 U.S.A./Canada, Eastern
Subject: Re: [R-sig-ME] how reliable are inferences drawn from binomial models for small datasets fitted with lme4?

Dear Jarrod,

Many thanks for your thoughts on this!  (more below)
On Jul 5, 2009, at 2:44 PM, Jarrod Hadfield wrote:

            
Actually in the original data there are 61 successes and no failures.   
In the imagined version of the data with 60 successes and one failure,  
the lme4 and Bayesian results are more in agreement.
Ah, but the trouble is that there *does* seem to be a reasonable  
amount of evidence for treatment-specific random effects of at least  
one of F1 and F2.  For example:

 > print (m1 <- lmer(Response ~ Treatment + (1 | F1) + (1  | F2), dat,  
family=binomial))
Generalized linear mixed model fit by the Laplace approximation
Formula: Response ~ Treatment + (1 | F1) + (1 | F2)
    Data: dat
    AIC   BIC logLik deviance
  90.15 103.1 -41.08    82.15
Random effects:
  Groups Name        Variance Std.Dev.
  F2     (Intercept) 0.0000   0.0000
  F1     (Intercept) 3.4413   1.8551
Number of obs: 190, groups: F2, 24; F1, 16

Fixed effects:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)   2.9542     0.6277   4.706 2.52e-06 ***
Treatment2    2.6760     1.2083   2.215   0.0268 *
 > print (m01 <- lmer(Response ~ Treatment + (Treatment - 1 | F1) +  
(1  | F2), dat, family=binomial))
Generalized linear mixed model fit by the Laplace approximation
Formula: Response ~ Treatment + (Treatment - 1 | F1) + (1 | F2)
    Data: dat
    AIC   BIC logLik deviance
  87.58 107.1 -37.79    75.58
Random effects:
  Groups Name        Variance   Std.Dev.   Corr
  F2     (Intercept) 1.6467e-12 1.2832e-06
  F1     Treatment1  9.0240e+00 3.0040e+00
         Treatment2  3.4832e+00 1.8663e+00 -1.000
Number of obs: 190, groups: F2, 24; F1, 16

Fixed effects:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)   3.8284     0.9857   3.884 0.000103 ***
Treatment2    1.3655     2.0034   0.682 0.495487

Going by the reported log-likelihoods and applying the reasoning of
Baayen, Bates, and Davidson, a likelihood ratio test should be
conservative, and so there is quite a bit of evidence for m01 over m00
above.
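
(Spelling out that arithmetic with the log-likelihoods reported above:

ll_simple <- -41.08   # the (1 | F1) + (1 | F2) fit above
ll_rich   <- -37.79   # the (Treatment - 1 | F1) + (1 | F2) fit above
lrt <- 2 * (ll_rich - ll_simple)           # about 6.58
pchisq(lrt, df = 2, lower.tail = FALSE)    # about 0.037

with 2 df for the extra variance component and its correlation, and the
nominal p-value conservative because the simpler model sits on the
boundary of the parameter space.)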

This leads back to the original issue, though.  One intuition that I  
have is that for the two models

   Response ~ Treatment + (Treatment - 1 | F1) + (Treatment - 1 | F2)
   Response ~ 1 + (Treatment - 1 | F1) + (Treatment - 1 | F2)

the log-likelihood should really have to be a fair bit better in the  
former than in the latter model (more than a difference of 0.42!),  
because whereas both models have to explain the observed proportions,  
the latter model also has to be lucky enough for the random effects to  
line up such that the observed proportions have a non-trivial  
likelihood.  That is, with the log-likelihood of interest being

   f(y | \beta, \theta) = \int P(y | \beta, b) P(b | \sigma) db

the former model should have much more probability from P(b | \sigma)  
in the region of b where P(y | \beta, b) is large enough to matter.
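
As a toy illustration of that point, a sketch with made-up numbers for
a single group of Bernoulli trials sharing one random effect b:

set.seed(1)
marg_lik <- function(y, eta_fixed, sigma, nsim = 1e5) {
  b <- rnorm(nsim, 0, sigma)                      # draws from P(b | sigma)
  p <- plogis(eta_fixed + b)                      # inverse-logit linear predictor
  mean(p^sum(y) * (1 - p)^(length(y) - sum(y)))   # Monte Carlo estimate of the integral
}
y <- rep(1, 8)                           # say, eight successes and no failures in one group
marg_lik(y, eta_fixed = 3, sigma = 1)    # fixed effect already near the data
marg_lik(y, eta_fixed = 0, sigma = 1)    # the random effect has to "line up" instead

The first call comes out much larger, which is the sense in which the
intercept-only model has to be lucky with its random effects.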

Best

Roger
--

Roger Levy                      Email: rlevy at ling.ucsd.edu
Assistant Professor             Phone: 858-534-7219
Department of Linguistics       Fax:   858-534-4789
UC San Diego                    Web:   http://ling.ucsd.edu/~rlevy

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
#
On Jul 5, 2009, at 8:58 PM, Luca Borger wrote:

            
Dear Luca,

Many thanks for your thoughts on this -- I did notice the correlation  
for F1 but in my thinking, this was sensible behavior by the model and  
didn't necessarily indicate a need to simplify the F1 random effects  
component.  Let me explain my reasoning.  Going back to the marginal  
mean responses per F1/Treatment combination:

 > with(dat,tapply(Response, list(F1, Treatment), mean))
            1  2
1  1.0000000  1
2  0.5714286  1
3  0.8888889  1
4  1.0000000 NA
5  0.0000000  1
6  1.0000000 NA
7  0.6666667  1
8  1.0000000  1
9  1.0000000  1
10 1.0000000  1
11 0.8181818  1
12 0.6000000  1
13 1.0000000 NA
14 1.0000000  1
15 1.0000000  1
16 1.0000000  1

it seems to me that the only way a random effect can contribute to the
likelihood is by bringing down the linear predictor in Treatment 1 for
levels 2, 3, 7, 11, and 12 of F1.  So the random effects for these
levels in Treatment 1 must be negative.  Now, given this constraint,
what choice of random effects for these levels in Treatment 2 maximizes
the likelihood?  Since there are never any Treatment 2 failures, these
random effects should be as high as possible.  This leads to an
inferred correlation of -1 for the random effects of F1.
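
One can also see this in the fitted object itself; a quick sketch,
assuming m01 is the fit printed above and that the conditional modes
carry the Treatment1/Treatment2 names shown in that output:

re_F1 <- ranef(m01)$F1                     # conditional modes for each level of F1
cor(re_F1$Treatment1, re_F1$Treatment2)    # essentially -1
plot(re_F1$Treatment1, re_F1$Treatment2)   # the modes fall on a downward-sloping line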

The surprise to me isn't really that the random-effect correlation is  
so extreme, but rather that the model log-likelihoods seem to indicate  
that there's sufficient evidence to go with the model with more  
complex random-effects structure.  This leads me to ask how reliable
log-likelihood differences for mixed-effects logit models fitted to
small datasets like this one really are.
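
One way to probe that would be a parametric bootstrap of the
likelihood-ratio statistic: simulate responses from the simpler fit,
refit both models, and see how large the statistic gets when the
simpler model is true.  A rough sketch with current lme4 (m1 and m01
as above; slow, and some refits may fail to converge on extreme
simulated datasets):

set.seed(123)
lr_null <- replicate(200, {
  y_sim <- simulate(m1)[[1]]                 # responses generated under the simpler model
  f_simple <- refit(m1,  y_sim)
  f_rich   <- refit(m01, y_sim)
  as.numeric(2 * (logLik(f_rich) - logLik(f_simple)))
})
lr_obs <- as.numeric(2 * (logLik(m01) - logLik(m1)))
mean(lr_null >= lr_obs)                      # bootstrap p-value for the observed difference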

Many thanks once more!

Roger

--

Roger Levy                      Email: rlevy at ling.ucsd.edu
Assistant Professor             Phone: 858-534-7219
Department of Linguistics       Fax:   858-534-4789
UC San Diego                    Web:   http://ling.ucsd.edu/~rlevy