
Zero cells in contrast matrix problem

Many thanks to both.

The approaches you suggest (and others online) help one deal with the
separation problem, but they don't offer any specific advice on how to
get a valid p-value for the coefficient when comparing two levels of a
model vexed by separation.
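For what it's worth, the root of the problem shows up directly in a
cross-tabulation: a zero cell means the maximum-likelihood estimate for
that level is infinite, so its Wald z and p are meaningless no matter
which fitting function is used. A minimal check, using the variable
names from this thread:

```r
## Cross-tabulate the outcome against the factor; any zero cell
## signals (quasi-)complete separation for that level.
with(trialglm, table(Syntax.Semantics, Correct))
```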

Ben, here's the output of bglmer, which by the way would be ideal since
it allows me to retain the random effect, so that all my pairwise
comparisons are conducted using mixed effects:
family = binomial)
Warning message:
package 'blme' was built under R version 3.1.2
Cov prior  : Part.name ~ wishart(df = 3.5, scale = Inf, posterior.scale =
cov, common.scale = TRUE)
Prior dev  : 1.4371

Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) ['bglmerMod']
 Family: binomial  ( logit )
Formula: Correct ~ Syntax.Semantics + (1 | Part.name)
   Data: trialglm

     AIC      BIC   logLik deviance df.resid
   269.9    305.5   -126.0    251.9      376

Scaled residuals:
    Min      1Q  Median      3Q     Max
-0.9828 -0.4281 -0.2445 -0.0002  5.7872

Random effects:
 Groups    Name        Variance Std.Dev.
 Part.name (Intercept) 0.3836   0.6194
Number of obs: 385, groups:  Part.name, 16

Fixed effects:
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)          -1.8671     0.4538  -4.114 3.89e-05 ***
Syntax.Semantics A    0.8121     0.5397   1.505   0.1324
Syntax.Semantics B  -16.4391  1195.5031  -0.014   0.9890
Syntax.Semantics C   -1.1323     0.7462  -1.517   0.1292
Syntax.Semantics D    0.1789     0.5853   0.306   0.7598
Syntax.Semantics E   -0.8071     0.7500  -1.076   0.2819
Syntax.Semantics F   -1.5051     0.8575  -1.755   0.0792 .
Syntax.Semantics G    0.4395     0.5417   0.811   0.4171
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
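One way around the useless Wald p for level B is to test by likelihood
ratio instead of by Wald z, e.g. for the factor as a whole via a
nested-model comparison. A sketch, assuming the data and variable names
above (with a prior in place the LRT is only approximate):

```r
library(blme)

## full model, as fitted above
m1 <- bglmer(Correct ~ Syntax.Semantics + (1 | Part.name),
             data = trialglm, family = binomial)
## null model without the factor
m0 <- bglmer(Correct ~ 1 + (1 | Part.name),
             data = trialglm, family = binomial)
anova(m1, m0)  # likelihood-ratio test for Syntax.Semantics overall
```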

Unfortunately the separation problem is still there. Should I be
constraining the parameter somehow? How would I do that? The data is below.
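On constraining the parameter: as I understand it, blme can also put a
weakly informative prior on the fixed effects, not just on the
covariance, which keeps the separated coefficient finite (the same idea
as arm::bayesglm). A sketch; the sd values here are just the
bayesglm-style defaults, not a recommendation, and ?bglmer has the
exact prior syntax:

```r
library(blme)

## normal prior on the fixed effects: sd = 10 for the intercept,
## sd = 2.5 for the other coefficients (assumed values; adjust)
m_prior <- bglmer(Correct ~ Syntax.Semantics + (1 | Part.name),
                  data = trialglm, family = binomial,
                  fixef.prior = normal(sd = c(10, 2.5)))
summary(m_prior)
```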

In passing I also tried brglm, which solves the separation problem but
tells me the comparison is significant, which I don't believe one bit
(see the data below). I am pretty sure about this because when I relevel
and look at the comparisons I was also able to compute using glmer,
these turn out to be non-significant under brglm even though glmer told
me they were significant:
binomial)
Warning messages:
1: package 'elrm' was built under R version 3.1.2
2: package 'coda' was built under R version 3.1.3
Call:
brglm(formula = Correct ~ Syntax.Semantics, family = binomial,
    data = trialglm)


Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)          -1.6358     0.4035  -4.053 5.05e-05 ***
Syntax.Semantics A    0.6689     0.5169   1.294   0.1957
Syntax.Semantics B   -3.0182     1.4902  -2.025   0.0428 *
Syntax.Semantics C   -1.0135     0.6889  -1.471   0.1413
Syntax.Semantics D    0.1515     0.5571   0.272   0.7857
Syntax.Semantics E   -0.7878     0.6937  -1.136   0.2561
Syntax.Semantics F   -1.2874     0.7702  -1.672   0.0946 .
Syntax.Semantics G    0.4358     0.5186   0.840   0.4007
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 262.51  on 384  degrees of freedom
Residual deviance: 256.22  on 377  degrees of freedom
Penalized deviance: 245.5554
AIC:  272.22
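(For the specific pairwise comparisons under brglm I used the usual
relevel-and-refit trick; a sketch of what I ran:)

```r
library(brglm)

## make B the reference level so every row of the summary is a
## comparison against B (including Z vs B)
trialglm$Syntax.Semantics <- relevel(trialglm$Syntax.Semantics,
                                     ref = "B")
fit_B <- brglm(Correct ~ Syntax.Semantics, family = binomial,
               data = trialglm)
summary(fit_B)
```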


MCMCglmm is too complex for me.

Wolfgang, I tried the penalized likelihood method (logistf function)
but output is hard to read:
binomial)
Warning messages:
1: package 'logistf' was built under R version 3.1.2
2: package 'mice' was built under R version 3.1.2
logistf(formula = Correct ~ Syntax.Semantics, data = trialglm,
    family = binomial)

Model fitted by Penalized ML
Confidence intervals and p-values by Profile Likelihood (printed once
per coefficient)

                         coef  se(coef) lower 0.95 upper 0.95     Chisq            p
(Intercept)         3.2094017 0.7724482  2.9648747  3.5127830  0.000000 1.000000e+00
Syntax.Semantics A  4.1767737 6.3254344  0.4224696 12.0673987 64.224452 1.110223e-15
Syntax.Semantics B -1.0583602 0.8959376 -1.3963977 -0.7625216  0.000000 1.000000e+00
Syntax.Semantics C -0.7299070 0.9308193 -1.0765598 -0.4180076  0.000000 1.000000e+00
Syntax.Semantics D  0.2314740 1.1563731 -0.1704535  0.6479908  1.156512 2.821901e-01
Syntax.Semantics E -0.6476907 0.9771824 -1.0076740 -0.3164066  0.000000 1.000000e+00
Syntax.Semantics F -0.8271499 0.9305931 -1.1743834 -0.5160799  0.000000 1.000000e+00
Syntax.Semantics G  0.9909046 1.3787175  0.5457741  1.5353981  0.000000 1.000000e+00

Likelihood ratio test=121.9841 on 7 df, p=0, n=385
Wald test = 5.334321 on 7 df, p = 0.6192356

In particular, what is this model telling me? That Z (my ref level) and B
are significantly different?
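My understanding is that logistf's p-values come from penalized
(profile) likelihood-ratio tests of each coefficient against zero, so
the 'Syntax.Semantics B' row would be exactly the B-vs-Z test, Z being
my reference level, and the other pairwise comparisons would come from
releveling and refitting. A sketch of what I mean:

```r
library(logistf)

## refit with B as the reference, so each row tests a level against B
trialglm$Syntax.Semantics <- relevel(trialglm$Syntax.Semantics,
                                     ref = "B")
lf_B <- logistf(Correct ~ Syntax.Semantics, data = trialglm)
summary(lf_B)
```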

I'm happy to try the elrm function with exact logistic regression but I am
not capable of programming it. Besides, would it give me valid estimates
for the comparison between the Z and B levels? The data frame should look
like this:

Outcome variable (Correct, incorrect)
Predictor variable (A, B, C, D, E, F, G, Z)
Counts (E: 38,3; B: 51,0; Z: 37,7; G: 40,12; D: 36,8; C:45,3; A: 34,13;
F:65,22).
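In case it helps, here is a sketch of how I understand the aggregated
data frame and elrm call would look, assuming the first count in each
pair above is 'correct' and the second 'incorrect' (swap them if I have
the order reversed; the iteration counts are placeholders):

```r
library(elrm)

## aggregated counts per level, pairs read as (correct, incorrect)
d <- data.frame(
  level   = factor(c("A", "B", "C", "D", "E", "F", "G", "Z")),
  correct = c(34, 51, 45, 36, 38, 65, 40, 37),
  wrong   = c(13,  0,  3,  8,  3, 22, 12,  7))
d$n <- d$correct + d$wrong

## exact-style (MCMC) logistic regression for the level effect
fit <- elrm(formula = correct/n ~ level, interest = ~level,
            iter = 22000, burnIn = 2000, dataset = d)
summary(fit)
```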

Thank you!
 F.
On Thu, May 28, 2015 at 2:28 AM, Ben Bolker <bbolker at gmail.com> wrote: