Zero cells in contrast matrix problem

After giving up on a glmer for my data, I remembered a post by Roger Levy
suggesting to try the use non mixed effects glm when one of the cells in a
matrix is zero.

To put this into perspective:
trial<-glmer(Correct ~ Syntax.Semantics + (1 | Part.name), data =
trialglm, family = binomial)

Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge with max|grad| = 0.053657 (tol = 0.001,
component 4)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model is nearly unidentifiable: large eigenvalue ratio
 - Rescale variables?

My data has a binary outcome, correct or incorrect, a fixed effect
predictor factor with 8 levels, and a random effect for participants. I
believe the problem R is encountering is with one level of the factor (let
us call it level B) which has no counts (no I won' t try to post the table
from the paper with the counts because I know it will get garbled up!).

I attempt a glm with the same data:
trial<-glm(Correct ~ Syntax.Semantics, data = trialglm, family = binomial)
anova(trial)
Analysis of Deviance Table

Model: binomial, link: logit

Response: Correct

Terms added sequentially (first to last)

                 Df Deviance Resid. Df Resid. Dev
NULL                               384     289.63
Syntax.Semantics  7   34.651       377     254.97
summary(trial)
Call:
glm(formula = Correct ~ Syntax.Semantics, family = binomial,
    data = trialglm)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-0.79480  -0.62569  -0.34474  -0.00013   2.52113

Coefficients:
                           Estimate Std. Error z value Pr(>|z|)
(Intercept)                 -1.6917     0.4113  -4.113 3.91e-05 ***
Syntax.Semantics A   0.7013     0.5241   1.338   0.1809
Syntax.Semantics B -16.8744   904.5273  -0.019   0.9851
Syntax.Semantics C  -1.1015     0.7231  -1.523   0.1277
Syntax.Semantics D   0.1602     0.5667   0.283   0.7774
Syntax.Semantics E  -0.8733     0.7267  -1.202   0.2295
Syntax.Semantics F  -1.4438     0.8312  -1.737   0.0824 .
Syntax.Semantics G   0.4630     0.5262   0.880   0.3789
---
Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 289.63  on 384  degrees of freedom
Residual deviance: 254.98  on 377  degrees of freedom
AIC: 270.98

Number of Fisher Scoring iterations: 17

 The comparison I'm interested in is between level B and the reference
level but it cannot be estimated as shown by the ridiculously high estimate
and SE value.

Any suggestions on how to get a decent beta, SE, z, and p? It's the only
comparison missing in the table for the levels I need so I think it would
be a bit unacademic of me to close this deal saying 'the difference could
not be estimated due to zero count'.

And by the way I have seen this comparison being generated using other
stats.

Thanks in advance,

Frank
You may need to consider using an 'exact', Bayesian, or penalized likelihood approach (along the lines proposed by Firth).

Maybe a place to start: http://sas-and-r.blogspot.nl/2010/11/example-815-firth-logistic-regression.html

Best,
Wolfgang
-----Original Message-----
From: R-sig-mixed-models [mailto:r-sig-mixed-models-bounces at r-
project.org] On Behalf Of Francesco Romano
Sent: Wednesday, May 27, 2015 23:00
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] Zero cells in contrast matrix problem

After giving up on a glmer for my data, I remembered a post by Roger Levy
suggesting to try the use non mixed effects glm when one of the cells in
a
matrix is zero.

To put this into perspective:

trial<-glmer(Correct ~ Syntax.Semantics + (1 | Part.name), data =
trialglm, family = binomial)

Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
:
  Model failed to converge with max|grad| = 0.053657 (tol = 0.001,
component 4)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
:
  Model is nearly unidentifiable: large eigenvalue ratio
 - Rescale variables?

My data has a binary outcome, correct or incorrect, a fixed effect
predictor factor with 8 levels, and a random effect for participants. I
believe the problem R is encountering is with one level of the factor
(let
us call it level B) which has no counts (no I won' t try to post the
table
from the paper with the counts because I know it will get garbled up!).

I attempt a glm with the same data:

trial<-glm(Correct ~ Syntax.Semantics, data = trialglm, family =
binomial)
anova(trial)
Analysis of Deviance Table

Model: binomial, link: logit

Response: Correct

Terms added sequentially (first to last)

                 Df Deviance Resid. Df Resid. Dev
NULL                               384     289.63
Syntax.Semantics  7   34.651       377     254.97
summary(trial)
Call:
glm(formula = Correct ~ Syntax.Semantics, family = binomial,
    data = trialglm)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-0.79480  -0.62569  -0.34474  -0.00013   2.52113

Coefficients:
                           Estimate Std. Error z value Pr(>|z|)
(Intercept)                 -1.6917     0.4113  -4.113 3.91e-05 ***
Syntax.Semantics A   0.7013     0.5241   1.338   0.1809
Syntax.Semantics B -16.8744   904.5273  -0.019   0.9851
Syntax.Semantics C  -1.1015     0.7231  -1.523   0.1277
Syntax.Semantics D   0.1602     0.5667   0.283   0.7774
Syntax.Semantics E  -0.8733     0.7267  -1.202   0.2295
Syntax.Semantics F  -1.4438     0.8312  -1.737   0.0824 .
Syntax.Semantics G   0.4630     0.5262   0.880   0.3789
---
Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 289.63  on 384  degrees of freedom
Residual deviance: 254.98  on 377  degrees of freedom
AIC: 270.98

Number of Fisher Scoring iterations: 17

 The comparison I'm interested in is between level B and the reference
level but it cannot be estimated as shown by the ridiculously high
estimate
and SE value.

Any suggestions on how to get a decent beta, SE, z, and p? It's the only
comparison missing in the table for the levels I need so I think it would
be a bit unacademic of me to close this deal saying 'the difference could
not be estimated due to zero count'.

And by the way I have seen this comparison being generated using other
stats.

Thanks in advance,

Frank

	[[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
And for what it's worth, you can do this in conjunction with lme4 by
using the blme package instead (a thin Bayesian wrapper around lme4),
or via the MCMCglmm package; see
http://ms.mcmaster.ca/~bolker/R/misc/foxchapter/bolker_chap.html for
an example (search for "complete separation").

On Wed, May 27, 2015 at 5:21 PM, Viechtbauer Wolfgang (STAT)
You may need to consider using an 'exact', Bayesian, or penalized likelihood approach (along the lines proposed by Firth).

Maybe a place to start: http://sas-and-r.blogspot.nl/2010/11/example-815-firth-logistic-regression.html

Best,
Wolfgang

-----Original Message-----
From: R-sig-mixed-models [mailto:r-sig-mixed-models-bounces at r-
project.org] On Behalf Of Francesco Romano
Sent: Wednesday, May 27, 2015 23:00
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] Zero cells in contrast matrix problem

After giving up on a glmer for my data, I remembered a post by Roger Levy
suggesting to try the use non mixed effects glm when one of the cells in
a
matrix is zero.

To put this into perspective:

trial<-glmer(Correct ~ Syntax.Semantics + (1 | Part.name), data =
trialglm, family = binomial)

Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
:
  Model failed to converge with max|grad| = 0.053657 (tol = 0.001,
component 4)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
:
  Model is nearly unidentifiable: large eigenvalue ratio
 - Rescale variables?

My data has a binary outcome, correct or incorrect, a fixed effect
predictor factor with 8 levels, and a random effect for participants. I
believe the problem R is encountering is with one level of the factor
(let
us call it level B) which has no counts (no I won' t try to post the
table
from the paper with the counts because I know it will get garbled up!).

I attempt a glm with the same data:

trial<-glm(Correct ~ Syntax.Semantics, data = trialglm, family =
binomial)
anova(trial)
Analysis of Deviance Table

Model: binomial, link: logit

Response: Correct

Terms added sequentially (first to last)

                 Df Deviance Resid. Df Resid. Dev
NULL                               384     289.63
Syntax.Semantics  7   34.651       377     254.97
summary(trial)
Call:
glm(formula = Correct ~ Syntax.Semantics, family = binomial,
    data = trialglm)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-0.79480  -0.62569  -0.34474  -0.00013   2.52113

Coefficients:
                           Estimate Std. Error z value Pr(>|z|)
(Intercept)                 -1.6917     0.4113  -4.113 3.91e-05 ***
Syntax.Semantics A   0.7013     0.5241   1.338   0.1809
Syntax.Semantics B -16.8744   904.5273  -0.019   0.9851
Syntax.Semantics C  -1.1015     0.7231  -1.523   0.1277
Syntax.Semantics D   0.1602     0.5667   0.283   0.7774
Syntax.Semantics E  -0.8733     0.7267  -1.202   0.2295
Syntax.Semantics F  -1.4438     0.8312  -1.737   0.0824 .
Syntax.Semantics G   0.4630     0.5262   0.880   0.3789
---
Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 289.63  on 384  degrees of freedom
Residual deviance: 254.98  on 377  degrees of freedom
AIC: 270.98

Number of Fisher Scoring iterations: 17

 The comparison I'm interested in is between level B and the reference
level but it cannot be estimated as shown by the ridiculously high
estimate
and SE value.

Any suggestions on how to get a decent beta, SE, z, and p? It's the only
comparison missing in the table for the levels I need so I think it would
be a bit unacademic of me to close this deal saying 'the difference could
not be estimated due to zero count'.

And by the way I have seen this comparison being generated using other
stats.

Thanks in advance,

Frank

      [[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Many thanks to both.

The approaches you suggest (and others online) help one deal with the
separation problem but don't offer any specific advice as to how getting a
valid p coefficient when comparing two levels of the model vexed by
separation.

Ben, here's the output of the bglmer which by the way would be ideal since
it allows me to retain the random effect so that all my pairwise
comparisons are conducted using mixed effects.
trial<-bglmer(Correct ~ Syntax.Semantics+(1|Part.name), data = trialglm,
family = binomial)
Warning message:
package ?blme? was built under R version 3.1.2
summary(trial)
Cov prior  : Part.name ~ wishart(df = 3.5, scale = Inf, posterior.scale =
cov, common.scale = TRUE)
Prior dev  : 1.4371

Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) ['bglmerMod']
 Family: binomial  ( logit )
Formula: Correct ~ Syntax.Semantics + (1 | Part.name)
   Data: trialglm

     AIC      BIC   logLik deviance df.resid
   269.9    305.5   -126.0    251.9      376

Scaled residuals:
    Min      1Q  Median      3Q     Max
-0.9828 -0.4281 -0.2445 -0.0002  5.7872

Random effects:
 Groups    Name        Variance Std.Dev.
 Part.name (Intercept) 0.3836   0.6194
Number of obs: 385, groups:  Part.name, 16

Fixed effects:
                            Estimate Std. Error z value Pr(>|z|)
(Intercept)                  -1.8671     0.4538  -4.114 3.89e-05 ***
Syntax.Semantics A    0.8121     0.5397   1.505   0.1324
Syntax.Semantics B  -16.4391  1195.5031  -0.014   0.9890
Syntax.Semantics C   -1.1323     0.7462  -1.517   0.1292
Syntax.Semantics D    0.1789     0.5853   0.306   0.7598
Syntax.Semantics E   -0.8071     0.7500  -1.076   0.2819
Syntax.Semantics F   -1.5051     0.8575  -1.755   0.0792 .
Syntax.Semantics G    0.4395     0.5417   0.811   0.4171
---
Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1

Unfortunately the separation problem is still there. Should I be
constraining the parameter somehow? How would I do that? The data is below.

In passing I also tried brglm which solves the separation problem but tells
me comparison is significant which I don't believe one bit (see the data
below). I am pretty sure about this because when I reveled and look at the
comparisons I was able to compute using glmer, these turn out to be
non-significant, when glmer told me they were:
trial<-brglm(Correct ~ Syntax.Semantics, data = trialglm, family =
binomial)
Warning messages:
1: package ?elrm? was built under R version 3.1.2
2: package ?coda? was built under R version 3.1.3
summary(trial)
Call:
brglm(formula = Correct ~ Syntax.Semantics, family = binomial,
    data = trialglm)

Coefficients:
                           Estimate Std. Error z value Pr(>|z|)
(Intercept)                 -1.6358     0.4035  -4.053 5.05e-05 ***
Syntax.Semantics A   0.6689     0.5169   1.294   0.1957
Syntax.Semantics B  -3.0182     1.4902  -2.025   0.0428 *
Syntax.Semantics C  -1.0135     0.6889  -1.471   0.1413
Syntax.Semantics D   0.1515     0.5571   0.272   0.7857
Syntax.Semantics E  -0.7878     0.6937  -1.136   0.2561
Syntax.Semantics F  -1.2874     0.7702  -1.672   0.0946 .
Syntax.Semantics G   0.4358     0.5186   0.840   0.4007
---
Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 262.51  on 384  degrees of freedom
Residual deviance: 256.22  on 377  degrees of freedom
Penalized deviance: 245.5554
AIC:  272.22

MCMCglmm is too complex for me.

Wolfgang, I tried the penalized likelihood method (logistf function)
but output is hard to read:
trial<-logistf(Correct ~ Syntax.Semantics, data = trialglm, family =
binomial)
Warning messages:
1: package ?logistf? was built under R version 3.1.2
2: package ?mice? was built under R version 3.1.2
summary(trial)
logistf(formula = Correct ~ Syntax.Semantics, data = trialglm,
    family = binomial)

Model fitted by Penalized ML
Confidence intervals and p-values by Profile Likelihood Profile Likelihood
Profile Likelihood Profile Likelihood Profile Likelihood Profile Likelihood
Profile Likelihood Profile Likelihood

                                 coef  se(coef) lower 0.95 upper 0.95
Chisq            p
(Intercept)                 3.2094017 0.7724482  2.9648747  3.5127830
 0.000000 1.000000e+00
Syntax.Semantics A  4.1767737 6.3254344  0.4224696 12.0673987 64.224452
1.110223e-15
Syntax.Semantics B -1.0583602 0.8959376 -1.3963977 -0.7625216  0.000000
1.000000e+00
Syntax.Semantics C -0.7299070 0.9308193 -1.0765598 -0.4180076  0.000000
1.000000e+00
Syntax.Semantics D  0.2314740 1.1563731 -0.1704535  0.6479908  1.156512
2.821901e-01
Syntax.Semantics E -0.6476907 0.9771824 -1.0076740 -0.3164066  0.000000
1.000000e+00
Syntax.Semantics F -0.8271499 0.9305931 -1.1743834 -0.5160799  0.000000
1.000000e+00
Syntax.Semantics G  0.9909046 1.3787175  0.5457741  1.5353981  0.000000
1.000000e+00

Likelihood ratio test=121.9841 on 7 df, p=0, n=385
Wald test = 5.334321 on 7 df, p = 0.6192356

In particular, what is this model telling me? That Z (my ref level) and B
are significantly different?

I'm happy to try the elrm function with exact logistic regression but I am
not capable of programming it. Besides, would it give me valid estimates
for the comparison between the Z and B levels? The data frame should look
like this:

Outcome variable (Correct, incorrect)
Predictor variable (A, B, C, D, E, F, G, Z)
Counts (E: 38,3; B: 51,0; Z: 37,7; G: 40,12; D: 36,8; C:45,3; A: 34,13;
F:65,22).

Thank you!
 F.

And for what it's worth, you can do this in conjunction with lme4 by
using the blme package instead (a thin Bayesian wrapper around lme4),
or via the MCMCglmm package; see
http://ms.mcmaster.ca/~bolker/R/misc/foxchapter/bolker_chap.html for
an example (search for "complete separation").

On Wed, May 27, 2015 at 5:21 PM, Viechtbauer Wolfgang (STAT)
<wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
You may need to consider using an 'exact', Bayesian, or penalized
likelihood approach (along the lines proposed by Firth).
Maybe a place to start:
http://sas-and-r.blogspot.nl/2010/11/example-815-firth-logistic-regression.html
Best,
Wolfgang

-----Original Message-----
From: R-sig-mixed-models [mailto:r-sig-mixed-models-bounces at r-
project.org] On Behalf Of Francesco Romano
Sent: Wednesday, May 27, 2015 23:00
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] Zero cells in contrast matrix problem

After giving up on a glmer for my data, I remembered a post by Roger
Levy
suggesting to try the use non mixed effects glm when one of the cells in
a
matrix is zero.

To put this into perspective:

trial<-glmer(Correct ~ Syntax.Semantics + (1 | Part.name), data =
trialglm, family = binomial)

Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
:
  Model failed to converge with max|grad| = 0.053657 (tol = 0.001,
component 4)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
:
  Model is nearly unidentifiable: large eigenvalue ratio
 - Rescale variables?

My data has a binary outcome, correct or incorrect, a fixed effect
predictor factor with 8 levels, and a random effect for participants. I
believe the problem R is encountering is with one level of the factor
(let
us call it level B) which has no counts (no I won' t try to post the
table
from the paper with the counts because I know it will get garbled up!).

I attempt a glm with the same data:

trial<-glm(Correct ~ Syntax.Semantics, data = trialglm, family =
binomial)
anova(trial)
Analysis of Deviance Table

Model: binomial, link: logit

Response: Correct

Terms added sequentially (first to last)

                 Df Deviance Resid. Df Resid. Dev
NULL                               384     289.63
Syntax.Semantics  7   34.651       377     254.97
summary(trial)
Call:
glm(formula = Correct ~ Syntax.Semantics, family = binomial,
    data = trialglm)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-0.79480  -0.62569  -0.34474  -0.00013   2.52113

Coefficients:
                           Estimate Std. Error z value Pr(>|z|)
(Intercept)                 -1.6917     0.4113  -4.113 3.91e-05 ***
Syntax.Semantics A   0.7013     0.5241   1.338   0.1809
Syntax.Semantics B -16.8744   904.5273  -0.019   0.9851
Syntax.Semantics C  -1.1015     0.7231  -1.523   0.1277
Syntax.Semantics D   0.1602     0.5667   0.283   0.7774
Syntax.Semantics E  -0.8733     0.7267  -1.202   0.2295
Syntax.Semantics F  -1.4438     0.8312  -1.737   0.0824 .
Syntax.Semantics G   0.4630     0.5262   0.880   0.3789
---
Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 289.63  on 384  degrees of freedom
Residual deviance: 254.98  on 377  degrees of freedom
AIC: 270.98

Number of Fisher Scoring iterations: 17

 The comparison I'm interested in is between level B and the reference
level but it cannot be estimated as shown by the ridiculously high
estimate
and SE value.

Any suggestions on how to get a decent beta, SE, z, and p? It's the only
comparison missing in the table for the levels I need so I think it
would
be a bit unacademic of me to close this deal saying 'the difference
could
not be estimated due to zero count'.

And by the way I have seen this comparison being generated using other
stats.

Thanks in advance,

Frank

      [[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Many thanks to both.

The approaches you suggest (and others online) help one deal with
the separation problem but don't offer any specific advice as to
how getting a valid p coefficient when comparing two levels of the
model vexed by separation.

Ben, here's the output of the bglmer which by the way would be
ideal since it allows me to retain the random effect so that all my
pairwise comparisons are conducted using mixed effects.

trial<-bglmer(Correct ~ Syntax.Semantics+(1|Part.name), data =
trialglm,
family = binomial) Warning message: package ?blme? was built under
R version 3.1.2
summary(trial)
Cov prior  : Part.name ~ wishart(df = 3.5, scale = Inf,
posterior.scale = cov, common.scale = TRUE) Prior dev  : 1.4371

Generalized linear mixed model fit by maximum likelihood (Laplace 
Approximation) ['bglmerMod'] Family: binomial  ( logit ) Formula:
Correct ~ Syntax.Semantics + (1 | Part.name) Data: trialglm

AIC      BIC   logLik deviance df.resid 269.9    305.5   -126.0
251.9      376

Scaled residuals: Min      1Q  Median      3Q     Max -0.9828
-0.4281 -0.2445 -0.0002  5.7872

Random effects: Groups    Name        Variance Std.Dev. Part.name
(Intercept) 0.3836   0.6194 Number of obs: 385, groups:  Part.name,
16

Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept)
-1.8671     0.4538  -4.114 3.89e-05 *** Syntax.Semantics A
0.8121     0.5397   1.505   0.1324 Syntax.Semantics B  -16.4391
1195.5031  -0.014   0.9890 Syntax.Semantics C   -1.1323     0.7462
-1.517   0.1292 Syntax.Semantics D    0.1789     0.5853   0.306
0.7598 Syntax.Semantics E   -0.8071     0.7500  -1.076   0.2819 
Syntax.Semantics F   -1.5051     0.8575  -1.755   0.0792 . 
Syntax.Semantics G    0.4395     0.5417   0.811   0.4171 --- 
Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1

Unfortunately the separation problem is still there. Should I be 
constraining the parameter somehow? How would I do that? The data
is below.
Did you read the section in the URL I suggested?  Just using bglmer
isn't enough; you also have to set a prior on the fixed effects.

  Your data don't seem to be attached (note that the mailing list
strips most non-ASCII file types).
In passing I also tried brglm which solves the separation problem
but tells me comparison is significant which I don't believe one
bit (see the data below). I am pretty sure about this because when
I reveled and look at the comparisons I was able to compute using
glmer, these turn out to be non-significant, when glmer told me
they were:

trial<-brglm(Correct ~ Syntax.Semantics, data = trialglm, family
=
binomial) Warning messages: 1: package ?elrm? was built under R
version 3.1.2 2: package ?coda? was built under R version 3.1.3
summary(trial)
Call: brglm(formula = Correct ~ Syntax.Semantics, family =
binomial, data = trialglm)

Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept)
-1.6358     0.4035  -4.053 5.05e-05 *** Syntax.Semantics A   0.6689
0.5169   1.294   0.1957 Syntax.Semantics B  -3.0182     1.4902
-2.025   0.0428 * Syntax.Semantics C  -1.0135     0.6889  -1.471
0.1413 Syntax.Semantics D   0.1515     0.5571   0.272   0.7857 
Syntax.Semantics E  -0.7878     0.6937  -1.136   0.2561 
Syntax.Semantics F  -1.2874     0.7702  -1.672   0.0946 . 
Syntax.Semantics G   0.4358     0.5186   0.840   0.4007 --- Signif.
codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 262.51  on 384  degrees of freedom Residual
deviance: 256.22  on 377  degrees of freedom Penalized deviance:
245.5554 AIC:  272.22

MCMCglmm is too complex for me.

Wolfgang, I tried the penalized likelihood method (logistf
function) but output is hard to read:

trial<-logistf(Correct ~ Syntax.Semantics, data = trialglm,
family =
binomial) Warning messages: 1: package ?logistf? was built under R
version 3.1.2 2: package ?mice? was built under R version 3.1.2
summary(trial)
logistf(formula = Correct ~ Syntax.Semantics, data = trialglm, 
family = binomial)

Model fitted by Penalized ML Confidence intervals and p-values by
Profile Likelihood Profile Likelihood Profile Likelihood Profile
Likelihood Profile Likelihood Profile Likelihood Profile Likelihood
Profile Likelihood

coef  se(coef) lower 0.95 upper 0.95 Chisq            p (Intercept)
3.2094017 0.7724482  2.9648747  3.5127830 0.000000 1.000000e+00 
Syntax.Semantics A  4.1767737 6.3254344  0.4224696 12.0673987
64.224452 1.110223e-15 Syntax.Semantics B -1.0583602 0.8959376
-1.3963977 -0.7625216  0.000000 1.000000e+00 Syntax.Semantics C
-0.7299070 0.9308193 -1.0765598 -0.4180076  0.000000 1.000000e+00 
Syntax.Semantics D  0.2314740 1.1563731 -0.1704535  0.6479908
1.156512 2.821901e-01 Syntax.Semantics E -0.6476907 0.9771824
-1.0076740 -0.3164066  0.000000 1.000000e+00 Syntax.Semantics F
-0.8271499 0.9305931 -1.1743834 -0.5160799  0.000000 1.000000e+00 
Syntax.Semantics G  0.9909046 1.3787175  0.5457741  1.5353981
0.000000 1.000000e+00

Likelihood ratio test=121.9841 on 7 df, p=0, n=385 Wald test =
5.334321 on 7 df, p = 0.6192356

In particular, what is this model telling me? That Z (my ref level)
and B are significantly different?

I'm happy to try the elrm function with exact logistic regression
but I am not capable of programming it. Besides, would it give me
valid estimates for the comparison between the Z and B levels? The
data frame should look like this:

Outcome variable (Correct, incorrect) Predictor variable (A, B, C,
D, E, F, G, Z) Counts (E: 38,3; B: 51,0; Z: 37,7; G: 40,12; D:
36,8; C:45,3; A: 34,13; F:65,22).

Thank you! F.

On Thu, May 28, 2015 at 2:28 AM, Ben Bolker <bbolker at gmail.com>
wrote:

And for what it's worth, you can do this in conjunction with lme4
by using the blme package instead (a thin Bayesian wrapper around
lme4), or via the MCMCglmm package; see 
http://ms.mcmaster.ca/~bolker/R/misc/foxchapter/bolker_chap.html
for an example (search for "complete separation").

On Wed, May 27, 2015 at 5:21 PM, Viechtbauer Wolfgang (STAT) 
<wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
You may need to consider using an 'exact', Bayesian, or
penalized
likelihood approach (along the lines proposed by Firth).
Maybe a place to start:
http://sas-and-r.blogspot.nl/2010/11/example-815-firth-logistic-regression.html

Best,
Wolfgang

-----Original Message----- From: R-sig-mixed-models
[mailto:r-sig-mixed-models-bounces at r- project.org] On Behalf
Of Francesco Romano Sent: Wednesday, May 27, 2015 23:00 To:
r-sig-mixed-models at r-project.org Subject: [R-sig-ME] Zero
cells in contrast matrix problem

After giving up on a glmer for my data, I remembered a post
by Roger
Levy
suggesting to try the use non mixed effects glm when one of
the cells in a matrix is zero.

To put this into perspective:

trial<-glmer(Correct ~ Syntax.Semantics + (1 | Part.name),
data =
trialglm, family = binomial)

Warning messages: 1: In checkConv(attr(opt, "derivs"),
opt$par, ctrl = control$checkConv, : Model failed to converge
with max|grad| = 0.053657 (tol = 0.001, component 4) 2: In
checkConv(attr(opt, "derivs"), opt$par, ctrl =
control$checkConv, : Model is nearly unidentifiable: large
eigenvalue ratio - Rescale variables?

My data has a binary outcome, correct or incorrect, a fixed
effect predictor factor with 8 levels, and a random effect
for participants. I believe the problem R is encountering is
with one level of the factor (let us call it level B) which
has no counts (no I won' t try to post the table from the
paper with the counts because I know it will get garbled
up!).

I attempt a glm with the same data:

trial<-glm(Correct ~ Syntax.Semantics, data = trialglm,
family =
binomial)
anova(trial)
Analysis of Deviance Table

Model: binomial, link: logit

Response: Correct

Terms added sequentially (first to last)

Df Deviance Resid. Df Resid. Dev NULL
384     289.63 Syntax.Semantics  7   34.651       377
254.97
summary(trial)
Call: glm(formula = Correct ~ Syntax.Semantics, family =
binomial, data = trialglm)

Deviance Residuals: Min        1Q    Median        3Q
Max -0.79480  -0.62569  -0.34474  -0.00013   2.52113

Coefficients: Estimate Std. Error z value Pr(>|z|) 
(Intercept)                 -1.6917     0.4113  -4.113
3.91e-05 *** Syntax.Semantics A   0.7013     0.5241   1.338
0.1809 Syntax.Semantics B -16.8744   904.5273  -0.019
0.9851 Syntax.Semantics C  -1.1015     0.7231  -1.523
0.1277 Syntax.Semantics D   0.1602     0.5667   0.283
0.7774 Syntax.Semantics E  -0.8733     0.7267  -1.202
0.2295 Syntax.Semantics F  -1.4438     0.8312  -1.737
0.0824 . Syntax.Semantics G   0.4630     0.5262   0.880
0.3789 --- Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05
?.? 0.1 ? ? 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 289.63  on 384  degrees of freedom Residual
deviance: 254.98  on 377  degrees of freedom AIC: 270.98

Number of Fisher Scoring iterations: 17

The comparison I'm interested in is between level B and the
reference level but it cannot be estimated as shown by the
ridiculously high estimate and SE value.

Any suggestions on how to get a decent beta, SE, z, and p?
It's the only comparison missing in the table for the levels
I need so I think it
would
be a bit unacademic of me to close this deal saying 'the
difference
could
not be estimated due to zero count'.

And by the way I have seen this comparison being generated
using other stats.

Thanks in advance,

Frank

[[alternative HTML version deleted]]

_______________________________________________ 
R-sig-mixed-models at r-project.org mailing list 
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________ 
R-sig-mixed-models at r-project.org mailing list 
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iQEcBAEBAgAGBQJVZ2DeAAoJEOCV5YRblxUH9f0IAN/LTzJllxXqmdP4U2bbDNOR
XnjYDsQ+cF6eR6aRMxWK1nj7Lgdi1pvqOU/3CSMVke2HW2Cr07wR2VDtqHwWRAgZ
jTlzlJ/iA5o32T1U2Wm9jrle0E0RpTMrA8SZ8HsGVKT3cD/5TNo9eoPAw3DV45AO
hmwUJf0NYLhZwOJ2QAsk1rAn06CBmrVSXFUmdGKpODELFJ4whAn95phE8pLY+aW9
qfO4Rq4FcZt1wdRwlZmk8woEeqeySb+rBRxZCVQ0HuyoEGONHMq5Wa1hnffwVR3V
yiIo1Vtd7sTbxAs96DeP8AItyHTvgsKRJphEK/PYguDQCGeR70sQEL53FTdHM60=
=3UD2
-----END PGP SIGNATURE-----