Hi,
I'm seeing poor convergence of a model using the binomial family. In one
case, I have a binomial response for which I use the 'weights' argument.
In a second case, I expand the binomial response to Bernoulli trials and
refit. Theoretically, they should be the same model. However, that does
not appear to be the case because the residual scale for the binomial
case is unbelievably large. Could someone (Prof. Bates) please help
diagnose this problem?
And thank you so much for such wonderful software. I truly appreciate
the effort.
Thanks,
--sundar
tmp <- read.csv(url("http://sdorairaj.googlepages.com/tmp.csv"))
tmp[2:4] <- lapply(tmp[2:4], factor)
library(lme4)
fit <- lmer(1 - y ~ z + (1 | A) + (1 | A:B), tmp,
binomial("cloglog"), weights = as.numeric(w))
## see below for expand.bin
tmp2 <- expand.bin(tmp, "x", "w")
fit2 <- lmer(x ~ z + (1 | A) + (1 | A:B), tmp2, binomial("cloglog"))
fit3 <- glm(1 - y ~ z, binomial("cloglog"), tmp, w)
fit4 <- glm(x ~ z, binomial("cloglog"), tmp2)
cbind(sapply(list(fit, fit2), fixef),
sapply(list(fit3, fit4), coef))
# [,1] [,2] [,3] [,4]
#(Intercept) -2.8227711 -3.1247457 -2.8227711 -2.8227711
#z2 -0.2680992 -0.2694113 -0.2680992 -0.2680992
#z3 0.3109447 0.3233432 0.3109447 0.3109447
expand.bin <- function (data, x, n) {
char.x <- x
char.n <- n
x <- data[[x]]
n <- data[[n]]
i <- rep(seq(nrow(data)), n)
data <- data[i, , drop = FALSE]
expand <- function(z) c(rep(0, diff(z)), rep(1, z[1]))
x <- apply(cbind(x, n), 1, expand)
data[[char.x]] <- if(is.matrix(x)) c(x) else unlist(x)
row.names(data) <- seq(nrow(data))
data
}
> sessionInfo()
R version 2.5.0 (2007-04-23)
i386-pc-mingw32
locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
[1] "stats" "graphics" "grDevices" "utils" "datasets" "methods"
[7] "base"
other attached packages:
lme4 Matrix lattice
"0.99875-0" "0.99875-1" "0.15-6"
Poor convergence when using binomial response
3 messages · Sundar Dorai-Raj, Douglas Bates
Thanks for sending the example, Sundar, and also for the idea of the expand.bin function. There is a similar example in the lme4 package using the cbpp data set. I modified your expand.bin function to apply to that example with the enclosed results. As you can see the parameter estimates in this example are not exactly the same for the "count out of" form versus the "expanded binary responses" form but they are reasonably close. In general I would recommend using cbind(successes, failures) as the response rather than messing around with the weights. I always manage to confuse myself about what the weights should be and need to go back to the cbind(successes, failures) form to check. As always, it is a good idea to set control = list(msVerbose = 1) to see exactly what is going on with the parameter estimates during the iterations. I hope this helps. If necessary I will look in more detail at your example but I hope this is enough to get you started.
On 5/22/07, Sundar Dorai-Raj <sundar.dorai-raj at pdf.com> wrote:
Hi,
I'm seeing poor convergence of a model using the binomial family. In one
case, I have a binomial response for which I use the 'weights' argument.
In a second case, I expand the binomial response to Bernoulli trials and
refit. Theoretically, they should be the same model. However, that does
not appear to be the case because the residual scale for the binomial
case is unbelievably large. Could someone (Prof. Bates) please help
diagnose this problem?
And thank you so much for such wonderful software. I truly appreciate
the effort.
Thanks,
--sundar
tmp <- read.csv(url("http://sdorairaj.googlepages.com/tmp.csv"))
tmp[2:4] <- lapply(tmp[2:4], factor)
library(lme4)
fit <- lmer(1 - y ~ z + (1 | A) + (1 | A:B), tmp,
binomial("cloglog"), weights = as.numeric(w))
## see below for expand.bin
tmp2 <- expand.bin(tmp, "x", "w")
fit2 <- lmer(x ~ z + (1 | A) + (1 | A:B), tmp2, binomial("cloglog"))
fit3 <- glm(1 - y ~ z, binomial("cloglog"), tmp, w)
fit4 <- glm(x ~ z, binomial("cloglog"), tmp2)
cbind(sapply(list(fit, fit2), fixef),
sapply(list(fit3, fit4), coef))
# [,1] [,2] [,3] [,4]
#(Intercept) -2.8227711 -3.1247457 -2.8227711 -2.8227711
#z2 -0.2680992 -0.2694113 -0.2680992 -0.2680992
#z3 0.3109447 0.3233432 0.3109447 0.3109447
expand.bin <- function (data, x, n) {
char.x <- x
char.n <- n
x <- data[[x]]
n <- data[[n]]
i <- rep(seq(nrow(data)), n)
data <- data[i, , drop = FALSE]
expand <- function(z) c(rep(0, diff(z)), rep(1, z[1]))
x <- apply(cbind(x, n), 1, expand)
data[[char.x]] <- if(is.matrix(x)) c(x) else unlist(x)
row.names(data) <- seq(nrow(data))
data
}
> sessionInfo()
R version 2.5.0 (2007-04-23)
i386-pc-mingw32
locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
[1] "stats" "graphics" "grDevices" "utils" "datasets" "methods"
[7] "base"
other attached packages:
lme4 Matrix lattice
"0.99875-0" "0.99875-1" "0.15-6"
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
-------------- next part --------------
expand.bin <- function(data, x, n)
{
stopifnot(is.character(x), is.character(n),
length(x) == 1, length(n) == 1,
inherits(data, "data.frame"),
all(c(x, n) %in% names(data)))
nn <- as.integer(data[[n]])
ans <- data[rep.int(seq_along(nn), nn), ]
ans[[n]] <- NULL
xx <- as.integer(data[[x]])
ans[[x]] <- rep.int(rep.int(c(0,1), length(xx)),
as.vector(matrix(c(nn - xx, xx), nrow = 2, byrow = TRUE)))
row.names(ans) <- seq(nrow(ans))
ans
}
options(show.signif.stars = FALSE)
library(lme4)
example(cbpp)
m3 <- lmer(incidence ~ period + (1|herd),
expand.bin(cbpp, "incidence", "size"), binomial)
m3
-------------- next part --------------
R version 2.5.0 Patched (2007-05-23 r41687)
Copyright (C) 2007 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
[Previously saved workspace restored]
expand.bin <- function(data, x, n)
+ {
+ stopifnot(is.character(x), is.character(n),
+ length(x) == 1, length(n) == 1,
+ inherits(data, "data.frame"),
+ all(c(x, n) %in% names(data)))
+ nn <- as.integer(data[[n]])
+ ans <- data[rep.int(seq_along(nn), nn), ]
+ ans[[n]] <- NULL
+ xx <- as.integer(data[[x]])
+ ans[[x]] <- rep.int(rep.int(c(0,1), length(xx)),
+ as.vector(matrix(c(nn - xx, xx), nrow = 2, byrow = TRUE)))
+ row.names(ans) <- seq(nrow(ans))
+ ans
+ }
options(show.signif.stars = FALSE) library(lme4)
Loading required package: Matrix Loading required package: lattice
example(cbpp)
cbpp> ## response as a matrix
cbpp> (m1 <- lmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
cbpp+ family = binomial, data = cbpp))
Generalized linear mixed model fit using Laplace
Formula: cbind(incidence, size - incidence) ~ period + (1 | herd)
Data: cbpp
Family: binomial(logit link)
AIC BIC logLik deviance
110.1 120.2 -50.05 100.1
Random effects:
Groups Name Variance Std.Dev.
herd (Intercept) 0.41804 0.64656
number of obs: 56, groups: herd, 15
Estimated scale (compare to 1 ) 1.138075
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.3981 0.2287 -6.113 9.76e-10
period2 -0.9959 0.3056 -3.259 0.001119
period3 -1.1350 0.3266 -3.475 0.000510
period4 -1.5798 0.4286 -3.686 0.000228
Correlation of Fixed Effects:
(Intr) perid2 perid3
period2 -0.350
period3 -0.327 0.267
period4 -0.248 0.202 0.186
cbpp> ## response as a vector of probabilities and usage of argument "weights"
cbpp> m2 <- lmer(incidence / size ~ period + (1 | herd), weights = size,
cbpp+ family = binomial, data = cbpp)
cbpp> ## Confirm that these are equivalent:
cbpp> stopifnot(all.equal(coef(m1), coef(m2)),
cbpp+ all.equal(ranef(m1), ranef(m2)))
m3 <- lmer(incidence ~ period + (1|herd),
+ expand.bin(cbpp, "incidence", "size"), binomial)
m3
Generalized linear mixed model fit using Laplace
Formula: incidence ~ period + (1 | herd)
Data: expand.bin(cbpp, "incidence", "size")
Family: binomial(logit link)
AIC BIC logLik deviance
565 588.7 -277.5 555
Random effects:
Groups Name Variance Std.Dev.
herd (Intercept) 0.41448 0.6438
number of obs: 842, groups: herd, 15
Estimated scale (compare to 1 ) 0.9832878
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.3984 0.2282 -6.129 8.86e-10
period2 -0.9934 0.3054 -3.253 0.001144
period3 -1.1332 0.3264 -3.471 0.000518
period4 -1.5805 0.4287 -3.686 0.000227
Correlation of Fixed Effects:
(Intr) perid2 perid3
period2 -0.351
period3 -0.328 0.267
period4 -0.248 0.202 0.186
proc.time()
user system elapsed 7.208 0.120 7.322
Thanks for the quick reply. Here's an interesting twist.
> # use cbind(failures, successes) which is appropriate
> # with the 'cloglog' link but does not converge
> fit <- lmer(cbind(x, w - x) ~ z + (1 | A) + (1 | A:B), tmp,
+ binomial("cloglog"), control = list(msVerbose = 1))
relative tolerance set to 5.56268464626788e-311
> # now use cbind(sucesses, failures) which is
> # inappropriate with the 'cloglog' link but does converge
> fit <- lmer(cbind(w - x, x) ~ z + (1 | A) + (1 | A:B), tmp,
+ binomial("cloglog"), control = list(msVerbose = 1))
relative tolerance set to 1.23153502616496e-05
0: 811.99477: 1.04814 0.0876112 -0.111220 0.888889 0.110250
...
36: 702.77184: 1.14326 0.0930921 -0.150606 0.0958980 0.0605783
>
So it seems the tolerance level is too small and no steps are taken in
nlminb. Using debug the problem seems to be in devLaplace.
Browse[1]> PQLpars
(Intercept) z2 z3
-2.8227711 -0.2680992 0.3109447 0.8888889 0.1102498
Browse[1]> devLaplace(PQLpars)
[1] 1.797693e+308
Browse[1]> abs(0.01/devLaplace(PQLpars))
[1] 5.562685e-311
Also, this isn't an urgent issue for me. I'm satisfied using the
expand.bin (thanks also for the modification) to do the analysis.
Thanks again,
--sundar
Douglas Bates said the following on 5/23/2007 9:49 AM:
Thanks for sending the example, Sundar, and also for the idea of the expand.bin function. There is a similar example in the lme4 package using the cbpp data set. I modified your expand.bin function to apply to that example with the enclosed results. As you can see the parameter estimates in this example are not exactly the same for the "count out of" form versus the "expanded binary responses" form but they are reasonably close. In general I would recommend using cbind(successes, failures) as the response rather than messing around with the weights. I always manage to confuse myself about what the weights should be and need to go back to the cbind(successes, failures) form to check. As always, it is a good idea to set control = list(msVerbose = 1) to see exactly what is going on with the parameter estimates during the iterations. I hope this helps. If necessary I will look in more detail at your example but I hope this is enough to get you started. On 5/22/07, Sundar Dorai-Raj <sundar.dorai-raj at pdf.com> wrote:
Hi,
I'm seeing poor convergence of a model using the binomial family. In one
case, I have a binomial response for which I use the 'weights' argument.
In a second case, I expand the binomial response to Bernoulli trials and
refit. Theoretically, they should be the same model. However, that does
not appear to be the case because the residual scale for the binomial
case is unbelievably large. Could someone (Prof. Bates) please help
diagnose this problem?
And thank you so much for such wonderful software. I truly appreciate
the effort.
Thanks,
--sundar
tmp <- read.csv(url("http://sdorairaj.googlepages.com/tmp.csv"))
tmp[2:4] <- lapply(tmp[2:4], factor)
library(lme4)
fit <- lmer(1 - y ~ z + (1 | A) + (1 | A:B), tmp,
binomial("cloglog"), weights = as.numeric(w))
## see below for expand.bin
tmp2 <- expand.bin(tmp, "x", "w")
fit2 <- lmer(x ~ z + (1 | A) + (1 | A:B), tmp2, binomial("cloglog"))
fit3 <- glm(1 - y ~ z, binomial("cloglog"), tmp, w)
fit4 <- glm(x ~ z, binomial("cloglog"), tmp2)
cbind(sapply(list(fit, fit2), fixef),
sapply(list(fit3, fit4), coef))
# [,1] [,2] [,3] [,4]
#(Intercept) -2.8227711 -3.1247457 -2.8227711 -2.8227711
#z2 -0.2680992 -0.2694113 -0.2680992 -0.2680992
#z3 0.3109447 0.3233432 0.3109447 0.3109447
expand.bin <- function (data, x, n) {
char.x <- x
char.n <- n
x <- data[[x]]
n <- data[[n]]
i <- rep(seq(nrow(data)), n)
data <- data[i, , drop = FALSE]
expand <- function(z) c(rep(0, diff(z)), rep(1, z[1]))
x <- apply(cbind(x, n), 1, expand)
data[[char.x]] <- if(is.matrix(x)) c(x) else unlist(x)
row.names(data) <- seq(nrow(data))
data
}
> sessionInfo()
R version 2.5.0 (2007-04-23)
i386-pc-mingw32
locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
[1] "stats" "graphics" "grDevices" "utils" "datasets" "methods"
[7] "base"
other attached packages:
lme4 Matrix lattice
"0.99875-0" "0.99875-1" "0.15-6"
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
------------------------------------------------------------------------
expand.bin <- function(data, x, n)
{
stopifnot(is.character(x), is.character(n),
length(x) == 1, length(n) == 1,
inherits(data, "data.frame"),
all(c(x, n) %in% names(data)))
nn <- as.integer(data[[n]])
ans <- data[rep.int(seq_along(nn), nn), ]
ans[[n]] <- NULL
xx <- as.integer(data[[x]])
ans[[x]] <- rep.int(rep.int(c(0,1), length(xx)),
as.vector(matrix(c(nn - xx, xx), nrow = 2, byrow = TRUE)))
row.names(ans) <- seq(nrow(ans))
ans
}
options(show.signif.stars = FALSE)
library(lme4)
example(cbpp)
m3 <- lmer(incidence ~ period + (1|herd),
expand.bin(cbpp, "incidence", "size"), binomial)
m3
------------------------------------------------------------------------
R version 2.5.0 Patched (2007-05-23 r41687)
Copyright (C) 2007 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
[Previously saved workspace restored]
expand.bin <- function(data, x, n)
+ {
+ stopifnot(is.character(x), is.character(n),
+ length(x) == 1, length(n) == 1,
+ inherits(data, "data.frame"),
+ all(c(x, n) %in% names(data)))
+ nn <- as.integer(data[[n]])
+ ans <- data[rep.int(seq_along(nn), nn), ]
+ ans[[n]] <- NULL
+ xx <- as.integer(data[[x]])
+ ans[[x]] <- rep.int(rep.int(c(0,1), length(xx)),
+ as.vector(matrix(c(nn - xx, xx), nrow = 2, byrow = TRUE)))
+ row.names(ans) <- seq(nrow(ans))
+ ans
+ }
options(show.signif.stars = FALSE) library(lme4)
Loading required package: Matrix Loading required package: lattice
example(cbpp)
cbpp> ## response as a matrix
cbpp> (m1 <- lmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
cbpp+ family = binomial, data = cbpp))
Generalized linear mixed model fit using Laplace
Formula: cbind(incidence, size - incidence) ~ period + (1 | herd)
Data: cbpp
Family: binomial(logit link)
AIC BIC logLik deviance
110.1 120.2 -50.05 100.1
Random effects:
Groups Name Variance Std.Dev.
herd (Intercept) 0.41804 0.64656
number of obs: 56, groups: herd, 15
Estimated scale (compare to 1 ) 1.138075
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.3981 0.2287 -6.113 9.76e-10
period2 -0.9959 0.3056 -3.259 0.001119
period3 -1.1350 0.3266 -3.475 0.000510
period4 -1.5798 0.4286 -3.686 0.000228
Correlation of Fixed Effects:
(Intr) perid2 perid3
period2 -0.350
period3 -0.327 0.267
period4 -0.248 0.202 0.186
cbpp> ## response as a vector of probabilities and usage of argument "weights"
cbpp> m2 <- lmer(incidence / size ~ period + (1 | herd), weights = size,
cbpp+ family = binomial, data = cbpp)
cbpp> ## Confirm that these are equivalent:
cbpp> stopifnot(all.equal(coef(m1), coef(m2)),
cbpp+ all.equal(ranef(m1), ranef(m2)))
m3 <- lmer(incidence ~ period + (1|herd),
+ expand.bin(cbpp, "incidence", "size"), binomial)
m3
Generalized linear mixed model fit using Laplace
Formula: incidence ~ period + (1 | herd)
Data: expand.bin(cbpp, "incidence", "size")
Family: binomial(logit link)
AIC BIC logLik deviance
565 588.7 -277.5 555
Random effects:
Groups Name Variance Std.Dev.
herd (Intercept) 0.41448 0.6438
number of obs: 842, groups: herd, 15
Estimated scale (compare to 1 ) 0.9832878
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.3984 0.2282 -6.129 8.86e-10
period2 -0.9934 0.3054 -3.253 0.001144
period3 -1.1332 0.3264 -3.471 0.000518
period4 -1.5805 0.4287 -3.686 0.000227
Correlation of Fixed Effects:
(Intr) perid2 perid3
period2 -0.351
period3 -0.328 0.267
period4 -0.248 0.202 0.186
proc.time()
user system elapsed 7.208 0.120 7.322