
Logistic Regression for Matched Case-Control Data

3 messages · Viechtbauer Wolfgang (STAT), Ben Bolker

#
Dear All,

I am trying to wrap my head around using logistic random-effects regression models for the analysis of matched data and the results I obtain with glmer() when using this approach. So, let's say we have dichotomous outcomes for matched subjects in 2 groups (could also be repeated measurements on a single subject -- the idea is the same). We can write the subject-specific model as:

logit[P(Y_ij = 1)] = alpha_i + beta * x_ij,

where Y_ij is the observed outcome (either 1 or 0) for subject j (j = 1, 2) in pair i (i = 1, ..., n), x_ij = 0 for group 1, x_ij = 1 for group 2, alpha_i is the intercept for pair i, and beta is the log(OR). Estimating this model with fixed intercepts is problematic (the number of alpha_i parameters grows with the number of pairs), so the usual approach is to use a conditional logistic model (conditioning on Y_i1 + Y_i2). The conditional ML estimator of beta (and its SE) can actually be given in closed form, so we can easily check the results. Alternatively, we can treat the data for a single pair as a 2x2 table (with 2 subjects) and use the Cochran-Mantel-Haenszel test for the stratified 2x2 table data. This is in fact identical to McNemar's test for the same data. Here is an example:

########################################################################

library(survival)
library(metafor)

### number of pairs
n <- 100

### create some data
ai <- c(rep(0,n/2), rep(1,n/2))
bi <- 1-ai
ci <- c(rep(0,42), rep(1,8), rep(0,18), rep(1,32))
di <- 1-ci

### matched pair data
tab <- table(ai,ci)
tab

### McNemar's test
mcnemar.test(table(ai,ci))

### conditional ML estimator of log(OR) and corresponding SE
round(log(tab[2,1]/tab[1,2]),4)
round(sqrt(1/tab[2,1] + 1/tab[1,2]),4)

### n x 2 x 2 stratified data analysis with CMH test
rma.mh(ai=ai, bi=bi, ci=ci, di=di, measure="OR")

### change data to long format 
event <- c(rbind(ai,ci))
group <- rep(c(1,0), times=n)
id    <- rep(1:n, each=2)

### conditional logistic model
summary(clogit(event ~ group + strata(id)))

########################################################################

So, everything matches up as expected. An alternative approach to estimating the model above is described, for example, in Agresti (2002), chapter 10 (esp. 10.2.4). Here, the alpha_i values are treated as random effects, typically assumed to be drawn from a N(mu, sigma^2) distribution. So, let's try out this approach, using glmer() from lme4:

########################################################################

library(lme4)
glmer(event ~ group + (1 | id), family=binomial, nAGQ=21)

########################################################################

According to Agresti (2002; see also Neuhaus et al., 1994), this should yield the same estimate of beta for matched pairs when the sample log odds ratio is non-negative (as in this example). With enough quadrature points (e.g., nAGQ=21), this does indeed reproduce that value, but the estimate appears to depend strongly on the chosen value of nAGQ. Also, the SE is different. I tried fitting the same model in other software to compare results. For example, Stata's xtmelogit gives results that match up completely with those from the other approaches, including the SE of hat(beta).
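To make the nAGQ sensitivity concrete, here is a small sketch (reusing the long-format data constructed above; the grid of quadrature points is just an illustrative choice) that refits the model for increasing nAGQ and collects the estimate and SE of beta:

```r
library(lme4)

## recreate the long-format matched-pair data from above
n  <- 100
ai <- c(rep(0, n/2), rep(1, n/2))
ci <- c(rep(0, 42), rep(1, 8), rep(0, 18), rep(1, 32))
dat <- data.frame(event = c(rbind(ai, ci)),
                  group = rep(c(1, 0), times = n),
                  id    = rep(1:n, each = 2))

## refit with increasing numbers of quadrature points and
## collect the estimate and SE of the group effect
res <- t(sapply(c(1, 5, 9, 15, 21), function(q) {
  fit <- glmer(event ~ group + (1 | id), family = binomial,
               nAGQ = q, data = dat)
  c(nAGQ = q,
    beta = unname(fixef(fit)["group"]),
    se   = unname(sqrt(vcov(fit)["group", "group"])))
}))
res
```

At the high end of the grid, the estimate should agree with the conditional ML value log(18/8) ≈ 0.8109 from the 2x2 table above, which makes the drift at small nAGQ easy to see.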

So, I am wondering whether the strong dependence of glmer() on the number of quadrature points in this particular application is expected. And is there an explanation for why the SE of hat(beta) is different in glmer()? I realize that clusters of size 2 are probably not something that glmer() was meant for, but the theory apparently says that the results should match up, so I am wondering if there is an intuitive explanation for what is going on here.

Best,
Wolfgang

References

Agresti, A. (2002). Categorical data analysis (2nd ed.). Wiley.

Neuhaus, J. M., Kalbfleisch, J. D., & Hauck, W. W. (1994). Conditions for consistent estimation in mixed-effects models for binary matched-pairs data. Canadian Journal of Statistics, 22(1), 139-148.

--   
Wolfgang Viechtbauer, Ph.D., Statistician   
Department of Psychiatry and Psychology   
School for Mental Health and Neuroscience   
Faculty of Health, Medicine, and Life Sciences   
Maastricht University, P.O. Box 616 (VIJV1)   
6200 MD Maastricht, The Netherlands   
+31 (43) 388-4170 | http://www.wvbauer.com
#
Viechtbauer Wolfgang (STAT) <wolfgang.viechtbauer at ...> writes:
With the development version of lme4 (which might be more stable?) I get values of the fixed effect of 0.8109 (plus or minus about 3e-5) as long as nAGQ >= 9. For lme4.0 (which should be equivalent to the CRAN version) it seems to take slightly higher nAGQ values to get precise answers, but it still doesn't seem that sensitive to me. (Might vary across platforms?) However, I can't speak to the standard error of hat(beta) ...
#
Thanks for looking into this. Yes, for nAGQ values above ~10, it's indeed pretty stable, regardless of the version, and it does give the right value. I was just worried about the behavior for a smaller number of quadrature points. But maybe this is nothing unusual when using glmer() for this application. However, the SE is always off, regardless of the nAGQ value.

I also tried fitting the same model with the glmmML() function from the package with the same name:

library(glmmML)
glmmML(event ~ group, cluster=id, family=binomial, method="ghq", n.points=21)

This gives b1=.8109 with SE=.4249 as expected. So, at least with respect to the SE, glmer() is doing something different here.
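To put the SE discrepancy side by side, here is a sketch (same data as above) that collects hat(beta) and its SE from clogit(), glmer(), and glmmML() into one table; note that the glmmML SEs are pulled from the fit's coef.sd component, and the [2] index assumes the intercept comes first in that vector:

```r
library(survival)
library(lme4)
library(glmmML)

## same long-format matched-pair data as above
n  <- 100
ai <- c(rep(0, n/2), rep(1, n/2))
ci <- c(rep(0, 42), rep(1, 8), rep(0, 18), rep(1, 32))
dat <- data.frame(event = c(rbind(ai, ci)),
                  group = rep(c(1, 0), times = n),
                  id    = rep(1:n, each = 2))

f1 <- clogit(event ~ group + strata(id), data = dat)
f2 <- glmer(event ~ group + (1 | id), family = binomial,
            nAGQ = 21, data = dat)
f3 <- glmmML(event ~ group, cluster = id, family = binomial,
             method = "ghq", n.points = 21, data = dat)

out <- data.frame(
  method = c("clogit", "glmer", "glmmML"),
  beta   = c(coef(f1)["group"], fixef(f2)["group"], coef(f3)["group"]),
  se     = c(sqrt(vcov(f1)["group", "group"]),
             sqrt(vcov(f2)["group", "group"]),
             f3$coef.sd[2]))  ## glmmML stores the SEs in coef.sd
out
```

For matched pairs, the clogit() row should reproduce the closed-form values exactly (beta = log(18/8), SE = sqrt(1/18 + 1/8)), which makes it easy to see which of the mixed-model SEs deviates.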

Best,
Wolfgang