Correction: ... a counterexample to the claim that maximal models are
always conservative (not: anti-conservative).
On Wed, Jul 15, 2015 at 1:25 PM, Reinhold Kliegl <
reinhold.kliegl at gmail.com> wrote:
Actually, I overlooked that the region-related variance component of x2
can also be removed from the GLMM without loss of goodness of fit. Thus, I
end up with an intercept-only GLMM as the most parsimonious model for this
data set. Here is the anova() output:
anova(zcpM3, zcpM2, zcpM1, M1)
Data: data
Models:
zcpM3: y ~ x1 + x2 + (1 | country) + (1 | country:wave) + (1 | region)
zcpM2: y ~ x1 + x2 + (1 | country) + (1 | country:wave) + (1 + x2 ||
region)
zcpM1: y ~ x1 + x2 + (1 | country) + (1 | country:wave) + (1 + x1 + x2 ||
region)
M1: y ~ x1 + x2 + (1 | country) + (1 | country:wave) + (1 + x1 + x2 |
region)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
zcpM3 6 258193 258255 -129091 258181 (only
intercept, parsimonious GLMM)
zcpM2 7 258195 258267 -129091 258181 0.1882 1 0.6644
zcpM1 8 258197 258279 -129091 258181 0.0000 1 1.0000
M1 11 258201 258314 -129090 258179 1.8885 3 0.5959
(overparameterized max GLMM)
There is another interesting aspect to this reduction of model
complexity. The parsimonious GLMM yields the SMALLEST z-value for the
significant fixed effect of x1. Thus, this is a counterexample to the claim
that maximal models are always anti-conservative. Whether this is the case
or not may depend on whether the maximal model is overparameterized or not.
Here are the corresponding fixed-effect estimates for x1 and x2:
Fixed effects for x1 across GLMMs:
Estimate Std. Error z value Pr(>|z|)
zcpM3: x1 0.51004 0.23859 2.138 0.0325 * (only
intercept, parsimonious GLMM)
zcpM2: x1 0.515822 0.167005 3.089 0.00201 **
zcpM1: x1 0.516168 0.149704 3.448 0.000565 ***
M1: x1 0.48913 0.19972 2.449 0.0143 *
(overparameterized max GLMM)
Fixed effects for x2 across GLMMs:
zcpM3: x2 -0.02743 0.15881 -0.173 0.8629 (only
intercept, parsimonious GLMM)
zcpM2: x2 -0.008715 0.151586 -0.057 0.95416
zcpM1: x2 -0.008701 0.155332 -0.056 0.955331
M1: x2 0.04422 0.17376 0.254 0.7991
(overparameterized max GLMM)
On Wed, Jul 15, 2015 at 10:57 AM, Reinhold Kliegl <
reinhold.kliegl at gmail.com> wrote:
It looks like your model is overparameterized. Specifically, although
you have a very large number of observations (N=212570), the data do not
support the six model parameters you try to estimate for the random-effects
term of region. You can see this quickly with the rePCA() function. There
was also a convergence warning for me, but perhaps this was due to the fact
that I worked with the full set (as provided in the dropbox file), whereas
you used a subset.
As a first step, I forced the correlation parameters to zero with the
||-syntax (zcpM1 GLMM below). A LRT suggests that dropping the three
correlation parameters does not decrease the goodness of fit; Chi-sq(3) =
1.9, p=.5959. This could very well be a consequence of the small number of
regions, as Jake pointed out; and even 18 regions may not be enough for a
reliable estimation of three correlation parameters.
Moreover, the rePCA() suggests that even the three remaining model
parameters for the random-effects term of region are quite marginally
supported by the data. In the zero-correlation parameter GLMM, the variance
component for x1 was estimated to be suspiciously small. Therefore, in a
second step, I removed the region-related x1 variance component . Again,
according to a LRT there was no significant loss in goodness of fit;
Chi-sq(1) = 0.
The R script is inserted below. Please note that taking a correlation
parameter or a variance component out of a model does not mean that it is
zero, but that the data do not support a model that assumes that it is
different from zero. zcpM2 (below) looks like an adequate parsimonious
GLMM for these data.
###
# Note: The original post used a subset of the data, but it is not clear
how they were selected. I used all the data.
library(lme4)
library(RePsychLing)
data <- read.csv("correlated-random-effects.csv", header=TRUE)
str(data)
data$country <- factor(data$country)
data$region <- factor(data$region)
# no convergence problem
print(summary(M1 <- glmer(y ~ x1 + x2 + (1 | country) + (1 |
country:wave) +
(1 + x1 + x2 | region), data=data,
family=binomial(link="logit"))))
summary(rePCA(M1))
# convergence problem
print(summary(zcpM1 <- glmer(y ~ x1 + x2 + (1 | country) + (1 |
country:wave) +
(1 + x1 + x2 || region), data=data,
family=binomial(link="logit"))))
summary(rePCA(zcpM1))
# ok
print(summary(zcpM2 <- glmer(y ~ x1 + x2 + (1 | country) + (1 |
country:wave) +
(1 + x2 || region), data=data,
family=binomial(link="logit"))))
summary(rePCA(zcpM2))
anova(zcpM2, zcpM1, M1)
####
On Wed, Jul 15, 2015 at 3:45 AM, svm <steven.v.miller at gmail.com> wrote:
I considered that. I disaggregated the region random effect from 6 to 18
(the latter of which approximates the World Bank's region
classification).
I'm still encountering the same curious issue.
Random effects:
Groups Name Variance Std.Dev. Corr
country:wave (Intercept) 0.1530052 0.39116
country (Intercept) 0.3735876 0.61122
wbregion (Intercept) 0.0137822 0.11740
x1 0.0009384 0.03063 -1.00
x2 0.0767387 0.27702 -1.00 1.00
Number of obs: 212570, groups: country:wave, 143; country, 82;
wbregion, 18
For what it's worth: the model estimates fine. The results are
intuitive
and theoretically consistent. They also don't change if I were to remove
that region random effect. I'd like to keep the region random effect
(with
varying slopes) in the model. I'm struggling with what I should think
about
the perfect correlations.
On Tue, Jul 14, 2015 at 9:07 PM, Jake Westfall <jake987722 at hotmail.com>
wrote:
Hi Steve,
I think the issue is that estimating 3 variances and 3 covariances for
regions is quite ambitious given that there are only 6 regions. I
it's not surprising that the model has a hard time getting good
of those parameters.
Jake
Date: Tue, 14 Jul 2015 20:53:01 -0400
From: steven.v.miller at gmail.com
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] Perfectly correlated random effects (when they
Hi all,
I'm a long-time reader and wanted to raise a question I've seen
before about correlated random effects. Past answers I have
this listserv explain that perfectly correlated random effects
model overfitting and variances of random effects that are
and can be omitted for a simpler model. In my case, I don't think
what is happening here, though I could well be fitting a poor model
glmer.
I'll describe the nature of the data first. I'm modeling
survey data for countries across multiple waves and am estimating
region of the globe as a random effect as well. I have three random
(country, country-wave, and region). In the region random effect, I
allowing country-wave-level predictors to have varying slopes. My
is whether some country-wave-level contextual indicator can have an
effect (as a fixed effect), the effect of which can vary by region.
other words: is the effect of some country-level indicator (e.g.
unemployment) in a given year different for countries in Western
than for countries in Africa even if, on average, there is a
negative association at the individual-level? These
predictors that I allow to vary by region are the ones reporting
correlation and I'm unsure how to interpret that (or if I'm
model correctly).
I should also add that I have individual-level predictors as well as
country-wave-level predictors, though it's the latter that concerns
Further, every non-binary indicator in the model is standardized by
standard deviations.
For those interested, I have a reproducible (if rather large)
below. Dropbox link to the data is here:
In this reproducible example, y is the outcome variable and x1 and
two country-wave-level predictors I allow to vary by region. Both
are interval-level predictors that I standardized to have a mean of
and a standard deviation of .5 (per Gelman's (2008) recommendation).
I estimate the following model.
summary(M1 <- glmer(y ~ x1 + x2 + (1 | country) + (1 |
x1 + x2 | region), data=subset(Data),
family=binomial(link="logit")))
The results are theoretically intuitive. I think they make sense.
I get a report of perfect correlation for the varying slopes of the
random effect.
Random effects:
Groups Name Variance Std.Dev. Corr
country:wave (Intercept) 0.15915 0.3989
country (Intercept) 0.32945 0.5740
region (Intercept) 0.01646 0.1283
x1 0.02366 0.1538 1.00
x2 0.13994 0.3741 -1.00 -1.00
Number of obs: 212570, groups: country:wave, 143; country, 82;
What should I make of this and am I estimating this model wrong?
it's worth, the dotplot of the region random effect (with
variance) makes sense and is theoretically intuitive, given my