Perfectly correlated random effects (when they shouldn't be)

svm

Wed, Jul 15, 2015 8:42 AM

Appreciate all the input. My preference would be for the minimal models for
concern of computation and overparameterization, but my field is one where
the reviewer is always right and junk models are encouraged. I do
appreciate the insight about the small number of regions. I think I had
been mistaken in my belief that the number of countries per region would
compensate even if the number of regions themselves were small.

On Wed, Jul 15, 2015 at 7:32 AM, Reinhold Kliegl <reinhold.kliegl at gmail.com>
wrote:

Correction: ... a counterexample to the claim that maximal models are
always conservative (not: anti-conservative).


On Wed, Jul 15, 2015 at 1:25 PM, Reinhold Kliegl <
reinhold.kliegl at gmail.com> wrote:

Actually, I overlooked that the region-related variance component of x2
can also be removed from the GLMM without loss of goodness of fit. Thus, I
end up with an intercept-only GLMM as the most parsimonious model for this
data set. Here is the anova() output:

anova(zcpM3, zcpM2, zcpM1, M1)

Data: data
Models:
zcpM3: y ~ x1 + x2 + (1 | country) + (1 | country:wave) + (1 | region)
zcpM2: y ~ x1 + x2 + (1 | country) + (1 | country:wave) + (1 + x2 ||
region)
zcpM1: y ~ x1 + x2 + (1 | country) + (1 | country:wave) + (1 + x1 + x2 ||
region)
M1: y ~ x1 + x2 + (1 | country) + (1 | country:wave) + (1 + x1 + x2 |
region)
      Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
zcpM3  6 258193 258255 -129091   258181                          (only
intercept, parsimonious GLMM)
zcpM2  7 258195 258267 -129091   258181 0.1882      1     0.6644
zcpM1  8 258197 258279 -129091   258181 0.0000      1     1.0000
M1    11 258201 258314 -129090   258179 1.8885      3     0.5959
(overparameterized max GLMM)

There is another interesting aspect to this reduction of model
complexity. The parsimonious GLMM yields the SMALLEST z-value for the
significant fixed effect of x1. Thus, this is a counterexample to the claim
that maximal models are always anti-conservative. Whether this is the case
or not may depend on whether the maximal model is overparameterized or not.
Here are the corresponding fixed-effect estimates for x1 and x2:

Fixed effects for x1 across GLMMs:
                    Estimate Std. Error z value Pr(>|z|)
zcpM3: x1           0.51004    0.23859    2.138  0.0325 *  (only
intercept, parsimonious GLMM)
zcpM2: x1           0.515822   0.167005   3.089  0.00201 **
zcpM1: x1           0.516168   0.149704   3.448  0.000565 ***
M1:    x1           0.48913    0.19972    2.449  0.0143 *
 (overparameterized max GLMM)

Fixed effects for x2 across GLMMs:
zcpM3: x2          -0.02743    0.15881   -0.173  0.8629    (only
intercept, parsimonious GLMM)
zcpM2: x2          -0.008715   0.151586  -0.057  0.95416
zcpM1: x2          -0.008701   0.155332  -0.056  0.955331
M1:    x2           0.04422    0.17376    0.254  0.7991
 (overparameterized max GLMM)

On Wed, Jul 15, 2015 at 10:57 AM, Reinhold Kliegl <
reinhold.kliegl at gmail.com> wrote:

It looks like your model is overparameterized. Specifically, although
you have a very large number of observations (N=212570),  the data do not
support the six model parameters you try to estimate for the random-effects
term of region. You can see this quickly with the rePCA() function. There
was also a convergence warning for me, but perhaps this was due to the fact
that I worked with the full set (as provided in the dropbox file), whereas
you used a subset.

As a first step, I forced the correlation parameters to zero with the
||-syntax (zcpM1 GLMM below). A LRT suggests that dropping the three
correlation parameters does not decrease the goodness of fit; Chi-sq(3) =
1.9, p=.5959.  This could very well be a consequence of the small number of
regions, as Jake pointed out; and even 18 regions may not be enough for a
reliable estimation of three correlation parameters.

Moreover, the rePCA() suggests that even the three remaining model
parameters for the random-effects term of region are quite marginally
supported by the data. In the zero-correlation parameter GLMM, the variance
component for x1 was estimated to be suspiciously small.  Therefore, in a
second step, I removed the region-related x1 variance component . Again,
according to a LRT there was no significant loss in goodness of fit;
Chi-sq(1) = 0.

The R script is inserted below. Please note that taking a correlation
parameter or a variance component out of a model does not mean that it is
zero, but that the data do not support a model that assumes that it is
different from zero.  zcpM2 (below) looks like an adequate parsimonious
GLMM for these data.


###

# Note: The original post used a subset of the data, but it is not clear
how they were selected. I used all the data.

library(lme4)
library(RePsychLing)
data <- read.csv("correlated-random-effects.csv", header=TRUE)
str(data)
data$country <- factor(data$country)
data$region <- factor(data$region)

# no convergence problem
print(summary(M1 <- glmer(y ~ x1 + x2 + (1 | country) + (1 |
country:wave) +
                          (1 +  x1 + x2 | region), data=data,
family=binomial(link="logit"))))

summary(rePCA(M1))

# convergence problem
print(summary(zcpM1 <- glmer(y ~ x1 + x2 + (1 | country) + (1 |
country:wave) +
                             (1 +  x1 + x2 || region), data=data,
family=binomial(link="logit"))))
summary(rePCA(zcpM1))

# ok
print(summary(zcpM2 <- glmer(y ~ x1 + x2 + (1 | country) + (1 |
country:wave) +
                                   (1 + x2 || region), data=data,
family=binomial(link="logit"))))
summary(rePCA(zcpM2))

anova(zcpM2, zcpM1, M1)

####




On Wed, Jul 15, 2015 at 3:45 AM, svm <steven.v.miller at gmail.com> wrote:

I considered that. I disaggregated the region random effect from 6 to 18
(the latter of which approximates the World Bank's region
classification).
I'm still encountering the same curious issue.

Random effects:
 Groups       Name        Variance  Std.Dev. Corr
 country:wave (Intercept) 0.1530052 0.39116
 country      (Intercept) 0.3735876 0.61122
 wbregion     (Intercept) 0.0137822 0.11740
              x1        0.0009384 0.03063  -1.00
              x2         0.0767387 0.27702  -1.00  1.00
Number of obs: 212570, groups:  country:wave, 143; country, 82;
wbregion, 18

 For what it's worth: the model estimates fine. The results are
intuitive
and theoretically consistent. They also don't change if I were to remove
that region random effect. I'd like to keep the region random effect
(with
varying slopes) in the model. I'm struggling with what I should think
about
the perfect correlations.

On Tue, Jul 14, 2015 at 9:07 PM, Jake Westfall <jake987722 at hotmail.com>
wrote:

Hi Steve,


I think the issue is that estimating 3 variances and 3 covariances for
regions is quite ambitious given that there are only 6 regions. I

think

it's not surprising that the model has a hard time getting good

estimates

of those parameters.


Jake

Date: Tue, 14 Jul 2015 20:53:01 -0400
From: steven.v.miller at gmail.com
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] Perfectly correlated random effects (when they

shouldn't be)

Hi all,

I'm a long-time reader and wanted to raise a question I've seen

asked

here

before about correlated random effects. Past answers I have

encountered

on

this listserv explain that perfectly correlated random effects

suggest

model overfitting and variances of random effects that are

effectively

zero

and can be omitted for a simpler model. In my case, I don't think

that's

what is happening here, though I could well be fitting a poor model

in

glmer.

I'll describe the nature of the data first. I'm modeling

individual-level

survey data for countries across multiple waves and am estimating

the

region of the globe as a random effect as well. I have three random

effects

(country, country-wave, and region). In the region random effect, I

am

allowing country-wave-level predictors to have varying slopes. My

inquiry

is whether some country-wave-level contextual indicator can have an

overall

effect (as a fixed effect), the effect of which can vary by region.

In

other words: is the effect of some country-level indicator (e.g.
unemployment) in a given year different for countries in Western

Europe

than for countries in Africa even if, on average, there is a

positive or

negative association at the individual-level? These

country-wave-level

predictors that I allow to vary by region are the ones reporting

perfect

correlation and I'm unsure how to interpret that (or if I'm

estimating

the

model correctly).

I should also add that I have individual-level predictors as well as
country-wave-level predictors, though it's the latter that concerns

me.

Further, every non-binary indicator in the model is standardized by

two

standard deviations.

For those interested, I have a reproducible (if rather large)

example

below. Dropbox link to the data is here:

https://www.dropbox.com/s/t29jfwm98tsdr71/correlated-random-effects.csv?dl=0

In this reproducible example, y is the outcome variable and x1 and

x2 are

two country-wave-level predictors I allow to vary by region. Both

x1 and

x2

are interval-level predictors that I standardized to have a mean of

zero

and a standard deviation of .5 (per Gelman's (2008) recommendation).

I estimate the following model.

summary(M1 <- glmer(y ~ x1 + x2 + (1 | country) + (1 |

country:wave) +

(1 +

x1 + x2 | region), data=subset(Data),

family=binomial(link="logit")))

The results are theoretically intuitive. I think they make sense.

However,

I get a report of perfect correlation for the varying slopes of the

region

random effect.

Random effects:
Groups Name Variance Std.Dev. Corr
country:wave (Intercept) 0.15915 0.3989
country (Intercept) 0.32945 0.5740
region (Intercept) 0.01646 0.1283
x1 0.02366 0.1538 1.00
x2 0.13994 0.3741 -1.00 -1.00
Number of obs: 212570, groups: country:wave, 143; country, 82;

region, 6

What should I make of this and am I estimating this model wrong?

For what

it's worth, the dotplot of the region random effect (with

conditional

variance) makes sense and is theoretically intuitive, given my

data. (

http://i.imgur.com/mrnaJ77.png)

Any help would be greatly appreciated.

Best regards,
Steve

[[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

Perfectly correlated random effects (when they shouldn't be)

Thread (9 messages)