For what can I use a correlation of fixed effects from (g)lmer?

2 messages · Malcolm Fairbrother, svm

Wed, Feb 24, 2016 1:04 PM

Dear Steve,
There's a lot in your question. A couple thoughts:
(1) I'm not clear whether you *country-mean-centered* your individual-level
covariates. If not, the country means of those variables could (i.e.,
almost certainly will) correlate to some degree with your country-level
variables. This will almost certainly just confuse matters, such that it
would be best to do the mean-centering. (If you want to include, say,
national mean education as a covariate, in addition to de-meaned individual
level education, you can do so... But that will probably correlate a lot
with, say, GDP/capita.) From what you say, mean-centering will get you what
you want, and it actually might also help you deal with the unhelpful
reviewer comments you're getting. (I totally agree with your reactions to
those. Given what appears to be the paucity of logic behind their comments,
surreptitiously not doing what they're saying while appearing to do it
seems a reasonable strategy. Implicitly including country means increases
the degrees of freedom used at the country level, causing a reduction in
efficiency, as you suggest... Though it's an issue of collinearity, not
just missingness.)
So I think you're wrong that "individual-level variables don't meaningfully
influence the parameter estimates for country-level variables beyond
inefficiency introduced by missing data." But I think you can nonetheless
ignore them--because only the country mean components are having the
impacts you describe, and you seem to have substantive reasons to remove
those components.
(2) Like you, I've never found the "correlation of fixed effects" output
very useful. I generally just suppress/ignore it.
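
The country-mean-centering described in point (1) can be sketched in base R
as follows. This is only an illustration on made-up data: the data frame d
and the columns country, educ and gdp are hypothetical names, not variables
from Steve's analysis.

```r
# Hypothetical toy data: three countries, four respondents each.
set.seed(1)
d <- data.frame(
  country = rep(c("A", "B", "C"), each = 4),
  educ    = c(rnorm(4, 10), rnorm(4, 12), rnorm(4, 14))
)

# A hypothetical country-level covariate, constant within each country.
d$gdp <- c(A = 1, B = 2, C = 3)[d$country]

# Between-country component: the country mean of the covariate.
d$educ_cmean <- ave(d$educ, d$country, FUN = mean)

# Within-country component: the de-meaned individual-level covariate.
d$educ_within <- d$educ - d$educ_cmean

# The centered covariate averages to zero within every country, so it is
# uncorrelated with anything that only varies between countries.
print(tapply(d$educ_within, d$country, mean))
print(cor(d$educ_within, d$gdp))
```

In a model, educ_within would enter as the individual-level covariate;
adding educ_cmean alongside it is optional, with the collinearity caveat
noted in point (1).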
Hope that helps.
- Malcolm


Dr Malcolm Fairbrother
Senior Lecturer in Global Policy and Politics
School of Geographical Sciences
University of Bristol




Date: Mon, 22 Feb 2016 15:46:17 -0500

From: svm <steven.v.miller at gmail.com>
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] For what can I use a correlation of fixed effects from (g)lmer?

Hi all,

I have a question about how I could possibly use the correlation of fixed
effects that comes standard with every (g)lmer call. I'll briefly explain
the situation I'm encountering.

   - I use mixed effects models mostly for cross-national survey research.
   I have both individual-level fixed effects and country-level fixed
   effects.
   - My interest is mostly the country-level fixed effects. The
   individual-level stuff tends to be standard "controls" that reviewers
   want to see.
   - I'm not convinced the individual-level fixed effects are entirely
   necessary. My hunch is they just make for inefficient estimates of the
   country-level fixed effects that interest me. The individual-level
   variables just create missing data problems. However, they're stuff that
   reviewers insist on seeing absent any other information about what a
   mixed effects model is doing.

I have a project (manuscript here:
https://www.dropbox.com/s/harb6ylpcxdpalr/etst.pdf?dl=0 | appendix here:
https://www.dropbox.com/s/pq8gmr7v1xvvu2h/etst-appendix.pdf?dl=0) that
reviewers rejected because the country-level fixed effects were rendered
statistically insignificant (i.e. not discernible from zero) upon the
inclusion of the individual-level variables. They said that one
individual-level attribute (which by itself contributes to listwise
deletion of 30% of the data) somehow showed the country-level fixed effects
to be "spurious" once it was included. This already strikes me as a bold
claim for theoretical and statistical reasons, but here's what I did to
circumvent this claim:

   - Estimate just the country-level fixed effects.
   - Use multiple imputation to generate five full data sets and merge in
   the macro-level information after the imputation. The results for the
   country-level fixed effects were almost identical to the analyses with
   just the country-level fixed effects.
   - Omit the offending individual-level variables that contribute the most
   missingness. These results were consistent with the results from the
   other two estimation strategies.

However, the reviewers just didn't buy it and torpedoed the manuscript.

Is this something that the correlation of fixed effects could be useful in
addressing? Here's the correlation of fixed effects (without the
intercepts) for the analysis in question. In this analysis, the variables
in the bottom three rows (i.e. the two threat indices and the level of
democracy) are the country-level variables for this cross-national survey
analysis. The other variables are individual-level attributes. It's worth
reiterating that every variable that is not binary is scaled by two
standard deviations to create a meaningful zero.

http://i.imgur.com/eIiZH9b.png

Notice that the bottom-left quadrant is entirely white (i.e. the
correlation of the individual-level fixed effects with the country-level
fixed effects is basically zero). Is this telling me that the correlation
for any one individual-level fixed effect and a country-level fixed effect
is almost zero (i.e. they have almost no bearing on each other)? The most
I've seen anyone discuss this correlation matrix is here:

https://stat.ethz.ch/pipermail/r-sig-mixed-models/2009q1/001941.html

It is an approximate correlation of the estimator of the fixed
effects.  (I include the word "approximate" because I should but in
this case the approximation is very good.)  I'm not sure how to
explain it better than that.  Suppose that you took an MCMC sample
from the parameters in the model, then you would expect the sample of
the fixed-effects parameters to display a correlation structure like
this matrix.

and here (http://stats.stackexchange.com/questions/57240/how-do-i-interpret-the-correlations-of-fixed-effects-in-my-glmer-output):

The "correlation of fixed effects" output doesn't have the intuitive
meaning that most would ascribe to it. Specifically, is not about the
correlation of the variables (as OP notes). It is in fact about the
expected correlation of the regression coefficients. Although this may
speak to multicollinearity it does not necessarily.
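
The quantity both quotes describe can be computed directly from a fitted
model. The following is a minimal sketch, assuming lme4 is installed; it
uses the sleepstudy data that ships with lme4 as a stand-in, not the survey
data from this analysis.

```r
library(lme4)

# sleepstudy ships with lme4 and stands in for the survey data here.
fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Estimated variance-covariance matrix of the fixed-effect estimators.
V <- as.matrix(vcov(fit))

# Rescaling V to a unit diagonal gives the correlation of the estimators,
# i.e. the matrix that summary() prints as "Correlation of Fixed Effects".
fe_cor <- cov2cor(V)
print(round(fe_cor, 3))

# The same block can be left out of the printed summary.
print(summary(fit), correlation = FALSE)
```

An off-diagonal entry near zero says the two coefficient estimates are
nearly uncorrelated under the fitted model; it is a statement about the
estimators, not about the correlation of the variables themselves.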

I should add that I've estimated hundreds of mixed effects models with
individual-level and country-level variables and they all have fixed
effects correlation matrices that resemble these. I have a strong hunch
that individual-level variables don't meaningfully influence the parameter
estimates for country-level variables beyond inefficiency introduced by
missing data. In research projects where individual-level attributes don't
concern the project, I'd like to ignore them for that reason. They just
create estimation problems and slow down computation.

I might be mistaken, which is why I ask here. I thank you for your time.

- Steve

svm

Wed, Feb 24, 2016 2:08 PM

Hi Malcolm,

Thanks for the response. I actually cite your 2014 PSRM piece in defense of
that argument. I know we're not using the same language in a similar
approach, but I remember you arguing for the exclusion of individual-level
covariates because it would just contribute to missingness and not help
with your overall research question. We're both using WVS data too.
Sometimes, missingness is not random (e.g. WVS not asking about respondent
ideology in several important countries [like China] in one of my projects).

I did want to clarify that my mean-centering approach is inspired by Gelman
(2008), who argues to scale by two standard deviations. When I have two or
three waves, the inclusion of one or more predictors may drop out an entire
wave (e.g. EVS not asking about respondent's education levels until the
third wave). So, I try to scale within the country-wave (for
individual-level variables like age) or within the survey wave (for
macro-level attributes like a country's level of democracy). For example,
here's what I do with a respondent's age in EVS:

# ddply() comes from plyr; rescale() from arm centers and divides by 2 SDs.
EVS <- plyr::ddply(EVS, c("ccode", "wave"), transform, zg.age = arm::rescale(x003))

and here's what I do with level of democracy (UDS data):

# Same rescaling, but within survey wave only, for the macro-level data.
Macro.EVS <- plyr::ddply(Macro.EVS, c("wave"), transform,
                         zg.udsmean = arm::rescale(udsmean))
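
For readers without plyr or arm, the group-wise rescaling above can be
approximated in base R. This is a sketch on made-up data, assuming that
arm::rescale's default treatment of a non-binary numeric variable is to
subtract the mean and divide by two standard deviations; rescale2sd and the
toy data frame are hypothetical names introduced only for illustration.

```r
# Center and divide by two standard deviations (non-binary numeric input).
rescale2sd <- function(x) {
  (x - mean(x, na.rm = TRUE)) / (2 * sd(x, na.rm = TRUE))
}

# Hypothetical toy data: two countries observed in one wave.
toy <- data.frame(
  ccode = rep(c(1, 2), each = 5),
  wave  = rep(1, 10),
  age   = c(20, 30, 40, 50, 60, 25, 35, 45, 55, 65)
)

# Rescale age separately within each country-wave.
toy$zg.age <- ave(toy$age, toy$ccode, toy$wave, FUN = rescale2sd)

# Each country-wave now has mean 0 and standard deviation 0.5.
print(tapply(toy$zg.age, toy$ccode, mean))
print(tapply(toy$zg.age, toy$ccode, sd))
```

Because the rescaling is done within each country-wave, it also centers
age on its country-wave mean.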

In the example I cite above, only the bottom three rows in the correlation
matrix are country-level (really: country-year-level) covariates.
Everything else is an individual-level variable.

And yeah, I've never used a correlation of fixed effects before for
anything concerning the model I estimate. I'm curious if I actually could
use that as justification to stop flooding models with individual-level
variables that (I think) don't meaningfully influence the country-level
variables of interest to me (beyond introducing non-random missingness).

- Steve

