Assumptions of random effects for unbiased estimates

6 messages · Laura Dee, Ben Bolker, Jake Westfall +1 more

Tue, Oct 11, 2016 9:02 AM #

Dear all,
Random effects are more efficient estimators; however, they come at the
cost of the assumption that the random effect is not correlated with the
included explanatory variables. Otherwise, using random effects leads to
biased estimates (e.g., as laid out in Wooldridge's
<https://faculty.fuqua.duke.edu/~moorman/Wooldridge,%20FE%20and%20RE.pdf>
Econometrics text). This assumption is a strong one for many observational
datasets, and most analyses in economics do not use random effects for this
reason. *Is there a reason why observational ecological datasets would be
fundamentally different that I am missing? Why is this important assumption
(to have unbiased estimates from random effects) not emphasized in ecology?*

Thanks!

Laura

Laura Dee
Post-doctoral Associate
University of Minnesota
ledee at umn.edu
lauraedee.com


Ben Bolker

Tue, Oct 11, 2016 11:50 AM #

I didn't respond to this offline, as it took me a while even to start
to come up to speed on the question.  Random effects are indeed defined
from *very* different points of view in the two communities
([bio]statistical vs. econometric); I'm sure there are points of
contact, but I've been having a hard time getting my head around it all.

Econometric definition:

The Wikipedia page <https://en.wikipedia.org/wiki/Random_effects_model>
and CrossValidated question
<http://stats.stackexchange.com/questions/66161/why-do-random-effect-models-require-the-effects-to-be-uncorrelated-with-the-inpu>
were both helpful for me.

 In the (bio)statistical world fixed and random effects are usually
justified practically in terms of shrinkage estimators, or
philosophically in terms of random draws from an exchangeable set of
levels: e.g. see
<http://stats.stackexchange.com/questions/4700/what-is-the-difference-between-fixed-effect-random-effect-and-mixed-effect-mode/>
for links.

  I don't think I can really write an answer yet.  I'm still trying to
understand at an intuitive or heuristic level what it means for
Cov(x_it,c_i)=0, where x_it is a set of explanatory variables over time
for an individual subject and c_i is the conditional mode (=BLUP in
linear mixed-model-land) for the deviation of the individual i from the
population mean ... or more particularly what it means for that
condition to be violated, which is the point at which fixed effects
would become preferred.
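
For readers who want the condition written out, here is one common way to state the model behind it. This is a sketch, not anything from the references above: x_it and c_i are the symbols used in this message, while y_it, beta and epsilon_it are notation assumed here for illustration.

```latex
% Linear panel / mixed model with an individual-specific intercept c_i
y_{it} = x_{it}^{\top}\beta + c_i + \varepsilon_{it},
\qquad i = 1,\dots,N, \quad t = 1,\dots,T_i

% Random-effects assumption discussed in this thread
\operatorname{Cov}(x_{it}, c_i) = 0 \quad \text{for all } t
```

When the covariance is nonzero, individuals with a high c_i also tend to have systematically high (or low) x_it, so between-individual differences in y get partly attributed to x.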

  As a side note, some statisticians (Andrew Gelman is the one who
springs to mind) have commented on the possible overemphasis on bias.
(All else being equal, unbiased estimators are preferred to biased
estimators, but all else is not always equal.) Two examples: (1)
penalized estimators such as lasso/ridge regression (closely related to
mixed models) give biased parameter estimates with lower mean squared
error. (2) When estimating variability, one has to choose a particular
scale (variance, standard error, log(standard error), etc.) on which one
would prefer to get an unbiased answer.


Jake Westfall

Tue, Oct 11, 2016 12:32 PM #

Hi Laura and Ben,

I like this paper on this topic:
http://psych.colorado.edu/~westfaja/FixedvsRandom.pdf

What it comes down to essentially is that if the cluster effects are
correlated with the "time-varying" (i.e., within-cluster varying) X
predictor -- so that, for example, some clusters have high means on X and
others have low means on X -- then there is the possibility that the
average within-cluster effect (which is what the fixed effect model
estimates) differs from the overall effect of X, not conditional on the
clusters. An extreme example of this is Simpson's paradox. Now since the
estimate from the random-effects model can be seen as a weighted average of
these two effects, it will generally be pulled to some extent away from the
fixed-effect estimate toward the unconditional estimate, which is the bias
that econometricians fret about. However, if the cluster effects are not
correlated with X, so that each cluster has the same mean on X, then this
situation is not possible, so the random-effect model will give the same
unbiased estimate as the fixed-effect model.

A simple solution to this problem is to retain the random-effect model, but
to split the predictor X into two components, one representing the
within-cluster variation of X and the other representing the
between-cluster variation of X, and estimate separate slopes for these two
effects. One can even test whether these two slopes differ from each other,
which is conceptually similar to what the Hausman test does. As described
in the paper linked above, the estimate of the within-cluster component of
the X effect equals the estimate one would obtain from a fixed-effect model.
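
The within/between split described above can be sketched with a small simulation. Everything in it is an assumption made for illustration: the true within-cluster slope of 1, the cluster effect built to be negatively related to the cluster mean of X, the cluster sizes, and the use of plain least squares as a stand-in for a mixed-model fit (no random-effect variance is estimated here).

```python
import random

random.seed(1)

TRUE_SLOPE = 1.0      # within-cluster effect of x on y (assumed)
N_CLUSTERS = 200
N_PER_CLUSTER = 10

rows = []  # (cluster id, x, y)
for g in range(N_CLUSTERS):
    x_mean = random.gauss(0.0, 1.0)              # cluster mean of x
    c = -2.0 * x_mean + random.gauss(0.0, 0.5)   # cluster effect, correlated with x
    for _ in range(N_PER_CLUSTER):
        x = x_mean + random.gauss(0.0, 1.0)
        y = TRUE_SLOPE * x + c + random.gauss(0.0, 1.0)
        rows.append((g, x, y))


def slope(xs, ys):
    """Least-squares slope of ys on xs, with an intercept."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


# Observed cluster means of x and y.
sums = {}
for g, x, y in rows:
    sx, sy, n = sums.get(g, (0.0, 0.0, 0))
    sums[g] = (sx + x, sy + y, n + 1)
means = {g: (sx / n, sy / n) for g, (sx, sy, n) in sums.items()}

x_all = [x for _, x, _ in rows]
y_all = [y for _, _, y in rows]
x_between = [means[g][0] for g, _, _ in rows]     # cluster mean, repeated per row
x_within = [x - means[g][0] for g, x, _ in rows]  # deviation from cluster mean
y_within = [y - means[g][1] for g, _, y in rows]

pooled = slope(x_all, y_all)               # ignores the clusters entirely
fixed_effect = slope(x_within, y_within)   # demeaned ("within") estimator

# x_within and x_between are orthogonal by construction, so the two
# coefficients from regressing y on both equal these two simple slopes.
within = slope(x_within, y_all)
between = slope(x_between, y_all)

print("pooled slope:      ", round(pooled, 3))
print("fixed-effect slope:", round(fixed_effect, 3))
print("within slope:      ", round(within, 3))
print("between slope:     ", round(between, 3))
```

In this setup the within slope matches the fixed-effect slope up to rounding and sits near the simulated value of 1, while the between and pooled slopes are pulled well away from it because the cluster effect is correlated with the cluster mean of X.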

As for the original question, I can't speak for common practice in ecology,
but I suspect it may be like it is in my home field of psychology, where we
do worry about this issue (to some extent), but we discuss it using
completely different language. That is, we discuss it in terms of whether
there are different effects of the predictor at the within-cluster and
between-cluster levels, and how our model might account for that.

Jake

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

Poe, John

Tue, Oct 11, 2016 6:11 PM #

My reading of modern work by panel data econometricians is that they seem
very fine with the use of mixed effects models that properly differentiate
effects at different levels of analysis and the tools to do so have existed
in that literature since the early 1980s. They have been borrowing heavily
from the mixed effects literature in designing econometric models and talk
about them in panel data textbooks. This hasn't typically filtered down to
applied economists who tend to misunderstand what other fields do because
other fields just tend to talk about them differently.

The short version:
Everyone in the mixed effects literature just uses group/grand mean
centering and random coefficients to deal with endogeneity bias. If you are
an economist and someone outside of econ says mixed effects models you
should think *correlated random effects models* and not *random effects
models*.

The long version:
Economists are pretty afraid of error structures that are correlated with
independent variables in general and have built up pretty elaborate
statistical models to deal with the problem. In panel data, this manifests
itself as wanting to avoid confounding effects at different levels of
analysis so that within group varying effects are segregated from between
group varying effects. It can also happen when you are omitting higher
level random effects
<http://methods.johndavidpoe.com/2016/09/09/independence-across-levels-in-mixed-effects-models/>
and they are distorting the structure of the random effects that you are
including. This is generally a good thing as you want to be able to test
hypotheses at specific levels of analysis without confounding.

It's a big enough theoretical concern in the discipline that they usually
just want to remove all between group effects from the data as a *default* to
get level one effects because it is simpler and more foolproof than
dealing with the problem in a mixed effects setting. It's so pervasive that
they are often socialized into not designing hypotheses for any between
group or cross-level variation and just focus on within group (time
varying) variability when at all possible (what economists call *within
effects*).

What economists refer to as fixed effects models just difference out all
between group variation so that it cannot contaminate within group effects
(bias level one coefficients). It's the equivalent to including group
indicator variables in the model instead of a random effect and just
accepting that you can't make substantive inferences about anything at the
group level (what economists call *between effects*).

The typical conventional wisdom in applied econometrics is to use a Hausman
test which is a generic test comparing coefficients between a random
effects model (with no level 2 covariates) and a model with all between
group variability removed from the data. If there are differences between
the two, then they prefer to go with the latter. This is bad practice
according to econometrics textbooks but applied people don't seem to care
(Baltagi 2013 ch 4.3). This only makes sense if you don't care about group
invariant variables that only differ cross-sectionally and/or you think of
their effects as contamination. Panel data econometrics textbooks tend to
argue for a wider range of options here but in practice not that many
economists seem to use them.
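
As a rough illustration of the comparison the Hausman test makes, here is a minimal sketch for a single coefficient. The function name and the input numbers are hypothetical, and it uses the textbook one-coefficient form of the statistic, H = (b_FE - b_RE)^2 / (Var(b_FE) - Var(b_RE)), referred to a chi-square distribution with 1 degree of freedom; real applications compare whole coefficient vectors.

```python
import math


def hausman_single(b_fe, var_fe, b_re, var_re):
    """Hausman statistic and p-value for one coefficient.

    b_fe, var_fe: fixed-effects estimate and its sampling variance
    b_re, var_re: random-effects estimate and its sampling variance
    """
    diff_var = var_fe - var_re
    if diff_var <= 0:
        # The random-effects estimator should be the more efficient one;
        # if it is not, this simple form of the statistic is undefined.
        raise ValueError("var_fe must exceed var_re")
    h = (b_fe - b_re) ** 2 / diff_var
    # Upper tail of a chi-square(1) variable: P(Z^2 > h) = erfc(sqrt(h / 2)).
    p_value = math.erfc(math.sqrt(h / 2.0))
    return h, p_value


# Made-up estimates, purely for illustration.
h, p = hausman_single(b_fe=1.02, var_fe=0.0009, b_re=0.80, var_re=0.0005)
print("H =", round(h, 2), "p =", p)
```

A large H (small p) says the two sets of coefficients differ by more than sampling noise would explain, which is the situation in which applied economists fall back on the fixed effects model.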

There's an alternative framework in econ for dealing with this problem that
they call a Mundlak device (Mundlak 1978) or correlated random effects
models (Baltagi Handbook of Panel Data 2014 ch 6.3.3 or really any panel
data textbook) which is equivalent to a hierarchical linear model with
group mean centering for level-one variables. This approach is used in
econometrics by some pretty standard advanced panel data models (e.g.
Hausman-Taylor and Arellano-Bond). The other alternative that is advocated
by panel data econometricians but doesn't seem to have filtered down to
rank and file economists is to use random coefficients models and just
allow the random effects to be correlated with level one variables (Hsiao
2014 chapter 6 and most of his other written work).
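
For concreteness, the Mundlak / correlated random effects idea can be written for a single level-one predictor as follows. The symbols (alpha, beta, gamma, u_i) are notation assumed here for illustration, not taken from the cited texts.

```latex
% Mundlak device: add the group mean of x to a random-intercept model
y_{it} = \alpha + \beta x_{it} + \gamma \bar{x}_i + u_i + \varepsilon_{it}

% Equivalent group-mean-centered (within/between) form
y_{it} = \alpha + \beta\,(x_{it} - \bar{x}_i) + (\beta + \gamma)\,\bar{x}_i
         + u_i + \varepsilon_{it}
```

Here beta is the within-group effect and beta + gamma is the between-group effect, so gamma = 0 corresponds to the two effects being equal.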

It is important to understand that efficiency isn't the primary reason for
use of a mixed effects model over a fixed effects model for most research.
A common reason to use a mixed effects model is that you have hypotheses
about variables operating at higher levels of analysis or cross-level
interactions and those questions cannot be answered by fixed effects panel
models that have removed all between group variability from the analysis.
You are sacrificing the ability to test group variant hypotheses by using a
basic fixed effects model over a mixed effects model. For nonlinear models
like a logistic regression it can also be very difficult to use an unbiased
fixed effects model (though there are ways in a panel setting e.g. Hahn and
Newey 2004) and trivial to use a mixed effects model.

Panel data econometricians almost always talk about typical practice among
applied economists using fixed effects as flawed (see Baltagi 2013 ch.
4.3). Marc Nerlove's 2000 History of Panel Data Econometrics is my favorite
example:

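"The absurdity of the contention that possible correlation between some of
the observed explanatory variables and the individual-specific component of
the disturbance is a ground for using fixed effects should be clear from
the following example: Consider a panel of households with data on
consumption and income. We are trying to estimate a consumption function.
Income varies across households and over time. The variation across
households is related to ability of the main earner and other household
specific factors which vary little over time, that is to say, reflect
mainly differences in permanent income. Such permanent differences in
income are widely believed to be the source of most differences in
consumption both cross-sectionally and over time, whereas, variations of
income over time are likely to be mostly transitory and unrelated to
consumption in most categories. Yet, fixed-effects regressions are
equivalent to using only this variation and discarding the information on
the consumption-income relationship contained in the cross-section variation
among the household means."
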
See the last couple of pages of this lecture
<http://www.johndavidpoe.com/wp-content/uploads/2012/09/Blalock-Lecture.pdf>
for
the citations in the econometrics and multilevel literature that I
referenced.



Thanks,
John


John Poe
Doctoral Candidate
Department of Political Science
Research Methodologist
UK Center for Public Health Services & Systems Research
University of Kentucky
111 Washington Avenue, Room 203a
Lexington, KY 40536
www.johndavidpoe.com


Jake Westfall

Tue, Oct 11, 2016 6:49 PM #

What a nice contribution from John!

Jake


Poe, John

Tue, Oct 11, 2016 7:47 PM #

Thanks Jake!

On Oct 11, 2016 9:50 PM, "Jake Westfall" <jake.a.westfall at gmail.com> wrote:

What a nice contribution from John!

Jake

On Tue, Oct 11, 2016 at 8:11 PM, Poe, John <jdpo223 at g.uky.edu> wrote:

My reading of modern work by panel data econometricians is that they seem
very fine with the use of mixed effects models that properly

differentiate

effects at different levels of analysis and the tools to do so have

existed

in that literature since the early 1980s. They have been borrowing

heavily

from the mixed effects literature in designing econometric models and

talk

about them in panel data textbooks. This hasn't typically filtered down

to

applied economists who tend to misunderstand what other fields do because
other fields just tend to talk about them differently.

The short version:
Everyone in the mixed effects literature just uses group/grand mean
centering and random coefficients to deal with endogeneity bias. If you

are

an economist and someone outside of econ says mixed effects models you
should think *correlated random effects models* and not *random effects
models*.

The long version:
Economists are pretty afraid error structures that are correlated with
independent variables in general and have built up pretty elaborate
statistical models to deal with the problem. In panel data, this

manifests

itself as wanting to avoid confounding effects at different levels of
analysis so that within group varying effects are segregated from between
group varying effects. It can also happen when you are omitting higher
level random effects
<http://methods.johndavidpoe.com/2016/09/09/independence-

across-levels-in-mixed-effects-models/>

and they are distorting the structure of the random effects that you are
including. This is generally a good thing as you want to be able to test
hypotheses at specific levels of analysis without confounding.

It's a big enough theoretical concern in the discipline that they usually
just want to remove all between group effects from the data as a

*default* to

get level one effects because it is simpler and more fool proof than
dealing with the problem in a mixed effects setting. It's so pervasive

that

they are often socialized into not designing hypotheses for any between
group or cross-level variation and just focus on within group (time
varying) variability when at all possible (what economists call *within
effects*).

What economists refer to as fixed effects models just difference out all
between group variation so that it cannot contaminate within group

effects

(bias level one coefficients). It's the equivalent to including group
indicator variables in the model instead of a random effect and just
accepting that you can't make substantive inferences about anything at

the

group level (what economists call *between effects*).

The typical conventional wisdom in applied econometrics is to use a
Hausman test, which is a generic test comparing coefficients between a
random effects model (with no level 2 covariates) and a model with all
between group variability removed from the data. If there are differences
between the two, then they prefer to go with the latter. This is bad
practice according to econometrics textbooks but applied people don't seem
to care (Baltagi 2013 ch 4.3). This only makes sense if you don't care
about group invariant variables that only differ cross-sectionally and/or
you think of their effects as contamination. Panel data econometrics
textbooks tend to argue for a wider range of options here but in practice
not that many economists seem to use them.

There's an alternative framework in econ for dealing with this problem
that they call a Mundlak device (Mundlak 1978) or correlated random effects
models (Baltagi Handbook of Panel Data 2014 ch 6.3.3 or really any panel
data textbook) which is equivalent to a hierarchical linear model with
group mean centering for level-one variables. This approach is used in
econometrics by some pretty standard advanced panel data models (e.g.
Hausman-Taylor and Arellano-Bond). The other alternative that is advocated
by panel data econometricians but doesn't seem to have filtered down to
rank and file economists is to use random coefficients models and just
allow the random effects to be correlated with level one variables (Hsiao
2014 chapter 6 and most of his other written work).

It is important to understand that efficiency isn't the primary reason for
use of a mixed effects model over a fixed effects model for most research.
A common reason to use a mixed effects model is that you have hypotheses
about variables operating at higher levels of analysis or cross-level
interactions and those questions cannot be answered by fixed effects panel
models that have removed all between group variability from the analysis.
You are sacrificing the ability to test group variant hypotheses by using a
basic fixed effects model over a mixed effects model. For nonlinear models
like a logistic regression it can also be very difficult to use an unbiased
fixed effects model (though there are ways in a panel setting e.g. Hahn and
Newey 2004) and trivial to use a mixed effects model.

Panel data econometricians almost always talk about typical practice among
applied economists using fixed effects as flawed (see Baltagi 2013 ch.
4.3). Marc Nerlove's 2000 History of Panel Data Econometrics is my favorite
example:

"The absurdity of the contention that possible correlation between some of
the observed explanatory variables and the individual-specific component of
the disturbance is a ground for using fixed effects should be clear from
the following example: Consider a panel of households with data on
consumption and income. We are trying to estimate a consumption function.
Income varies across households and over time. The variation across
households is related to ability of the main earner and other household
specific factors which vary little over time, that is to say, reflect
mainly differences in permanent income. Such permanent differences in
income are widely believed to be the source of most differences in
consumption both cross-sectionally and over time, whereas, variations of
income over time are likely to be mostly transitory and unrelated to
consumption in most categories. Yet, fixed-effects regressions are
equivalent to using only this variation and discarding the information on
the consumption-income relationship contained in the cross-section
variation among the household means."


See the last couple of pages of this lecture
<http://www.johndavidpoe.com/wp-content/uploads/2012/09/Blalock-Lecture.pdf>
for the citations in the econometrics and multilevel literature that I
referenced.



On Tue, Oct 11, 2016 at 3:32 PM, Jake Westfall <jake.a.westfall at gmail.com>
wrote:

Hi Laura and Ben,

I like this paper on this topic:
http://psych.colorado.edu/~westfaja/FixedvsRandom.pdf

What it comes down to essentially is that if the cluster effects are
correlated with the "time-varying" (i.e., within-cluster varying) X
predictor -- so that, for example, some clusters have high means on X and
others have low means on X -- then there is the possibility that the
average within-cluster effect (which is what the fixed effect model
estimates) differs from the overall effect of X, not conditional on the
clusters. An extreme example of this is Simpson's paradox. Now since the
estimate from the random-effects model can be seen as a weighted average of
these two effects, it will generally be pulled to some extent away from the
fixed-effect estimate toward the unconditional estimate, which is the bias
that econometricians fret about. However, if the cluster effects are not
correlated with X, so that each cluster has the same mean on X, then this
situation is not possible, so the random-effect model will give the same
unbiased estimate as the fixed-effect model.
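
The weighted-average reading can be sketched numerically. The block below
assumes the textbook GLS expression for a balanced panel with a single
predictor and known variance components: the random-effects slope is
(W_xy + theta * B_xy) / (W_xx + theta * B_xx), with
theta = sigma2_e / (sigma2_e + T * sigma2_u). The simulated data and all
names (re_slope, theta_from_variances) are hypothetical, and theta is
treated as given rather than estimated from the data.

```python
import random

random.seed(3)

N_CLUSTERS, T = 200, 5   # hypothetical balanced panel

clusters = []  # one (xs, ys) pair per cluster
for _ in range(N_CLUSTERS):
    m = random.gauss(0, 1)               # cluster's typical level of x
    u = 2.0 * m + random.gauss(0, 1)     # cluster effect, correlated with x
    xs = [m + random.gauss(0, 1) for _ in range(T)]
    ys = [1.0 * x + u + random.gauss(0, 1) for x in xs]
    clusters.append((xs, ys))


def mean(values):
    return sum(values) / len(values)


grand_x = mean([x for xs, _ in clusters for x in xs])
grand_y = mean([y for _, ys in clusters for y in ys])

# Within-cluster and between-cluster sums of squares and cross-products.
w_xx = w_xy = b_xx = b_xy = 0.0
for xs, ys in clusters:
    mx, my = mean(xs), mean(ys)
    w_xx += sum((x - mx) ** 2 for x in xs)
    w_xy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b_xx += T * (mx - grand_x) ** 2
    b_xy += T * (mx - grand_x) * (my - grand_y)

beta_within = w_xy / w_xx     # what the fixed-effect model estimates
beta_between = b_xy / b_xx    # regression on cluster means only


def theta_from_variances(sigma2_e, sigma2_u):
    """Weight given to between-cluster information (balanced panel)."""
    return sigma2_e / (sigma2_e + T * sigma2_u)


def re_slope(theta):
    """Random-effects (GLS) slope for a given theta in [0, 1]."""
    return (w_xy + theta * b_xy) / (w_xx + theta * b_xx)


print(f"within: {beta_within:.2f}, between: {beta_between:.2f}")
for sigma2_u in (0.1, 1.0, 10.0):
    theta = theta_from_variances(1.0, sigma2_u)
    print(f"sigma2_u={sigma2_u}: theta={theta:.3f}, RE slope={re_slope(theta):.2f}")
```

As theta goes to 0 (cluster variance large relative to residual variance,
or many observations per cluster) the random-effects slope approaches the
within slope; at theta = 1 it is the pooled least-squares slope.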

A simple solution to this problem is to retain the random-effect model, but
to split the predictor X into two components, one representing the
within-cluster variation of X and the other representing the
between-cluster variation of X, and estimate separate slopes for these two
effects. One can even test whether these two slopes differ from each other,
which is conceptually similar to what the Hausman test does. As described
in the paper linked above, the estimate of the within-cluster component of
the X effect equals the estimate one would obtain from a fixed-effect
model.
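
A rough sketch of the split, under loud assumptions: the data are
simulated, the names are made up, and the two components are fitted by
ordinary least squares rather than by a random-effect model, so that the
example needs only the Python standard library. In a real analysis the
same two columns would go into the mixed model (in lme4, something like
y ~ x_within + x_between + (1 | cluster)).

```python
import random

random.seed(2)

N_CLUSTERS, N_OBS = 300, 6   # hypothetical balanced design
WITHIN_SLOPE = 1.0           # assumed value, for illustration only

data = []  # each row is (cluster, x, y)
for c in range(N_CLUSTERS):
    m = random.gauss(0, 1)                 # cluster's typical level of x
    u = -3.0 * m + random.gauss(0, 0.5)    # cluster effect, correlated with x
    for _ in range(N_OBS):
        x = m + random.gauss(0, 1)
        y = WITHIN_SLOPE * x + u + random.gauss(0, 1)
        data.append((c, x, y))


def cluster_means(pairs):
    """Mean of the values attached to each cluster id."""
    totals = {}
    for c, v in pairs:
        s, n = totals.get(c, (0.0, 0))
        totals[c] = (s + v, n + 1)
    return {c: s / n for c, (s, n) in totals.items()}


xbar = cluster_means((c, x) for c, x, _ in data)
ybar = cluster_means((c, y) for c, _, y in data)

x_between = [xbar[c] for c, _, _ in data]       # cluster mean of x
x_within = [x - xbar[c] for c, x, _ in data]    # deviation from cluster mean
ys = [y for _, _, y in data]


def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def two_predictor_ols(x1, x2, y):
    """Slopes of y on x1 and x2 (with intercept) via the normal equations."""
    x1, x2, y = center(x1), center(x2), center(y)
    s11, s22, s12 = dot(x1, x1), dot(x2, x2), dot(x1, x2)
    s1y, s2y = dot(x1, y), dot(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


b_within, b_between = two_predictor_ols(x_within, x_between, ys)

# Fixed-effect (within) estimate for comparison: demean both x and y.
y_within = [y - ybar[c] for c, _, y in data]
b_fixed = dot(x_within, y_within) / dot(x_within, x_within)

print(f"within slope:  {b_within:.3f}")
print(f"between slope: {b_between:.3f}")
print(f"fixed-effect slope: {b_fixed:.3f}")
```

In this setup the within and between slopes have opposite signs, a
Simpson's-paradox pattern, and the within slope from the split matches the
fixed-effect slope up to floating-point error.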

As for the original question, I can't speak for common practice in ecology,
but I suspect it may be like it is in my home field of psychology, where we
do worry about this issue (to some extent), but we discuss it using
completely different language. That is, we discuss it in terms of whether
there are different effects of the predictor at the within-cluster and
between-cluster levels, and how our model might account for that.

Jake



_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models