
Random vs. fixed effects

15 messages · Schultz, Mark R., Daniel Ezra Johnson, Liaw, Andy +8 more

#
I'm not personally an expert, but from reading this list for some
time, the consensus is that three points is not enough to make a
reasonable estimate of a variance. Nor is a three-level effect likely
to be nested within a fixed effect, which would require it to be
treated as random.

So yes, a fixed effect sounds like the way to go here, but you might
want to investigate what differences you'd get by treating it as fixed
versus random, in terms of inference or prediction (from the BLUPs, if
random). If you want to make predictions for new population members,
then of course random is your only choice.

Dan
On Fri, Apr 23, 2010 at 9:38 AM, Schultz, Mark R. <Mark.Schultz2 at va.gov> wrote:
#
I'm by no means an expert, but it seems to me that this is more a
philosophical question than a technical one.

To me, a factor is treated as a fixed effect if the interest is in the
differences from one level to another (or some contrasts).  A factor is
random, on the other hand, when the interest is in the variability due
to the factor, and the levels of the factor can be considered a sample
from a (Gaussian) population.  The problem is, if a factor has only
three levels, can we really reliably estimate the variance of the
population from which those three levels were drawn?  Well, if you
must, you must.  However, it seems to me that if the factor is really a
blocking variable (thus basically a set of nuisance parameters), one
can go either way.
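To put a number on the three-level worry: for k independent draws from a normal distribution, the sample variance has coefficient of variation sqrt(2/(k-1)), i.e. roughly 100% for k = 3 versus roughly 20% for k = 50. A quick simulation sketch (illustrative stdlib Python, not from the original thread; the helper name var_cv is made up):

```python
import random
import statistics

def var_cv(k, reps=20000, sigma=4.0, seed=1):
    """Empirical coefficient of variation (SD / mean) of the sample
    variance computed from k draws of N(0, sigma^2)."""
    rng = random.Random(seed)
    ests = [statistics.variance([rng.gauss(0, sigma) for _ in range(k)])
            for _ in range(reps)]
    return statistics.stdev(ests) / statistics.mean(ests)

# Theory: CV = sqrt(2 / (k - 1)), i.e. about 1.00 for k = 3
# and about 0.20 for k = 50.
print(round(var_cv(3), 2))
print(round(var_cv(50), 2))
```

With three levels the variance estimate is, in relative terms, about as noisy as the quantity it estimates; with fifty levels it is usable.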

I'd very much welcome the real experts' corrections or comments.

Andy
#
Here is a simulation of 10k cases using 4- and 50-level factors for the
random effect.  With 4 levels there are numerical problems and the
accuracy of the random effect is terrible.  With 50 levels there are
no numerical problems and the accuracy is much better.
> library(lme4)
> f <- function(k, n = 10000) {
+   set.seed(1)
+   x <- 1:n
+   fac <- gl(k, 1, n)
+   fac.eff <- rnorm(k, 0, 4)[fac]
+   e <- rnorm(n)
+   y <- 1 + 2 * x + fac.eff + e
+   lmer(y ~ x + (1 | fac))
+ }
> f(4)
Linear mixed model fit by REML
Formula: y ~ x + (1 | fac)
   AIC   BIC logLik deviance REMLdev
 28733 28762 -14363    28702   28725
Random effects:
 Groups   Name        Variance Std.Dev.
 fac      (Intercept) 1.1162   1.0565
 Residual             1.0298   1.0148
Number of obs: 10000, groups: fac, 4

Fixed effects:
             Estimate Std. Error t value
(Intercept) 1.313e+00  5.286e-01       2
x           2.000e+00  3.515e-06  568923

Correlation of Fixed Effects:
  (Intr)
x -0.033
Warning message:
In mer_finalize(ans) : false convergence (8)
Linear mixed model fit by REML
Formula: y ~ x + (1 | fac)
   AIC   BIC logLik deviance REMLdev
 29040 29069 -14516    29009   29032
Random effects:
 Groups   Name        Variance Std.Dev.
 fac      (Intercept) 11.2016  3.3469
 Residual              1.0251  1.0125
Number of obs: 10000, groups: fac, 50

Fixed effects:
             Estimate Std. Error t value
(Intercept) 1.396e+00  4.738e-01       3
x           2.000e+00  3.507e-06  570242

Correlation of Fixed Effects:
  (Intr)
x -0.037
On Fri, Apr 23, 2010 at 9:38 AM, Schultz, Mark R. <Mark.Schultz2 at va.gov> wrote:
#
Here's my question for the group:

  Given that it is a reasonable *philosophical* position to say 'treat
philosophically random effects as random no matter what, and leave them
in the model even if they don't appear to be statistically significant',
and given that with small numbers of random-effect levels this approach
is likely to lead to numerical difficulties in most (??) mixed model
packages (warnings, errors, or low estimates of the variance), what
should one do?  (Suppose one is in a situation that is too complicated
to use classical method-of-moments approaches -- crossed designs, highly
unbalanced data, GLMMs ...)

 1. philosophy, schmilosophy.  Fit these factors as a fixed effect,
anything else is too dangerous/misleading/unworkable.
 2. proceed with the 'standard' mixed model (lme4, nlme, PROC MIXED,
...) and hope it doesn't break.  Ignore warnings.
 3. use Bayesian-computational approaches (MCMCglmm, WinBUGS, AD Model
Builder with post-hoc MCMC calculation? Data cloning?)?  Possibly with
half-Cauchy priors on variance as recommended by Gelman [Bayesian
Analysis (2006) 1, Number 3, pp. 515-533]?
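For context, the half-Cauchy prior mentioned in option 3 is simply the absolute value of a Cauchy variate; its median equals its scale parameter (Gelman's paper uses scale 25 as a weakly informative default), and its heavy right tail avoids pinning the group-level standard deviation near zero. A sampling sketch (illustrative stdlib Python, not part of the thread; rhalf_cauchy is a made-up helper):

```python
import math
import random

def rhalf_cauchy(scale, n, seed=1):
    """Sample n draws from a half-Cauchy(scale) distribution via the
    inverse CDF of the Cauchy: |scale * tan(pi * (U - 0.5))|."""
    rng = random.Random(seed)
    return [abs(scale * math.tan(math.pi * (rng.random() - 0.5)))
            for _ in range(n)]

# The median of a half-Cauchy equals its scale parameter, so the
# sample median of these draws should sit near 25.
draws = sorted(rhalf_cauchy(scale=25.0, n=100001))
print(draws[len(draws) // 2])
```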
Gabor Grothendieck wrote:

  
    
#
The answer is effect-size dependent, is it not?

If you fit the random effect and the algorithm 
works without failure, why not use it?

If it doesn't work, you have a faulty tool for 
estimation. Punting to a fixed model is one way 
out of the problem. Another is matched-on-the-random-factor data analysis.

Pragmatism is certainly an issue. But what if you 
have 10 centers as a factor with known 
correlation issues. If you analyze with one set 
of predictors, missing values leave you with
only 5 centers, so you treat centers as a fixed 
effect with 5 levels. If you use another set of 
predictors, you have all 10 levels, so you use 
centers as a random effect with a variance. Isn't 
intellectual consistency an issue here too? How 
do you explain this in the executive summary?

One thing you can do if the mixed modeling fails 
is to use the standard deviation among levels of 
the random-treated-as-fixed factor as an estimate 
of the random effect. This would at least maintain consistency of concept.

Note that I'm not a mixed modeling expert, so my 
opinions may not be worth much.
At 02:11 PM 4/23/2010, Ben Bolker wrote:
================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"
================================================================
#
Two cents from a humble practitioner:

On Friday 23 April 2010 at 14:11 -0400, Ben Bolker wrote:
Schmilosophically speaking, option #3 has a set of interesting
features:

- It entails the creation and fitting of a full joint probability model.
Internal consistency is guaranteed.

- Bayesian modeling tools (especially BUGS) offer enough flexibility to
*require* explicit *choices* of model assumptions. For example, the
"standard" choice of a normal distribution for level-II effects has to
be *explicitly* written down by the modeler, which gives him/her an
opportunity to consider and/or justify the choice. This is also true
of the choice of how nuisance parameters are modeled.

- There is no "formal" distinction between fixed and random effects;
the latter are given a distribution (to be fitted), whereas the former
are not.

However:

- getting such a model to converge can be hard (especially with BUGS).
Current tools need a real (maybe too deep) understanding of the
numerical-analysis problems involved to choose an *efficient* model
expression, and some "tricks" used to get convergence (such as rendering
a model deliberately unidentifiable...) are difficult to justify from a
modeling standpoint. Again, Gelman & Hill's comparison of current
Bayesian tools to early-generation regression software seems quite apt.

- The choices made about the shapes of distributions, relationships,
etc. should be submitted to a sensitivity analysis, which again raises
the statistician's workload.

In short, option #3 is the *expensive* way of (maybe) getting the most
precise honest answer to your problem... if such a thing exists and is
reachable with current numerical tools.

Murphy's law ensures that option #2 will "break" (and warnings will be
ignored) especially when the warnings mean something important.

But option #1 is *also* an acceptable one: using it entails modeling
*conditionally on your experimental setup*. You won't be able to state
(directly) an estimate of the possible "population" effects of the
formerly-random effects, and the validity of your inferences about the
really-fixed effects will be formally limited to the particular
population defined by those formerly-random effects. But that might be
okay if you have other information allowing you (and your reader) to
independently assess the possible value of an extrapolation from this
particular subpopulation to a hypothetical population.

In short, option #1 is a cheap way of getting a possibly acceptable
approximate solution to your problem, whose value has to be assessed by
other means.

Of course, you won't benefit from partial pooling, and serious
inter-unit interactions will mess up your inferences...

HTH,

					Emmanuel Charpentier
#
For what it's worth, I would take approach 3.

Others have pointed this out before, but I think strict adherence to the 'random' vs. 'fixed' effects nomenclature can sometimes do us a disservice. I find it easier to think of my multilevel models as having some parameters I allow to vary and some that I don't; or equivalently, some parameters are themselves modelled and some aren't.

Bayesianly, I think of the level above my modelled parameter as a prior. This prior can be relatively informative or uninformative; when the number of groups contributing to the modelled parameter is low, the prior is likely to be relatively uninformative, but probably still useful nonetheless. Yes, the posterior mode for the variance is likely to be an underestimate, but this is only a problem if I ignore the rest of its distribution. The uncertainty around the variance estimate is likely to be very large and will in fact allow unrealistically large values. This assumes a flat prior on the variance; it can be ameliorated by instead applying an appropriate half-Cauchy.

In most cases I would treat grouping variables with more than two levels as a single parameter allowed to vary by group, either as a component of an intercept term or as an error term centred on zero. Then, if the data allowed, I might consider allowing other parameters in my model to vary by these groups and thereby take care of the interactions.

Arguably, even categorical variables that some would consider inarguably philosophically 'fixed' can be incorporated as a modelled (allowed-to-vary-by-group) parameter. Gelman points out that such an approach can negate the 'classical' problem of multiple comparisons. See http://www.stat.columbia.edu/~gelman/research/unpublished/multiple2.pdf .
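The partial pooling behind that argument can be written as a precision-weighted compromise between each raw group estimate and the population mean: noisily measured groups are shrunk strongly toward the grand mean, while precisely measured ones barely move. A minimal sketch (illustrative stdlib Python with made-up numbers, not from the thread):

```python
def partial_pool(y_j, se_j, mu, tau):
    """Precision-weighted (partially pooled) estimate for one group:
    a compromise between the raw group mean y_j (precision 1/se_j^2)
    and the population mean mu (between-group SD tau)."""
    w = (1 / se_j**2) / (1 / se_j**2 + 1 / tau**2)
    return w * y_j + (1 - w) * mu

# A noisy group mean is shrunk hard toward mu...
print(partial_pool(y_j=10.0, se_j=5.0, mu=0.0, tau=1.0))  # about 0.38
# ...while a precisely estimated one barely moves.
print(partial_pool(y_j=10.0, se_j=0.5, mu=0.0, tau=1.0))  # 8.0
```

Because extreme group estimates are pulled toward the mean in proportion to their noise, the usual multiple-comparisons worry is blunted: the shrunken estimates rarely look spuriously different.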

Will Morris 
Masters of Philosophy candidate 
Vesk Plant Ecology Lab 
The School of Botany 
The University of Melbourne 
Australia 
Phone: +61 3 8344 0120 
http://www.botany.unimelb.edu.au/vesk/
On 24/04/2010, at 4:11 AM, Ben Bolker wrote:

            
#
As an addendum, back in the "Old Days", before there were good mixed 
model programs and all you could fit were fixed effect models, you 
had to estimate random effects by fitting the fixed effect model and 
then taking mean squares of the random effect levels as the estimate 
of the variance.

As an example, consider a simple randomized complete block design:

 > require('nlme')
 > set.seed(1)
 > nr<- 2  #reps per case
 > nB<- 4  #blocks
 > nT<- 2  #treatments
 > n<- nT*nB*nr  #total data
 > T<- rep(c(-1,1),nB*nr)
 > B<- factor(c(rep(1,nr*nT),rep(2,nr*nT),rep(3,nr*nT),rep(4,nr*nT)))
 > rB<- rnorm(nB,0,4)[B]
 > e<- rnorm(n,0,1)  #residuals
 > y<- T + rB + e
 > fit4m<- lme(y ~ T, random=~1|B)
 > summary(fit4m)
Linear mixed-effects model fit by REML
  Data: NULL
        AIC      BIC    logLik
   63.22191 65.77814 -27.61096

Random effects:
  Formula: ~1 | B
         (Intercept)  Residual
StdDev:    4.767061 0.8488003

Fixed effects: y ~ T
                 Value Std.Error DF  t-value p-value
(Intercept) 0.5351939 2.3929577 11 0.223654  0.8271
T           0.6916997 0.2122001 11 3.259658  0.0076
  Correlation:
   (Intr)
T 0

Standardized Within-Group Residuals:
         Min          Q1         Med          Q3         Max
-1.76880069 -0.62080393 -0.02900206  0.78562378  1.43929227

Number of Observations: 16
Number of Groups: 4
 > fit4f<- lm(y ~ T + B)
 > summary(fit4f)

Call:
lm(formula = y ~ T + B)

Residuals:
      Min       1Q   Median       3Q      Max
-1.46741 -0.50294 -0.03867  0.66197  1.25562

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  -2.3221     0.4244  -5.472 0.000194 ***
T             0.6917     0.2122   3.260 0.007604 **
B2            3.5997     0.6002   5.998 8.96e-05 ***
B3           -1.4594     0.6002  -2.432 0.033319 *
B4            9.2889     0.6002  15.477 8.20e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8488 on 11 degrees of freedom
Multiple R-squared: 0.9727,     Adjusted R-squared: 0.9628
F-statistic: 98.03 on 4 and 11 DF,  p-value: 1.587e-08

 > d<- coef(fit4f)[3:5]  #block coeffs
 > cB<- c(-0.25*d[1]-0.25*d[2]-0.25*d[3],
+   0.75*d[1]-0.25*d[2]-0.25*d[3],
+   -0.25*d[1]+0.75*d[2]-0.25*d[3],
+   -0.25*d[1]-0.25*d[2]+0.75*d[3]) #convert to mean = 0 basis
 > sd(cB)  #estimate random effect
[1] 4.785915

Note that the two approaches give essentially identical estimates.
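The mean-zero-basis step above is just centering the implied block effects (the reference block's effect is 0; the others are their treatment-contrast coefficients) and taking their standard deviation; since centering does not change a standard deviation, the whole computation reduces to the SD of (0, B2, B3, B4). Redoing the arithmetic with the printed coefficients (illustrative stdlib Python, not from the thread):

```python
import statistics

# Treatment-contrast block coefficients from the lm() fit above;
# block 1 is the reference level, so its implied effect is 0.
effects = [0.0, 3.5997, -1.4594, 9.2889]

# Centering to a mean-zero basis leaves the SD unchanged, so this
# equals sd(cB) from the R session.
print(round(statistics.stdev(effects), 4))  # 4.7859
```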
At 02:11 PM 4/23/2010, Ben Bolker wrote:
#
It appears that the real problem here is not with the data, but with
the estimation procedure.  When fitting the model by maximum likelihood,
both lme and AD Model Builder give an estimate of about 3.8 (true value
4) for the std dev of the REs.  When fitting the data by REML, AD Model
Builder gets an estimate of about 4.4.  I think that REML estimation is
more numerically challenging than ML and that lmer's algorithm is just
not up to the job. You can see the ADMB code etc.
at

         http://groups.google.com/group/admb-users/t/83d371e04bbe48c9
#
I think the point is that lme only addresses the simplest case of a
linear mixed model, and by such specialization it is more effective in
this case. lmer attempts to handle slightly more complex mixed models,
must use a different algorithm, and that algorithm seems to fail here.
AD Model Builder, on the other hand, uses a much more general algorithm
intended to work with arbitrary nonlinear mixed models. To stick with
the current simple example: suppose you want the regression to be
robust, e.g. by assuming the random variable e has a fat-tailed
Student's t distribution. This can easily be done in ADMB, but not in
lme, I believe.
#
I am surprised at suggestions that the problem of random effects with very few levels can be overcome by changing the modelling framework or the estimation algorithm.

It seems to me that the problem is a basic one of lack of information. If you want to estimate the variance of something you really need more than 3 observations. No amount of fiddling with the model or algorithms will change that. The only thing that might change it is to use Bayesian methods with informative priors, but then you are changing the problem by taking some information from elsewhere.

I reach the same conclusion if I think about the 'scope of inference'. I cannot expect to say much about a population I have just 3 observations from.

Statistical finesse cannot compensate for a basic lack of information. 


Richard Coe
Principal Scientist, Research Methods
World Agroforestry Centre (ICRAF), Nairobi

and

Statistical Services Centre, U Reading.


#
Here it is redone with lme.  lme seems not to exhibit the numerical
problems for 4 levels that we saw with lmer.
> library(nlme)
> f <- function(k, n = 10000) {
+   set.seed(1)
+   x <- 1:n
+   fac <- gl(k, 1, n)
+   fac.eff <- rnorm(k, 0, 4)[fac]
+   e <- rnorm(n)
+   y <- 1 + 2 * x + fac.eff + e
+   lme(y ~ x, random = ~ 1 | fac)
+ }
> f(4)
Linear mixed-effects model fit by REML
  Data: NULL
  Log-restricted-likelihood: -14342.06
  Fixed: y ~ x
(Intercept)           x
   1.313495    1.999999

Random effects:
 Formula: ~1 | fac
        (Intercept) Residual
StdDev:    4.421380 1.012295

Number of Observations: 10000
Number of Groups: 4
Linear mixed-effects model fit by REML
  Data: NULL
  Log-restricted-likelihood: -14515.87
  Fixed: y ~ x
(Intercept)           x
   1.396288    2.000000

Random effects:
 Formula: ~1 | fac
        (Intercept) Residual
StdDev:    3.322084  1.01249

Number of Observations: 10000
Number of Groups: 50



On Fri, Apr 23, 2010 at 12:41 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
#
Just one other comment, so that we are not unfair to lmer.  I was using
the CRAN version of lmer.  Using the development version in package
lme4a, there are no numerical problems for the 4-level case.  Thus it
appears that lme, lme4a and ADMB all handle this case.
Loading required package: Matrix
Loading required package: lattice

Attaching package: 'Matrix'


        The following object(s) are masked from package:base :

         det
> f <- function(k, n = 10000) {
+   set.seed(1)
+   x <- 1:n
+   fac <- gl(k, 1, n)
+   fac.eff <- rnorm(k, 0, 4)[fac]
+   e <- rnorm(n)
+   y <- 1 + 2 * x + fac.eff + e
+   # lme(y ~ x, random = ~ 1 | fac)
+   lmer(y ~ x + (1 | fac))
+ }
> f(4)
Linear mixed model fit by REML
Formula: y ~ x + (1 | fac)
 REML
28684

Random effects:
 Groups   Name        Variance Std.Dev.
 fac      (Intercept) 19.5486  4.4214
 Residual              1.0247  1.0123
Number of obs: 10000, groups: fac, 4

Fixed effects:
             Estimate Std. Error t value
(Intercept) 1.313e+00  2.211e+00       1
x           2.000e+00  3.507e-06  570338

Correlation of Fixed Effects:
  (Intr)
x -0.008
[1] "0.999375-45"


On Fri, Apr 23, 2010 at 12:41 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote: