Random vs. fixed effects
15 messages · Schultz, Mark R., Daniel Ezra Johnson, Liaw, Andy +8 more
I'm not personally an expert, but from reading this list for some time, the consensus is that three points are not enough to make a reasonable estimate of a variance. Nor is a three-level effect likely to be nested within a fixed effect, which would require it to be treated as random. So yes, a fixed effect sounds like the way to go here, but you might want to investigate what differences you would get by treating it as fixed versus random, in terms of inference or prediction (from the BLUPs, if random). If you want to make predictions for new population members, then of course random is your only choice. Dan
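Dan's point that three points are too few to estimate a variance is easy to check numerically. The following is a small illustrative sketch (written in Python rather than R, purely for convenience; the function names and the simulation sizes are mine): it compares how much the sample SD of 3 versus 30 draws from a normal population scatters around the true value of 4.

```python
import random
import statistics

random.seed(1)
TRUE_SD = 4.0  # between-group standard deviation, as in the simulation below

def sampled_sds(n_levels, n_sims=2000):
    """Sample SDs of n_levels group effects drawn from N(0, TRUE_SD)."""
    return [
        statistics.stdev([random.gauss(0.0, TRUE_SD) for _ in range(n_levels)])
        for _ in range(n_sims)
    ]

for k in (3, 30):
    sds = sorted(sampled_sds(k))
    # central ~90% range of the estimates
    lo, hi = sds[len(sds) // 20], sds[-(len(sds) // 20)]
    print(f"k={k:2d}: 90% of sample SDs fall in [{lo:.2f}, {hi:.2f}]")
```

With only 3 levels, the central 90% range of the estimated SD typically stretches from well below the truth to well above it; with 30 levels it concentrates near 4, which is the same qualitative picture as the lmer simulation below.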
On Fri, Apr 23, 2010 at 9:38 AM, Schultz, Mark R. <Mark.Schultz2 at va.gov> wrote:
I just read a post by Andrew Dolman suggesting that a factor with only 3 levels should be treated as a fixed effect. This seems to be a perennial question with mixed models. I'd really like to hear opinions from several experts as to whether there is a consensus on the topic. It really makes me uncomfortable that such an important modeling decision is made with an "ad hoc" heuristic. Thanks, Mark Schultz, Ph.D. Bedford VA Hospital Bedford, Ma.
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
I'm by no means an expert, but it seems to me that this is more a philosophical question than a technical one. To me, a factor is treated as a fixed effect if the interest is in the differences from one level to another (or some contrasts). A random factor, on the other hand, is one where the interest is in the variability due to the factor, and the levels of the factor can be considered a sample from a (Gaussian) population. The problem is: if a factor has only three levels, can we really reliably estimate the variance of the population from which those three levels were drawn? Well, if you must, you must. However, it seems to me that if the factor is really a blocking variable (thus basically a set of nuisance parameters), one can go either way. I'd very much welcome the real experts' corrections or comments. Andy
Here is a simulation of 10k cases using 4 and 50 level factors for the random effect. With 4 levels there are numerical problems and the accuracy of the random effect is terrible. With 50 levels there are no numerical problems and the accuracy is much better.
library(lme4)
set.seed(1)
n <- 10000
k <- 4
f <- function(n, k) {
  set.seed(1)
  x <- 1:n
  fac <- gl(k, 1, n)
  fac.eff <- rnorm(k, 0, 4)[fac]
  e <- rnorm(n)
  y <- 1 + 2 * x + fac.eff + e
  lmer(y ~ x + (1 | fac))
}
# simulation with 4 level random effect
f(n, 4)
Linear mixed model fit by REML
Formula: y ~ x + (1 | fac)
AIC BIC logLik deviance REMLdev
28733 28762 -14363 28702 28725
Random effects:
Groups Name Variance Std.Dev.
fac (Intercept) 1.1162 1.0565
Residual 1.0298 1.0148
Number of obs: 10000, groups: fac, 4
Fixed effects:
Estimate Std. Error t value
(Intercept) 1.313e+00 5.286e-01 2
x 2.000e+00 3.515e-06 568923
Correlation of Fixed Effects:
(Intr)
x -0.033
Warning message:
In mer_finalize(ans) : false convergence (8)
# simulation with 50 level random effect
f(n, 50)
Linear mixed model fit by REML
Formula: y ~ x + (1 | fac)
AIC BIC logLik deviance REMLdev
29040 29069 -14516 29009 29032
Random effects:
Groups Name Variance Std.Dev.
fac (Intercept) 11.2016 3.3469
Residual 1.0251 1.0125
Number of obs: 10000, groups: fac, 50
Fixed effects:
Estimate Std. Error t value
(Intercept) 1.396e+00 4.738e-01 3
x 2.000e+00 3.507e-06 570242
Correlation of Fixed Effects:
(Intr)
x -0.037
Here's my question for the group: given that it is a reasonable *philosophical* position to say 'treat philosophically random effects as random no matter what, and leave them in the model even if they don't appear to be statistically significant', and given that with small numbers of random-effect levels this approach is likely to lead to numerical difficulties in most (??) mixed model packages (warnings, errors, or low estimates of the variance), what should one do? (Suppose one is in a situation that is too complicated to use classical method-of-moments approaches -- crossed designs, highly unbalanced data, GLMMs ...)

1. Philosophy, schmilosophy: fit these factors as fixed effects; anything else is too dangerous/misleading/unworkable.
2. Proceed with the 'standard' mixed model (lme4, nlme, PROC MIXED, ...) and hope it doesn't break. Ignore warnings.
3. Use Bayesian-computational approaches (MCMCglmm, WinBUGS, AD Model Builder with post-hoc MCMC calculation? data cloning?), possibly with half-Cauchy priors on the variance as recommended by Gelman [Bayesian Analysis (2006) 1(3), pp. 515-533]?
Ben Bolker Associate professor, Biology Dep't, Univ. of Florida bolker at ufl.edu / people.biology.ufl.edu/bolker GPG key: people.biology.ufl.edu/bolker/benbolker-publickey.asc
The answer is effect-size dependent, is it not? If you fit the random effect and the algorithm works without failure, why not use it? If it doesn't work, you have a faulty tool for estimation. Punting to a fixed model is one way out of the problem. Another is matched-on-the-random-factor data analysis. Pragmatism is certainly an issue.

But what if you have 10 centers as a factor with known correlation issues? If you analyze with one set of predictors, missing values leave you with only 5 centers, so you treat centers as a fixed effect with 5 levels. If you use another set of predictors, you have all 10 levels, so you use centers as a random effect with a variance. Isn't intellectual consistency an issue here too? How do you explain this in the executive summary?

One thing you can do if the mixed modeling fails is to use the standard deviation among levels of the random-treated-as-fixed factor as an estimate of the random effect. This would at least maintain consistency of concept. Note that I'm not a mixed modeling expert, so my opinions may not be worth much.
================================================================ Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: ral at lcfltd.com Least Cost Formulations, Ltd. URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239 Fax: 757-467-2947 "Vere scire est per causas scire" ================================================================
Two cents from a humble practitioner, in reply to Ben Bolker's question:
Schmilosophically speaking, option #3 has a set of interesting features:

- It entails the creation and fitting of a full joint probability model. Internal consistency is guaranteed.
- Bayesian modeling tools (especially BUGS) offer enough flexibility to *require* explicit *choices* of model assumptions. For example, the "standard" choice of a normal distribution for level-II effects has to be *explicitly written* by the modeler, which gives him/her an opportunity to consider and/or justify the choice. This is also true of the choice of how to model nuisance parameters.
- There is no "formal" distinction between fixed and random effects; the latter are given a distribution (to be fitted), whereas the former are not.

However:

- Getting such a model to converge can be hard (especially with BUGS). Current tools require a real (maybe too deep) understanding of the numerical analysis problems in order to choose an *efficient* model expression, and some "tricks" used to get convergence (such as rendering a model deliberately unidentifiable...) are difficult to justify from a modeling standpoint. Again, Gelman & Hill's comparison of the current Bayesian tools to early-generation regression software seems quite apt.
- The choices made about shapes of distributions, relationships, etc. should be submitted to a sensitivity analysis, which again raises the statistician's workload.

In short, option #3 is the *expensive* way of (maybe) getting the most precise honest answer to your problem... if such a thing exists and is reachable with current numerical tools. Murphy's law ensures that option #2 will "break" (and the warnings will be ignored) especially when the warnings mean something important.

But option #1 is *also* an acceptable one: using it entails modeling *conditionally on your experimental setup*. You won't be able to state (directly) an estimate of the possible "population" effects of the formerly-random factors, and the validity of your inferences about the really-fixed effects will be formally limited to the particular population defined by these formerly-random effects. But that might be okay if you have other information allowing you (and your reader) to independently assess the possible value of an extrapolation from this particular subpopulation to a hypothetical population. In short, option #1 is a cheap way of getting a possibly acceptable approximate solution to your problem, whose value has to be assessed by other means. Of course, you won't benefit from partial pooling, and serious inter-unit interactions will mess up your inferences...

HTH, Emmanuel Charpentier
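To make the half-Cauchy suggestion concrete, here is a minimal, hand-rolled sketch (Python, crude grid approximation; the three "observed" group effects and the prior scale of 5 are invented for illustration, and the group effects are treated as observed without noise): with only three group effects, a flat prior on the between-group SD leaves a very long upper tail in the posterior, while a half-Cauchy prior reins it in without ruling large values out.

```python
import math

# three "observed" group effects -- far too few to pin down their SD
b = [1.0, -2.5, 4.0]

def loglik(tau):
    """Log-likelihood of tau when b_j ~ N(0, tau), treating b as known."""
    return sum(-math.log(tau) - x * x / (2 * tau * tau) for x in b)

def half_cauchy_logpdf(tau, scale=5.0):
    """Log density of a half-Cauchy prior on tau with the given scale."""
    return math.log(2 / (math.pi * scale * (1 + (tau / scale) ** 2)))

grid = [0.01 * i for i in range(1, 10001)]  # tau from 0.01 to 100

def posterior_quantile(logprior, q=0.95):
    """q-quantile of the grid-approximated posterior over tau."""
    w = [math.exp(loglik(t) + logprior(t)) for t in grid]
    total = sum(w)
    acc = 0.0
    for t, wi in zip(grid, w):
        acc += wi
        if acc >= q * total:
            return t

flat = posterior_quantile(lambda t: 0.0)       # improper flat prior on tau
hc = posterior_quantile(half_cauchy_logpdf)    # Gelman's half-Cauchy prior
print(f"95th percentile of tau: flat prior {flat:.1f}, half-Cauchy {hc:.1f}")
```

The half-Cauchy posterior's upper tail sits well below the flat-prior one, which is the sense in which the prior is weakly informative rather than decisive.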
For what it's worth, I would take approach 3. Others have pointed this out before, but I think strict adherence to the 'random' vs. 'fixed' effects nomenclature can sometimes do us a disservice. I find it easier to think of my multilevel models as having some parameters I allow to vary and some that I don't; or, some parameters are themselves modelled, some aren't.

Bayesianly, I think of the level above my modelled parameter as a prior. This prior can be relatively informative or uninformative; when the number of groups contributing to the modelled parameter is low, the prior is likely to be relatively uninformative, but nonetheless probably still useful. Yes, the estimated posterior mode for the variance is likely to be underestimated, but this is only a problem if I ignore the rest of its distribution. The uncertainty around the estimate of the variance is likely to be very large and in fact will allow for unrealistically large values. This assumes I have specified a flat prior on the variance, but it can be ameliorated if I instead apply an appropriate half-Cauchy.

In most cases I would treat grouping variables with more than two levels as a single parameter allowed to vary by group, either as a component of an intercept term or as an error term centred on zero. Then, if the data allowed, I might consider allowing other parameters in my model to vary by these groups and thereby take care of the interactions. Arguably, even categorical variables that some would consider inarguably philosophically 'fixed' can be incorporated as a modelled (allowed-to-vary-by-group) parameter. Gelman points out that such an approach can negate the 'classical' problem of multiple comparisons. See http://www.stat.columbia.edu/~gelman/research/unpublished/multiple2.pdf .

Will Morris
Masters of Philosophy candidate, Vesk Plant Ecology Lab, The School of Botany, The University of Melbourne, Australia
Phone: +61 3 8344 0120
http://www.botany.unimelb.edu.au/vesk/
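The "parameters allowed to vary by group" view is just partial pooling, which sits between the fixed-effect (no-pooling) and complete-pooling extremes. A toy numerical sketch (Python; SIGMA, TAU, and the group sizes are made-up values, both variances are treated as known, and the grand mean is taken as zero for simplicity, none of which real software would assume):

```python
import random
import statistics

random.seed(3)

SIGMA, TAU = 2.0, 1.0   # within-group and between-group SDs (assumed known)
N = 3                   # observations per group

# partial-pooling (BLUP / posterior-mean) weight on the raw group mean:
# precision of the group mean vs. precision of the population distribution
shrink = (N / SIGMA**2) / (N / SIGMA**2 + 1 / TAU**2)

effects = [random.gauss(0, TAU) for _ in range(6)]            # true group effects
data = [[random.gauss(mu, SIGMA) for _ in range(N)] for mu in effects]

print(f"shrinkage factor = {shrink:.3f} (0 = complete pooling, 1 = no pooling)")
for mu, obs in zip(effects, data):
    ybar = statistics.mean(obs)
    print(f"true {mu:6.2f}  raw mean {ybar:6.2f}  partially pooled {shrink * ybar:6.2f}")
```

Each raw group mean is pulled toward the population mean by a factor that balances the group's data precision against the between-group variance; with few, noisy groups the pull is substantial, which is also why the multiple-comparisons problem is softened.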
As an addendum, back in the "Old Days", before there were good mixed
model programs and all you could fit were fixed effect models, you
had to estimate random effects by fitting the fixed effect model and
then taking mean squares of the random effect levels as the estimate
of the variance.
As an example, consider a simple complete randomized block design:
> require('nlme')
> set.seed(1)
> nr<- 2 #reps per case
> nB<- 4 #blocks
> nT<- 2 #treatments
> n<- nT*nB*nr #total data
> T<- rep(c(-1,1),nB*nr)
> B<- factor(c(rep(1,nr*nT),rep(2,nr*nT),rep(3,nr*nT),rep(4,nr*nT)))
> rB<- rnorm(nB,0,4)[B]
> e<- rnorm(n,0,1) #residuals
> y<- T + rB + e
> fit4m<- lme(y ~ T, random=~1|B)
> summary(fit4m)
Linear mixed-effects model fit by REML
Data: NULL
AIC BIC logLik
63.22191 65.77814 -27.61096
Random effects:
Formula: ~1 | B
(Intercept) Residual
StdDev: 4.767061 0.8488003
Fixed effects: y ~ T
Value Std.Error DF t-value p-value
(Intercept) 0.5351939 2.3929577 11 0.223654 0.8271
T 0.6916997 0.2122001 11 3.259658 0.0076
Correlation:
(Intr)
T 0
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-1.76880069 -0.62080393 -0.02900206 0.78562378 1.43929227
Number of Observations: 16
Number of Groups: 4
> fit4f<- lm(y ~ T + B)
> summary(fit4f)
Call:
lm(formula = y ~ T + B)
Residuals:
Min 1Q Median 3Q Max
-1.46741 -0.50294 -0.03867 0.66197 1.25562
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.3221 0.4244 -5.472 0.000194 ***
T 0.6917 0.2122 3.260 0.007604 **
B2 3.5997 0.6002 5.998 8.96e-05 ***
B3 -1.4594 0.6002 -2.432 0.033319 *
B4 9.2889 0.6002 15.477 8.20e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.8488 on 11 degrees of freedom
Multiple R-squared: 0.9727, Adjusted R-squared: 0.9628
F-statistic: 98.03 on 4 and 11 DF, p-value: 1.587e-08
> d<- coef(fit4f)[3:5] #block coeffs
> cB<- c(-0.25*d[1]-0.25*d[2]-0.25*d[3],
+ 0.75*d[1]-0.25*d[2]-0.25*d[3],
+ -0.25*d[1]+0.75*d[2]-0.25*d[3],
+ -0.25*d[1]-0.25*d[2]+0.75*d[3]) #convert to mean = 0 basis
> sd(cB) #estimate random effect
[1] 4.785915
Note that the two approaches give essentially identical estimates.
It appears that the real problem here is not with the data, but with the estimation procedure. When fitting the model by maximum likelihood, both lme and AD Model Builder give an estimate of about 3.8 (true value 4) for the std dev of the REs. When fitting the data by REML, AD Model Builder gets an estimate of about 4.4. I think that REML estimation is more numerically challenging than ML, and that lmer's algorithm is just not up to the job. You can see the ADMB code etc. at http://groups.google.com/group/admb-users/t/83d371e04bbe48c9
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20100424/f2cd2b68/attachment.pl>
I think the point is that lme only addresses the simplest case of a linear mixed model, and that by such specialization it is more effective in this case. lmer attempts to handle slightly more complex mixed models and must use a different algorithm, and this algorithm seems to fail here. AD Model Builder, on the other hand, uses a much more general algorithm intended to work with arbitrary nonlinear mixed models. To stick with the current simple example: suppose you want the regression to be robust, e.g. by assuming that the random variable e has a fat-tailed Student's t distribution. This can easily be done in ADMB, but not in lme, I believe.
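For readers wondering what "robust regression with fat-tailed errors" buys you, here is a self-contained sketch (Python, not ADMB; the EM/IRLS weighting scheme is the textbook one for t-distributed errors, while the data, NU, and function names are invented for illustration): observations with large residuals get weight (nu+1)/(nu + r^2/s^2) in a weighted least-squares refit, so a few gross outliers barely move the fitted slope.

```python
import random

random.seed(2)

NU = 3.0  # degrees of freedom of the assumed Student-t error distribution

def wls_line(x, y, w):
    """Weighted least-squares fit of y ~ a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    b = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / \
        sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    return my - b * mx, b

def t_irls(x, y, n_iter=50):
    """EM/IRLS fit of a line with t_NU errors: downweight big residuals."""
    a, b = wls_line(x, y, [1.0] * len(x))  # start from the OLS fit
    s2 = 1.0
    for _ in range(n_iter):
        r = [yi - a - b * xi for xi, yi in zip(x, y)]
        w = [(NU + 1) / (NU + ri * ri / s2) for ri in r]
        s2 = sum(wi * ri * ri for wi, ri in zip(w, r)) / len(r)
        a, b = wls_line(x, y, w)
    return a, b

# data from y = 1 + 2x with modest noise and a few gross outliers
x = [i / 10 for i in range(50)]
y = [1 + 2 * xi + random.gauss(0, 0.3) for xi in x]
for i in (5, 17, 40):
    y[i] += 25.0

ols_a, ols_b = wls_line(x, y, [1.0] * len(x))
t_a, t_b = t_irls(x, y)
print(f"OLS slope {ols_b:.2f}, t-error IRLS slope {t_b:.2f} (true 2)")
```

The OLS slope is dragged noticeably off 2 by the three outliers, while the t-error fit stays close to the truth; ADMB would obtain essentially the same robust fit by maximizing the t log-likelihood directly.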
I am surprised at suggestions that the problem of random effects with very few levels can be overcome by changing the modelling framework or estimation algorithm. It seems to me that the problem is a basic one of lack of information. If you want to estimate the variance of something, you really need more than 3 observations; no amount of fiddling with the model or algorithms will change that. The only thing that might change it is to use Bayesian methods with informative priors, but then you are changing the problem by taking some information from elsewhere. I reach the same conclusion if I think about the 'scope of inference': I cannot expect to say much about a population I have just 3 observations from. Statistical finesse cannot compensate for a basic lack of information.

Richard Coe
Principal Scientist, Research Methods
World Agroforestry Centre (ICRAF), Nairobi and Statistical Services Centre, U Reading.
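The "lack of information" point can be made exact for the normal case. A short sketch (Python; it relies on the fact that with df = 2 the chi-square quantile has the closed form -2*ln(1-p), so no stats library is needed; the sample SD of 4.0 is an arbitrary example value): the 95% confidence interval for a standard deviation estimated from three observations spans more than a factor of ten.

```python
import math

# With n = 3 observations, df = n - 1 = 2, and the chi-square quantile
# function has a closed form: chi2_quantile(p, df=2) = -2 * ln(1 - p).
def chi2_quantile_df2(p):
    return -2.0 * math.log1p(-p)

def sd_ci_n3(s):
    """95% CI for sigma given sample SD s from 3 observations (df = 2)."""
    df = 2
    lo = s * math.sqrt(df / chi2_quantile_df2(0.975))
    hi = s * math.sqrt(df / chi2_quantile_df2(0.025))
    return lo, hi

lo, hi = sd_ci_n3(4.0)  # e.g. a sample SD of 4 among three group effects
print(f"sample SD 4.0 -> 95% CI for sigma: [{lo:.2f}, {hi:.2f}]")
```

The interval runs from roughly half the observed SD to more than six times it, regardless of what estimation machinery produced the point estimate.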
Here's my question for the group: Given that it is a reasonable *philosophical* position to say 'treat philosophically random effects as random no matter what, and leave them in the model even if they don't appear to be statistically significant', and given that with small numbers of random-effect levels this approach is likely to lead to numerical difficulties in most (??) mixed model packages (warnings, errors, or low estimates of the variance), what should one do? (Suppose one is in a situation that is too complicated to use classical method-of-moments approaches -- crossed designs, highly unbalanced data, GLMMs ...) 1. philosophy, schmilosophy. Fit these factors as a fixed effect, anything else is too dangerous/misleading/unworkable. 2. proceed with the 'standard' mixed model (lme4, nlme, PROC MIXED, ...) and hope it doesn't break. Ignore warnings. 3. use Bayesian-computational approaches (MCMCglmm, WinBUGS, AD Model Builder with post-hoc MCMC calculation? Data cloning?)? Possibly with half-Cauchy priors on variance as recommended by Gelman [Bayesian Analysis (2006) 1, Number 3, pp. 515?533]?
Schmilosophically speaking, option #3 has a set of interesting features : - It entails the creation and fitting of a full joint probability model. Internal consistency is guranteed. - Bayesian modeling tools (especially BUGS) offer enough flexibility to *require* explicit *choices* for model assumption. For exmple, the "standard" choice of a normal distribution for level II effects has to be *explcitely writtern by the modeler, which gives him/her an opportunity to consider and/or justify his/her choice. This is also true of the choice of the modelization of nuisance parameters. - There is no "formal" distinction between fixed and random effects ; the latter are given a distribution (to be fitted), whereas the formere are not. However : - getting such a model to converge can be hard (especilly with BUGS). Current tools need a real (maybe too much) understanding of the numerical analysis problems to choose an *efficient* model expression, and some "tricks" used to get convergence (such as rendering a model deliberately unidentifible...) are difficult to justify rom a modeling standpoint. Again, Gelman & Hill comparison of the current Bayesian tools to early-generation regression software seems quite apt. - The choices made about shape of distributions, relationships, etc ... should be submitted to a sensitivity analysis, which raises again the statistician's workload. In short, option #3 is the *expensive* way of (maybe) getting the most precise honest answer to your problem... if such thing exists and is reachble with current numerical tools. Murphy's law ensures that option #2 will "break" (and warnngs will be ignored) especilly when warnings mean something important. But option #1 is *also* an acceptable one : using it entails modeling *conditionally on your experimental setup*. 
You won't be able to state (directly) an estimate of the possible "population" effects of the formerly-random effects, and the validity of your inferences about the really-fixed effects will be formally limited to the particular population defined by those formerly-random effects. But that might be okay if you have other information allowing you (and your reader) to independently assess the possible value of an extrapolation from this particular subpopulation to a hypothetical population. In short, option #1 is a cheap way of getting a possibly acceptable approximate solution to your problem, whose value has to be assessed by other means. Of course, you won't benefit from partial pooling, and serious inter-unit interactions will mess up your inferences ... HTH, Emmanuel Charpentier
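Option #1 can be sketched in base R (simulated data; the sample sizes and effect SD here are invented for illustration): treating the 3-level factor as fixed just means putting it in the lm() formula, and the resulting contrasts are interpreted conditionally on those particular levels.

```r
set.seed(1)
n <- 300; k <- 3
fac <- gl(k, n / k)                       # a 3-level grouping factor, as in the original question
y <- 1 + rnorm(k, 0, 4)[fac] + rnorm(n)   # group effects plus unit-level noise

fit <- lm(y ~ fac)                        # option #1: factor as a fixed effect
coef(summary(fit))                        # level contrasts, conditional on these 3 groups
```

No variance component is estimated, so nothing is said about unseen levels, and there is no partial pooling of the group means.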
Gabor Grothendieck wrote:
Here is a simulation of 10k cases using 4- and 50-level factors for the random effect. With 4 levels there are numerical problems and the accuracy of the random-effect variance is terrible. With 50 levels there are no numerical problems and the accuracy is much better.
library(lme4)
set.seed(1)
n <- 10000
k <- 4
f <- function(n, k) {
  set.seed(1)
  x <- 1:n
  fac <- gl(k, 1, n)
  fac.eff <- rnorm(k, 0, 4)[fac]
  e <- rnorm(n)
  y <- 1 + 2 * x + fac.eff + e
  lmer(y ~ x + (1 | fac))
}
# simulation with 4 level random effect
f(n, 4)
Linear mixed model fit by REML
Formula: y ~ x + (1 | fac)
AIC BIC logLik deviance REMLdev
28733 28762 -14363 28702 28725
Random effects:
Groups Name Variance Std.Dev.
fac (Intercept) 1.1162 1.0565
Residual 1.0298 1.0148
Number of obs: 10000, groups: fac, 4
Fixed effects:
Estimate Std. Error t value
(Intercept) 1.313e+00 5.286e-01 2
x 2.000e+00 3.515e-06 568923
Correlation of Fixed Effects:
(Intr)
x -0.033
Warning message:
In mer_finalize(ans) : false convergence (8)
# simulation with 50 level random effect
f(n, 50)
Linear mixed model fit by REML
Formula: y ~ x + (1 | fac)
AIC BIC logLik deviance REMLdev
29040 29069 -14516 29009 29032
Random effects:
Groups Name Variance Std.Dev.
fac (Intercept) 11.2016 3.3469
Residual 1.0251 1.0125
Number of obs: 10000, groups: fac, 50
Fixed effects:
Estimate Std. Error t value
(Intercept) 1.396e+00 4.738e-01 3
x 2.000e+00 3.507e-06 570242
Correlation of Fixed Effects:
(Intr)
x -0.037
On Fri, Apr 23, 2010 at 9:38 AM, Schultz, Mark R. <Mark.Schultz2 at va.gov> wrote:
I just read a post by Andrew Dolman suggesting that a factor with only 3
levels should be treated as a fixed effect. This seems to be a perennial
question with mixed models. I'd really like to hear opinions from
several experts as to whether there is a consensus on the topic. It
really makes me uncomfortable that such an important modeling decision
is made with an "ad hoc" heuristic.
Thanks,
Mark Schultz, Ph.D.
Bedford VA Hospital
Bedford, Ma.
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
-- Ben Bolker Associate professor, Biology Dep't, Univ. of Florida bolker at ufl.edu / people.biology.ufl.edu/bolker GPG key: people.biology.ufl.edu/bolker/benbolker-publickey.asc
Here it is redone with lme. lme seems not to exhibit the numerical problems for 4 levels that we saw with lmer.
library(nlme)
set.seed(1)
f <- function(n, k) {
  set.seed(1)
  x <- 1:n
  fac <- gl(k, 1, n)
  fac.eff <- rnorm(k, 0, 4)[fac]
  e <- rnorm(n)
  y <- 1 + 2 * x + fac.eff + e
  lme(y ~ x, random = ~ 1 | fac)
}
n <- 10000
f(n, 4)  # 4 levels
Linear mixed-effects model fit by REML
Data: NULL
Log-restricted-likelihood: -14342.06
Fixed: y ~ x
(Intercept) x
1.313495 1.999999
Random effects:
Formula: ~1 | fac
(Intercept) Residual
StdDev: 4.421380 1.012295
Number of Observations: 10000
Number of Groups: 4
f(n, 50)  # 50 levels
Linear mixed-effects model fit by REML
Data: NULL
Log-restricted-likelihood: -14515.87
Fixed: y ~ x
(Intercept) x
1.396288 2.000000
Random effects:
Formula: ~1 | fac
(Intercept) Residual
StdDev: 3.322084 1.01249
Number of Observations: 10000
Number of Groups: 50
Just one other comment so that we are not unfair to lmer. I was using the CRAN version of lmer. Using the development version in package lme4a there are no numerical problems for the 4 level case. Thus it appears that lme, lme4a and ADMB all handle this case.
library(lme4a)
Loading required package: Matrix
Loading required package: lattice
Attaching package: 'Matrix'
The following object(s) are masked from package:base :
det
set.seed(1)
f <- function(n, k) {
  set.seed(1)
  x <- 1:n
  fac <- gl(k, 1, n)
  fac.eff <- rnorm(k, 0, 4)[fac]
  e <- rnorm(n)
  y <- 1 + 2 * x + fac.eff + e
  # lme(y ~ x, random = ~ 1 | fac)
  lmer(y ~ x + (1 | fac))
}
n <- 10000
f(n, 4)
Linear mixed model fit by REML
Formula: y ~ x + (1 | fac)
REML
28684
Random effects:
Groups Name Variance Std.Dev.
fac (Intercept) 19.5486 4.4214
Residual 1.0247 1.0123
Number of obs: 10000, groups: fac, 4
Fixed effects:
Estimate Std. Error t value
(Intercept) 1.313e+00 2.211e+00 1
x 2.000e+00 3.507e-06 570338
Correlation of Fixed Effects:
(Intr)
x -0.008
packageDescription("lme4a")$Version
[1] "0.999375-45"