Random vs. fixed effects
I am surprised at suggestions that the problem of random effects with very few levels can be overcome by changing modelling framework or estimation algorithms. It seems to me that the problem is a basic one of lack of information. If you want to estimate the variance of something you really need more than 3 observations. No amount of fiddling with the model or algorithms will change that. The only thing that might change it is to use Bayesian methods with informative priors, but then you are changing the problem by taking some information from elsewhere. I reach the same conclusion if I think about the 'scope of inference'. I cannot expect to say much about a population I have just 3 observations from. Statistical finesse cannot compensate for a basic lack of information. Richard Coe Principal Scientist ? Research Methods World Agroforestry Centre (ICRAF), Nairobi and Statistical Services Centre, U Reading. -----Original Message----- From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Emmanuel Charpentier Sent: Saturday, April 24, 2010 12:57 AM To: r-sig-mixed-models at r-project.org Subject: Re: [R-sig-ME] Random vs. fixed effects Two cents from an humble practitioner : Le vendredi 23 avril 2010 ? 14:11 -0400, Ben Bolker a ?crit :
Here's my question for the group: Given that it is a reasonable *philosophical* position to say 'treat philosophically random effects as random no matter what, and leave them in the model even if they don't appear to be statistically significant', and given that with small numbers of random-effect levels this approach is likely to lead to numerical difficulties in most (??) mixed model packages (warnings, errors, or low estimates of the variance), what should one do? (Suppose one is in a situation that is too complicated to use classical method-of-moments approaches -- crossed designs, highly unbalanced data, GLMMs ...) 1. philosophy, schmilosophy. Fit these factors as a fixed effect, anything else is too dangerous/misleading/unworkable. 2. proceed with the 'standard' mixed model (lme4, nlme, PROC MIXED, ...) and hope it doesn't break. Ignore warnings. 3. use Bayesian-computational approaches (MCMCglmm, WinBUGS, AD Model Builder with post-hoc MCMC calculation? Data cloning?)? Possibly with half-Cauchy priors on variance as recommended by Gelman [Bayesian Analysis (2006) 1, Number 3, pp. 515?533]?
Schmilosophically speaking, option #3 has a set of interesting features : - It entails the creation and fitting of a full joint probability model. Internal consistency is guranteed. - Bayesian modeling tools (especially BUGS) offer enough flexibility to *require* explicit *choices* for model assumption. For exmple, the "standard" choice of a normal distribution for level II effects has to be *explcitely writtern by the modeler, which gives him/her an opportunity to consider and/or justify his/her choice. This is also true of the choice of the modelization of nuisance parameters. - There is no "formal" distinction between fixed and random effects ; the latter are given a distribution (to be fitted), whereas the formere are not. However : - getting such a model to converge can be hard (especilly with BUGS). Current tools need a real (maybe too much) understanding of the numerical analysis problems to choose an *efficient* model expression, and some "tricks" used to get convergence (such as rendering a model deliberately unidentifible...) are difficult to justify rom a modeling standpoint. Again, Gelman & Hill comparison of the current Bayesian tools to early-generation regression software seems quite apt. - The choices made about shape of distributions, relationships, etc ... should be submitted to a sensitivity analysis, which raises again the statistician's workload. In short, option #3 is the *expensive* way of (maybe) getting the most precise honest answer to your problem... if such thing exists and is reachble with current numerical tools. Murphy's law ensures that option #2 will "break" (and warnngs will be ignored) especilly when warnings mean something important. But option #1 is *also* an acceptable one : using it entails modeling *conditionally on your experimental setup*. You won't be able to state (directly) an estimation of the possible "population" effects of the formerly-random effects, and the validity of your inferences about the really-fixed effects will be formerly limited to the particular population defined by these formerly-random effects. But that might be okay if you have other information allowing you (and your reader) to independently assess the possible value of an extrapolation from this particulr subpopulation to an hypothetical population. In short, option #1 is a cheap wy of getting a ossibly ccetable approximate solution to your problem, whose value has to be assessed by other means. Of course, you won't benefit from artial pooling, and serious inter-unit interactions will mess your inferences ... HTH, Emmanuel Charpentier
Gabor Grothendieck wrote:
Here is a simulation of 10k cases using 4 and 50 level factors for the random effect. With 4 levels there are numerical problems and the accuracy of the random effect is terrible. With 50 levels there are no numerical problems and the accuracy is much better.
library(lme4)
set.seed(1)
n <- 10000
k <- 4
f <- function(n, k) {
+ set.seed(1) + x <- 1:n + fac <- gl(k, 1, n) + fac.eff <- rnorm(k, 0, 4)[fac] + e <- rnorm(n) + y <- 1 + 2 * x + fac.eff + e + lmer(y ~ x + (1|fac)) + }
# simulation with 4 level random effect f(n, 4)
Linear mixed model fit by REML
Formula: y ~ x + (1 | fac)
AIC BIC logLik deviance REMLdev
28733 28762 -14363 28702 28725
Random effects:
Groups Name Variance Std.Dev.
fac (Intercept) 1.1162 1.0565
Residual 1.0298 1.0148
Number of obs: 10000, groups: fac, 4
Fixed effects:
Estimate Std. Error t value
(Intercept) 1.313e+00 5.286e-01 2
x 2.000e+00 3.515e-06 568923
Correlation of Fixed Effects:
(Intr)
x -0.033
Warning message:
In mer_finalize(ans) : false convergence (8)
# simulation with 50 level random effect f(n, 50)
Linear mixed model fit by REML
Formula: y ~ x + (1 | fac)
AIC BIC logLik deviance REMLdev
29040 29069 -14516 29009 29032
Random effects:
Groups Name Variance Std.Dev.
fac (Intercept) 11.2016 3.3469
Residual 1.0251 1.0125
Number of obs: 10000, groups: fac, 50
Fixed effects:
Estimate Std. Error t value
(Intercept) 1.396e+00 4.738e-01 3
x 2.000e+00 3.507e-06 570242
Correlation of Fixed Effects:
(Intr)
x -0.037
On Fri, Apr 23, 2010 at 9:38 AM, Schultz, Mark R. <Mark.Schultz2 at va.gov> wrote:
I just read a post by Andrew Dolman suggesting that a factor with only 3
levels should be treated as a fixed effect. This seems to be a perennial
question with mixed models. I'd really like to hear opinions from
several experts as to whether there is a consensus on the topic. It
really makes me uncomfortable that such an important modeling decision
is made with an "ad hoc" heuristic.
Thanks,
Mark Schultz, Ph.D.
Bedford VA Hospital
Bedford, Ma.
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
-- Ben Bolker Associate professor, Biology Dep't, Univ. of Florida bolker at ufl.edu / people.biology.ufl.edu/bolker GPG key: people.biology.ufl.edu/bolker/benbolker-publickey.asc
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models __________ Information from ESET NOD32 Antivirus, version of virus signature database 5055 (20100424) __________ The message was checked by ESET NOD32 Antivirus. http://www.eset.com __________ Information from ESET NOD32 Antivirus, version of virus signature database 5055 (20100424) __________ The message was checked by ESET NOD32 Antivirus. http://www.eset.com