Distributional assumptions + case studies (was: Random or Fixed effects appropriate?)
On 4/9/08, Douglas Bates <bates at stat.wisc.edu> wrote:
On 4/9/08, Andrew Robinson <A.Robinson at ms.unimelb.edu.au> wrote:
> Hi Reinhold,
> On Wed, Apr 09, 2008 at 05:45:54PM +0200, Reinhold Kliegl wrote:
> > I think this is a reasonable summary.
> > You were not clear on how you plan to use the conditional modes (i.e., your point 1). Please keep in mind that conditional modes are not independent "observations" like a group mean or within-group effect or slope, simply because shrinkage correction uses all data. Also, for example, their correlations (i.e., between intercept and x for units of C) are typically not identical to the estimated model correlations displayed in the random-effects part (see also the Bates quote in my last comment).
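The difference between model-based estimates and quantities computed from conditional modes can be seen on any fitted model. The following is a minimal sketch only: it uses the sleepstudy data that ships with lme4 as a stand-in (not the taxonomy data discussed in this thread), and the object names (fm, model_sd, modes_sd and so on) are made up for the illustration.

```r
library(lme4)

# Random intercept and random slope for Days within Subject
# (sleepstudy is a stand-in data set, not the data from this thread)
fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

# Model-based estimates, as shown in the random-effects part of the summary
vc <- VarCorr(fm)$Subject
model_sd  <- attr(vc, "stddev")
model_cor <- attr(vc, "correlation")["(Intercept)", "Days"]

# The same quantities computed naively from the conditional modes
cm <- ranef(fm)$Subject
modes_sd  <- sapply(cm, sd)
modes_cor <- cor(cm[["(Intercept)"]], cm[["Days"]])

# Side-by-side comparison of standard deviations and of the correlation
print(rbind(model = model_sd, modes = modes_sd))
print(c(model = model_cor, modes = modes_cor))
```

The two rows of the first table and the two entries of the second need not agree, which is why the conditional modes should not be treated as if they were raw per-unit observations.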
> > In analyses of reaction times (using subjects and items as crossed random factors; carried out with Mike Masson and Eike Richter, 2007), model-based estimates of correlations among random effects revealed "clearer" patterns than the correlations between means and effects computed for each subject (as they should, given that they were corrected for unreliability). Unlike for fixed-effects estimates, however, estimates of correlations among random effects were quite susceptible to violations of distributional assumptions for the residuals--up to a change in the sign of the correlation!
> This is a very interesting observation, and one that I suspect should not be buried in an email. Can you tell us more about it? In my workshops, I spend a lot of time focusing on the use of diagnostics to check distributional assumptions. It would be fabulous to be able to identify a case study in which getting the distributional assumptions right was so clearly important.
> More generally, I wonder if it might be worth collecting such a set of case studies with clear and thorough analyses and wrapping them in a document. It seems to me that it would answer the request made by Iasonas Lamprianou recently.
> I'd be happy to coordinate such an effort, so long as the contributions were in LaTeX and Sweave. I know my students would benefit from it :)
> Is there any interest in such an idea, from potential contributors or (equally importantly) potential users?
I certainly would be delighted to have such a collection made available and would be happy to have it hosted on http://lme4.r-forge.r-project.org/ if that seemed suitable. I would also recommend some of the examples in chapter 7 of Haarald
(Sorry Harald - I got carried away doubling the a's in your name.)
Baayen's new book "Analyzing Linguistic Data: A Practical Introduction to Statistics using R"
# Paperback: 368 pages
# Publisher: Cambridge University Press; 1 edition (March 17, 2008)
# Language: English
# ISBN-10: 0521709180
# ISBN-13: 978-0521709187
> > As far as the use of conditional modes is concerned, the absolute values of correlations between conditional modes were always larger than the corresponding model estimates.
> > In simulations, the model estimates of correlations recovered the "true" variances and correlations, even after random deletion of 50% of the data, but the variance of the conditional modes always underestimated the true variance and the difference between model estimate and correlation based on conditional modes increased with the absolute magnitude of the correlation. In other words, conditional modes underestimated the variance and exaggerated covariances and correlations of random effects in these simulations. The shrinkage in variance reflects the contribution of the likelihood in the computation of the conditional modes. In summary, according to these simulations, the model estimates of correlations among random effects are fine; the computed correlations based on conditional modes may serve a useful heuristic function for further analyses but must be handled with care.
> >
> > Best
> > Reinhold
> >
> > On Wed, Apr 9, 2008 at 11:21 AM, Nick Isaac <njbisaac at googlemail.com> wrote:
> > > Dear all,
> > >
> > > Thanks for the comments and apologies for not providing more information. I (mis)judged it would be better to discuss the issue abstractly. There should be enough levels to estimate the variance of C and at least one other random effect:
> > >
> > > Number of obs: 1242, groups: D, 269; C, 64; B, 8; A, 3
> > >
> > > My interpretation of comments by all three respondents is as follows:
> > > 1) extracting the random effects/BLUPs/conditional modes is reasonable in general
> > > 2) a taxonomy might be considered fixed or random, depending on the question and the number of units/levels
> > > 3) In my case, it would be better to use the conditional modes for x|C than to fit x*C as an interaction term.
> > >
> > > Best wishes, Nick
> > >
> > > On 08/04/2008, Andrew Robinson <A.Robinson at ms.unimelb.edu.au> wrote:
> > > > On Tue, Apr 08, 2008 at 07:10:16PM +0200, Reinhold Kliegl wrote:
> > > > > > My dataset has one continuous normally-distributed fixed effect and four random effects that are nested (in fact, it is a taxonomy). For simplicity, I've removed the variable names, so the dataset has the following structure:
> > > > > >
> > > > > > y ~ x | A/B/C/D
> > > > > It would be good to know how many units/levels you have for each of your four random effects. Those with fewer than, say, five, are good candidates for being specified as fixed effects. Think how many observations you need to get a stable estimate of a variance!
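The point about needing enough levels to estimate a variance can be illustrated with a small base-R simulation. This is only a sketch under simplifying assumptions: the group effects are drawn from a normal distribution with a true SD of 1 and are observed without error, which is more favourable than any real mixed model. The level counts are the ones Nick gives for A, B, C and D (3, 8, 64, 269); all object names are made up for the illustration.

```r
set.seed(1)

# Assumed setup: true between-group SD of 1, group effects observed directly
true_sd <- 1
n_sims  <- 2000

# Repeatedly draw n_levels group effects and record their sample SD
sd_estimates <- function(n_levels) {
  replicate(n_sims, sd(rnorm(n_levels, mean = 0, sd = true_sd)))
}

# Numbers of levels for A, B, C and D in Nick's data
levels_to_try <- c(3, 8, 64, 269)

# 5%, 50% and 95% quantiles of the estimated SD for each number of levels
spread <- sapply(levels_to_try, function(k) {
  quantile(sd_estimates(k), probs = c(0.05, 0.5, 0.95))
})
colnames(spread) <- paste0("levels=", levels_to_try)
print(round(spread, 2))
```

The interval between the 5% and 95% quantiles narrows as the number of levels grows, so a factor with only a handful of levels gives a very unstable variance estimate even in this idealized setting.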
> > > > > > lmer( y ~ x + (1|A) + (1|B) + (1|C) + (1|D) + C + x:C) #error: Downdated X'X is not positive definite, 82
> > > > > You cannot include C both as a random and a fixed effect
> > > > I do not believe that this is generally true. See, for example,
> > > > > require(lme4)
> > > > > (fm1 <- lmer(Reaction ~ Days + Subject + (Days|Subject), sleepstudy))
> > > > Therefore I am uncertain as to how you can draw this conclusion without more information about the design (which the poster really should have provided).
> > > > > > lmer( y ~ x + (1|A) + (1|B) + (1|C) + (1|D) + x:C) #gives sensible results
> > > > > If this gives sensible results, I suspect you have very few levels of C, say, 2 or 3?
> > > > > In this case, definitely specify C and x and their interaction as fixed effects, e.g.:
> > > > > lmer( y ~ x*C + (1|A) + (1|B) + (1|D))
> > > > >
> > > > > The following may not apply to your case, but it might: Sometimes people think that a nested/taxonomic design implies a random effect structure (e.g., schools, classes, students). This is not true. If you have only a few units for each factor, you are better off specifying it as a fixed-effects rather than a random-effects taxonomy. (Of course, you lose generalizability, but if you want this you should make sure you have a sample that provides a basis for it.)
> > > > I can see the sense behind this position but sometimes a few units are all that is available, and including them in a model as fixed effects muddies the statistical waters, especially if they are the kinds of effects that a model user will be unlikely to naturally condition upon.
> > > >
> > > > I do agree that if there are problems with model fitting and/or interpretation when the design is rigorously followed, then a more flexible approach can and should be adopted, and appropriate allowances must be made.
> > > > > The interpretation of conditional modes (formerly known as BLUPs, that is "predictions") is a tricky business, especially with few units per level.
> > > > Sorry, I think I've missed something. In what sense are the conditional modes formerly known as BLUPs?
> > > >
> > > > Andrew
> > > >
> > > > --
> > > > Andrew Robinson
> > > > Department of Mathematics and Statistics Tel: +61-3-8344-6410
> > > > University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599
> > > > http://www.ms.unimelb.edu.au/~andrewpr
> > > > http://blogs.mbs.edu/fishing-in-the-bay/