Distributional assumptions + case studies (was: Random or Fixed effects appropriate?)

On 4/9/08, Andrew Robinson <A.Robinson at ms.unimelb.edu.au> wrote:
 > Hi Reinhold,

 >  On Wed, Apr 09, 2008 at 05:45:54PM +0200, Reinhold Kliegl wrote:
 >  > I think this is a reasonable summary.

 >  > You were not clear on how you plan to use the conditional modes (i.e.,
 >  > your point 1).  Please keep in mind that conditional modes are not
 >  > independent "observations" like a group mean or within-group effect or
 >  > slope, simply because shrinkage correction uses all data. Also, for
 >  > example, their correlations (i.e., between intercept and x for units
 >  > of C) are typically not identical to the estimated model correlations
 >  > displayed in the random-effects part (see also the Bates quote in my
 >  > last comment).
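
To make the distinction above concrete, here is a minimal sketch that puts the model-based correlation next to the correlation computed from the conditional modes. It assumes the sleepstudy data shipped with lme4 rather than the poster's data, and the object names (fm, model_cor, mode_cor) are made up for illustration.

```r
library(lme4)

# Random intercept and random slope for Days within Subject
fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Model-based estimate of the intercept-slope correlation
# (the value printed in the random-effects part of the summary)
vc <- VarCorr(fm)$Subject
model_cor <- attr(vc, "correlation")["(Intercept)", "Days"]

# Correlation computed from the conditional modes of the random effects
cm <- ranef(fm)$Subject
mode_cor <- cor(cm[["(Intercept)"]], cm[["Days"]])

c(model = model_cor, conditional_modes = mode_cor)
```

The two numbers need not agree, because the conditional modes are shrunken predictions that borrow information from all subjects.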

 >  > In analyses of reaction times (using subjects and items as crossed
 >  > random factors; carried out with Mike Masson and Eike Richter, 2007),
 >  > model-based estimates of correlations among random effects revealed
 >  > "clearer" patterns than the correlations between means and effects
 >  > computed for each subject (as they should, given that they were
 >  > corrected for unreliability). Unlike for fixed-effects estimates,
 >  > however, estimates of correlations among random effects were quite
 >  > susceptible to violations of distributional assumptions for the
 >  > residuals--up to a change in the sign of the correlation!
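
One way to probe this kind of sensitivity is to refit the same model on differently transformed responses and compare the estimated random-effect correlations. The sketch below is only an illustration on the sleepstudy data from lme4, not the reaction-time analysis described above; the helper name re_cor and the choice of transformations are assumptions.

```r
library(lme4)

# Hypothetical helper: model-based intercept-slope correlation for Subject
re_cor <- function(fit) {
  attr(VarCorr(fit)$Subject, "correlation")["(Intercept)", "Days"]
}

# Same random-effects structure, three scales for the response
fits <- list(
  raw        = lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy),
  log        = lmer(log(Reaction) ~ Days + (Days | Subject), data = sleepstudy),
  reciprocal = lmer(I(1000 / Reaction) ~ Days + (Days | Subject),
                    data = sleepstudy)
)

# One correlation estimate per transformation
cors <- sapply(fits, re_cor)
cors
```

If the estimates differ markedly across scales, that is a hint that the correlation depends on the distributional assumptions for the residuals and deserves a closer look.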

 >  This is a very interesting observation, and one that I suspect should
 >  not be buried in an email.  Can you tell us more about it?  In my
 >  workshops, I spend a lot of time focusing on the use of diagnostics to
 >  check distributional assumptions.  It would be fabulous to be able to
 >  identify a case study in which getting the distributional assumptions
 >  right was so clearly important.
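
As a rough illustration of the kind of diagnostics meant here, the following sketch draws a residual plot and normal quantile plots for a fitted model. It uses base graphics and the sleepstudy data from lme4 as stand-ins; which plots are appropriate for a given analysis is a judgment call.

```r
library(lme4)

fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Residuals against fitted values: look for trends or changing spread
plot(fitted(fm), resid(fm), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal quantile plot of the residuals
qqnorm(resid(fm), main = "Residuals")
qqline(resid(fm))

# Normal quantile plot of the conditional modes of the random slopes
cm <- ranef(fm)$Subject
qqnorm(cm[["Days"]], main = "Conditional modes: Days")
qqline(cm[["Days"]])
```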

 >  More generally, I wonder if it might be worth collecting such a set of
 >  case studies with clear and thorough analyses and wrapping them in a
 >  document.  It seems to me that it would answer the request made by
 >  Iasonas Lamprianou recently.

 >  I'd be happy to coordinate such an effort, so long as the
 >  contributions were in LaTeX and Sweave.  I know my students would
 >  benefit from it :)

 >  Is there any interest in such an idea, from potential contributors or
 >  (equally importantly) potential users?

I certainly would be delighted to have such a collection made
 available and would be happy to have it hosted on
 http://lme4.r-forge.r-project.org/ if that seemed suitable.

 I would also recommend some of the examples in chapter 7 of Harald
 Baayen's new book "Analyzing Linguistic Data: A Practical Introduction
 to Statistics using R":

 # Paperback: 368 pages
 # Publisher: Cambridge University Press; 1 edition (March 17, 2008)
 # Language: English
 # ISBN-10: 0521709180
 # ISBN-13: 978-0521709187

 >  > As far as
 >  > the use of conditional modes is concerned, the absolute values of
 >  > correlations between conditional modes were always larger than the
 >  > corresponding model estimates.
 >  >      In simulations, the model estimates of correlations recovered the
 >  > "true" variances and correlations, even after random deletion of 50%
 >  > of the data, but the variance of the conditional modes always
 >  > underestimated the true variance and the difference between model
 >  > estimate and correlation based on conditional modes increased with the
 >  > absolute magnitude of the correlation. In other words, conditional
 >  > modes underestimated the variance and exaggerated covariances and
 >  > correlations of random effects in these simulations. The shrinkage in
 >  > variance reflects the contribution of the likelihood in the
 >  > computation of the conditional modes.  In summary, according to these
 >  > simulations, the model estimates of correlations among random effects
 >  > are fine; the computed correlations based on conditional modes may
 >  > serve a useful heuristic function for further analyses but must be
 >  > handled with care.
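
A small simulation along the lines described above can be sketched as follows. This is not the simulation Reinhold ran: the "true" variances, the correlation, the sample sizes, and all object names are made up for illustration, and a single run like this only shows how to set up the comparison.

```r
library(lme4)
set.seed(1)

# Hypothetical "true" values, chosen only for illustration
n_subj <- 200
sd_int <- 1; sd_slope <- 1; rho <- 0.5; sd_resid <- 0.5

# Draw correlated random intercepts and slopes via a Cholesky factor
Sigma <- matrix(c(sd_int^2, rho * sd_int * sd_slope,
                  rho * sd_int * sd_slope, sd_slope^2), nrow = 2)
b <- matrix(rnorm(2 * n_subj), ncol = 2) %*% chol(Sigma)

# Ten observations per subject on a covariate x in [0, 1]
d <- expand.grid(x = seq(0, 1, length.out = 10), subj = seq_len(n_subj))
d$y <- 2 + d$x + b[d$subj, 1] + b[d$subj, 2] * d$x +
  rnorm(nrow(d), sd = sd_resid)
d$subj <- factor(d$subj)

# Randomly delete 50% of the data
d <- d[sample(nrow(d), nrow(d) / 2), ]

fit <- lmer(y ~ x + (x | subj), data = d)
vc <- VarCorr(fit)$subj   # model-based covariance matrix
cm <- ranef(fit)$subj     # conditional modes

# True values, model estimates, and summaries of the conditional modes
out <- rbind(
  true = c(var_int = sd_int^2, var_slope = sd_slope^2, cor = rho),
  model = c(vc[1, 1], vc[2, 2], attr(vc, "correlation")[1, 2]),
  conditional_modes = c(var(cm[[1]]), var(cm[[2]]), cor(cm[[1]], cm[[2]]))
)
round(out, 2)
```

Repeating this over many seeds and a range of values for rho would be needed before drawing conclusions about how the gap between the two correlations behaves.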
 >  >
 >  > Best
 >  > Reinhold
 >  >
 >  > On Wed, Apr 9, 2008 at 11:21 AM, Nick Isaac <njbisaac at googlemail.com> wrote:
 >  > > Dear all,
 >  > >
 >  > >  Thanks for the comments and apologies for not providing more
 >  > >  information. I (mis)judged it would be better to discuss the issue
 >  > >  abstractly. There should be enough levels to estimate the variance of
 >  > >  C and at least one other random effect:
 >  > >
 >  > >  Number of obs: 1242, groups: D, 269; C, 64; B, 8; A, 3
 >  > >
 >  > >  My interpretation of comments by all three respondents is as follows:
 >  > >  1) extracting the random effects/BLUPs/conditional modes is reasonable
 >  > >  in general
 >  > >  2) a taxonomy might be considered fixed or random, depending on the
 >  > >  question and the number of units/levels
 >  > >  3) In my case, it would be better to use the conditional modes for x|C
 >  > >  than to fit x*C as an interaction term.
 >  > >
 >  > >  Best wishes, Nick
 >  > >
 >  > >
 >  > >
 >  > >
 >  > >  On 08/04/2008, Andrew Robinson <A.Robinson at ms.unimelb.edu.au> wrote:
 >  > >  > On Tue, Apr 08, 2008 at 07:10:16PM +0200, Reinhold Kliegl wrote:
 >  > >  >  > >  My dataset has one continuous normally-distributed fixed effect and
 >  > >  >  > >  four random effects that are nested (in fact, it is a taxonomy). For
 >  > >  >  > >  simplicity, I've removed the variable names, so the dataset has the
 >  > >  >  > >  following structure:
 >  > >  >  > >
 >  > >  >  > >  y ~ x | A/B/C/D
 >  > >  >  > It would be good to know how many units/levels you have for each of
 >  > >  >  > your four random effects. Those with fewer than, say, five, are good
 >  > >  >  > candidates for being specified as fixed effects. Think how many
 >  > >  >  > observations you need to get a stable estimate of a variance!
 >  > >  >  >
 >  > >  >  > >  lmer( y ~ x + (1|A) + (1|B) + (1|C) + (1|D) + C + x:C) #error:
 >  > >  >  > >  Downdated X'X is not positive definite, 82
 >  > >  >  > You cannot include C both as a random and a fixed effect
 >  > >  >
 >  > >  >
 >  > >  >
 >  > >  > I do not believe that this is generally true.  See, for example,
 >  > >  >
 >  > >  >  > require(lme4)
 >  > >  >  > (fm1 <- lmer(Reaction ~ Days + Subject + (Days|Subject),  sleepstudy))
 >  > >  >
 >  > >  >  Therefore I am uncertain as to how you can draw this conclusion
 >  > >  >  without more information about the design (which the poster really
 >  > >  >  should have provided).
 >  > >  >
 >  > >  >
 >  > >  >
 >  > >  >  > >  lmer( y ~ x + (1|A) + (1|B) + (1|C) + (1|D) + x:C) #gives sensible results
 >  > >  >  > If this gives sensible results, I suspect you have very few levels of
 >  > >  >  > C, say, 2 or 3?
 >  > >  >  > In this case, definitely specify C and x and their interaction as
 >  > >  >  > fixed effects, e.g.:
 >  > >  >  > lmer( y ~ x*C + (1|A) + (1|B)  + (1|D) )
 >  > >  >  >
 >  > >  >  > The following may not apply to your case, but it might: Sometimes
 >  > >  >  > people think that a nested/taxonomic design implies a random effect
 >  > >  >  > structure (e.g., schools, classes, students). This is not true. If you
 >  > >  >  > have only a few units for each factor, you are better off to specify
 >  > >  >  > it as a fixed-effects rather than a random-effects taxonomy. (Of
 >  > >  >  > course, you lose generalizability, but if you want this you should
 >  > >  >  > make sure you have a sample that provides a basis for it.)
 >  > >  >
 >  > >  >
 >  > >  > I can see the sense behind this position but sometimes a few units are
 >  > >  >  all that is available, and including them in a model as fixed effects
 >  > >  >  muddies the statistical waters, especially if they are the kinds of
 >  > >  >  effects that a model user will be unlikely to naturally condition upon.
 >  > >  >
 >  > >  >  I do agree that if there are problems with model fitting and/or
 >  > >  >  interpretation when the design is rigorously followed, then a more
 >  > >  >  flexible approach can and should be adopted, and appropriate
 >  > >  >  allowances must be made.
 >  > >  >
 >  > >  >
 >  > >  >  > The interpretation of conditional modes (formerly known as BLUPs,
 >  > >  >  > that is, "predictions") is a tricky business, especially with few
 >  > >  >  > units per level.
 >  > >  >
 >  > >  >
 >  > >  > Sorry, I think I've missed something.  In what sense are the
 >  > >  >  conditional modes formerly known as BLUPs?
 >  > >  >
 >  > >  >  Andrew
 >  > >  >
 >  > >  >
 >  > >  >  --
 >  > >  >  Andrew Robinson
 >  > >  >  Department of Mathematics and Statistics            Tel: +61-3-8344-6410
 >  > >  >  University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
 >  > >  >  http://www.ms.unimelb.edu.au/~andrewpr
 >  > >  >  http://blogs.mbs.edu/fishing-in-the-bay/
 >  > >  >
 >  > >
 >
 >  --
 >  Andrew Robinson
 >  Department of Mathematics and Statistics            Tel: +61-3-8344-6410
 >  University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
 >  http://www.ms.unimelb.edu.au/~andrewpr
 >  http://blogs.mbs.edu/fishing-in-the-bay/
 >
 >  _______________________________________________
 >  R-sig-mixed-models at r-project.org mailing list
 >  https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
 >
