Meaning of perfect correlation between by-intercept and by-slope adjustments - R-SIG-mixed-models

Tue, May 17, 2011 2:56 AM

Message: 5
Date: Sat, 14 May 2011 08:48:51 -0400
From: Ben Bolker <bbolker at gmail.com>
To: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Meaning of perfect correlation between
	by-intercept and by-slope adjustments
Message-ID: <4DCE7A33.8090601 at gmail.com>
Content-Type: text/plain; charset=UTF-8

On 11-05-16 02:09 PM, Petar Milin wrote:

 On Fri, May 13, 2011 at 10:51 PM, Douglas Bates<bates at stat.wisc.edu>  wrote:

 On Fri, May 13, 2011 at 3:32 PM, Petar Milin<pmilin at ff.uns.ac.rs>  wrote:

 On 13/05/11 22:00, Douglas Bates wrote:

 On Fri, May 13, 2011 at 12:35 PM, Petar Milin<pmilin at ff.uns.ac.rs>   wrote:

 Hello! Simplified model that I have is:
 lmer(Y ~ F1 + F2 + C1 + (1+F1|participants) + (1|items))
 F1 and F2 are categorical predictors (factors) and C1 is a covariable
 (continuous predictor). F1 has five levels.
 By-participant adjustments for F1 are justified (likelihood ratio test is
 highly significant). However, what puzzles me is perfect correlation
 between two levels of F1. Others are quite high, but not perfect. I wonder
 what this means, exactly? Is there some "lack of information" which leads
 to problems in estimating variances?

 I think of the estimation criterion for mixed models (the REML
 criterion or the deviance) as being like a smoothing criterion that
 seeks to balance complexity of the model versus fidelity to the data.
 It happens that models in which the variance covariance matrix of the
 random effects is singular or nearly singular are considered to have
 low complexity so the criterion will push the optimization to that
 extreme when doing so does not introduce substantially worse fits.
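A minimal base-R sketch of what "singular" means here, assuming a hypothetical 2 by 2 covariance matrix with made-up standard deviations (none of these numbers come from the model in this thread): when the correlation reaches 1, the determinant drops to zero and one eigenvalue vanishes, so the two random effects really vary along a single dimension.

```r
# Hypothetical standard deviations for two random-effect components
sds <- c(2, 0.5)

# Build a covariance matrix from the standard deviations and a correlation rho
make_cov <- function(rho) {
  R <- matrix(c(1, rho, rho, 1), nrow = 2)
  diag(sds) %*% R %*% diag(sds)
}

S_high    <- make_cov(0.9)  # high but imperfect correlation
S_perfect <- make_cov(1)    # perfect correlation

det(S_high)     # strictly positive: full rank
det(S_perfect)  # zero up to rounding: singular

# One eigenvalue is zero up to rounding, so the matrix has rank 1
ev_perfect <- eigen(S_perfect, symmetric = TRUE)$values
ev_perfect
```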

 One way around this is to avoid fitting models with vector-valued
 random effects and, instead, use two terms with simple scalar random
 effects, as in

 lmer(Y ~ F1 + F2 + C1 + (1|participants) + (1|F1:participants) + (1|items))

 I am always hesitant to go for the scalar version. As far as I understand, this
 implies homoscedasticity across levels of F1, but correct me if I am wrong.
 In my model, I am not sure that assumption would hold.

 You are correct.  However, the model with vector-valued random effects
 is not supported by the data in the sense that it converges to a
 singular variance-covariance matrix.  When you have 5 random effects
 associated with each level of participant and you allow the 5 by 5
 positive semi-definite variance-covariance matrix you are attempting
 to estimate 15 variance parameters for that one term.  You need a lot
 of data to be able to do that.
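The count of 15 can be checked with a little arithmetic: a general q by q covariance matrix has q variances and q(q-1)/2 distinct covariances. The sketch below does this in base R for q = 5. The variable names are illustrative only, and the figure of 2 for the scalar alternative simply counts the two variances in (1|participants) + (1|F1:participants).

```r
q <- 5                    # random effects per participant: intercept + 4 F1 contrasts

n_var <- q                # one variance per random effect
n_cov <- choose(q, 2)     # one covariance per distinct pair

n_params_vector <- n_var + n_cov   # same as q * (q + 1) / 2
n_params_scalar <- 2               # two variances in the scalar formulation

c(vector_valued = n_params_vector, scalar = n_params_scalar)
```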

 
 I am reading various material, trying to understand and cope with this
 properly. Bottom line: would using vector-valued random effects in the
 above case, with a perfect correlation between random adjustments,
 be a case of overfitting?

   I think so.
   If you wanted a justification for dropping back to the homoscedastic
model, you could compare the likelihoods of the heteroscedastic and
homoscedastic model fits, which you can probably establish are a pair of
nested models (and whose likelihoods may actually be identical).
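As a rough sketch of that comparison, the code below fits both formulations to simulated data and compares them with anova(). The data frame d, its sizes, and the effect sizes are all hypothetical, and the code assumes a version of lme4 recent enough to provide isSingular(). Because the simulated data contain no by-participant variation in F1, the vector-valued fit will likely come out singular or nearly so.

```r
library(lme4)

set.seed(1)
# Hypothetical data: 30 participants crossed with 20 items
d <- expand.grid(participants = factor(1:30), items = factor(1:20))
d$F1 <- factor(sample(letters[1:5], nrow(d), replace = TRUE))
d$F2 <- factor(sample(c("x", "y"), nrow(d), replace = TRUE))
d$C1 <- rnorm(nrow(d))

# Random intercepts only; no true by-participant F1 effects are simulated
u_part <- rnorm(30)
u_item <- rnorm(20, sd = 0.5)
d$Y <- 0.5 * d$C1 +
  u_part[as.integer(d$participants)] +
  u_item[as.integer(d$items)] +
  rnorm(nrow(d))

# Vector-valued (heteroscedastic) random effects for F1 within participants
m_vec <- lmer(Y ~ F1 + F2 + C1 + (1 + F1 | participants) + (1 | items),
              data = d, REML = FALSE)

# Scalar (homoscedastic) alternative
m_sca <- lmer(Y ~ F1 + F2 + C1 + (1 | participants) + (1 | F1:participants) +
                (1 | items),
              data = d, REML = FALSE)

isSingular(m_vec)   # TRUE indicates a fit on the boundary
VarCorr(m_vec)      # inspect estimated variances and correlations

# Likelihood ratio comparison of the two fits
cmp <- anova(m_sca, m_vec)
print(cmp)
```

When the larger model sits on the boundary of the parameter space, the usual chi-square reference distribution is only approximate, so the p-value from such a comparison is best read with caution.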

I forgot to mention it, but I did a likelihood ratio test immediately after
Doug's suggestion.
However, conceptually, I do not like comparing a model that is suspected of
overfitting with any other model. I wonder if that is correct at all: AIC,
BIC and logLik are measures of goodness-of-fit, and this particular fit is
"wrong", so to speak.
Furthermore, I am getting a better fit for the model that uses the
vector-valued random effect, which, as we now know, overfits.
Honestly, I wonder whether I should rely on the likelihood ratio test at all
if the variance-covariance matrix of the random effects is singular or
nearly singular?


Many thanks for the great discussion!
Best,
Petar