Meaning of perfect correlation between by-intercept and by-slope adjustments - R-SIG-mixed-models

Petar Milin

Fri, May 13, 2011 10:35 AM

Hello!
The simplified model that I have is:
lmer(Y ~ F1 + F2 + C1 + (1+F1|participants) + (1|items))
F1 and F2 are categorical predictors (factors) and C1 is a covariate
(continuous predictor). F1 has five levels.
By-participant adjustments for F1 are justified (the likelihood ratio
test is highly significant). However, what puzzles me is a perfect
correlation between the by-participant adjustments for two levels of F1.
The other correlations are quite high, but not perfect. What exactly
does this mean? Is there some "lack of information" that leads to
problems in estimating the variances?

Thanks!
Petar

Douglas Bates

Fri, May 13, 2011 1:00 PM

On Fri, May 13, 2011 at 12:35 PM, Petar Milin <pmilin at ff.uns.ac.rs> wrote:

I think of the estimation criterion for mixed models (the REML
criterion or the deviance) as being like a smoothing criterion that
seeks to balance complexity of the model versus fidelity to the data.
It happens that models in which the variance-covariance matrix of the
random effects is singular or nearly singular are considered to have
low complexity so the criterion will push the optimization to that
extreme when doing so does not introduce substantially worse fits.

One way around this is to avoid fitting models with vector-valued
random effects and, instead, use two terms with simple scalar random
effects, as in

lmer(Y ~ F1 + F2 + C1 + (1|participants) + (1|F1:participants) + (1|items))
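To make the two specifications concrete, here is a minimal sketch that fits both of them to simulated data. The data frame `d`, its design (30 participants, 20 items) and all effect sizes are assumptions made up for illustration; they do not come from the thread. `VarCorr()`, `isSingular()` and `getME()` are lme4 functions (`isSingular()` requires a reasonably recent lme4).

```r
library(lme4)

set.seed(1)
# Hypothetical design (not from the thread): 30 participants x 20 items
d <- expand.grid(participants = factor(1:30), items = factor(1:20))
n <- nrow(d)
# F1: five-level factor, balanced within participants (4 items per level)
d$F1 <- factor(letters[(as.integer(d$items) - 1) %% 5 + 1])
d$F2 <- factor(sample(c("x", "y"), n, replace = TRUE))
d$C1 <- rnorm(n)

# Simulated response: participant and item intercepts plus noise only
u <- rnorm(30)            # by-participant intercepts
w <- rnorm(20, sd = 0.5)  # by-item intercepts
d$Y <- u[as.integer(d$participants)] + w[as.integer(d$items)] +
  0.3 * d$C1 + rnorm(n)

# Vector-valued random effects: general 5 x 5 covariance per participant
m_vector <- lmer(Y ~ F1 + F2 + C1 + (1 + F1 | participants) + (1 | items),
                 data = d)

# Two scalar terms instead of one vector-valued term
m_scalar <- lmer(Y ~ F1 + F2 + C1 + (1 | participants) +
                   (1 | F1:participants) + (1 | items), data = d)

# Number of covariance parameters each fit has to estimate
n_theta_vector <- length(getME(m_vector, "theta"))
n_theta_scalar <- length(getME(m_scalar, "theta"))

print(VarCorr(m_vector))     # correlations at or near +/-1 suggest a boundary fit
print(isSingular(m_vector))  # TRUE if the estimated covariance matrix is singular
print(VarCorr(m_scalar))
```

Because the simulated data contain no by-participant variation in the F1 effects, the vector-valued fit is likely to produce warnings or land on the boundary, which is the situation described in the thread.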

Petar Milin

Fri, May 13, 2011 1:32 PM

On 13/05/11 22:00, Douglas Bates wrote:

I am always hesitant to go for the scalar version. As far as I
understand, this implies homoscedasticity across the levels of F1, but
correct me if I am wrong. I am not sure that assumption would hold in
my model.

Best,
Petar
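The homoscedasticity implied by the scalar specification can be written out explicitly. The notation below (u_i, v_ij, sigma_u, sigma_v) is introduced here for illustration and does not appear in the thread. With (1|participants) + (1|F1:participants), the total adjustment for participant i at level j of F1 is the sum of two independent scalar effects:

```latex
b_{ij} = u_i + v_{ij}, \qquad
u_i \sim \mathcal{N}(0, \sigma_u^2), \quad
v_{ij} \sim \mathcal{N}(0, \sigma_v^2)

\operatorname{Var}(b_{ij}) = \sigma_u^2 + \sigma_v^2
\quad \text{for every level } j

\operatorname{Cov}(b_{ij}, b_{ij'}) = \sigma_u^2, \qquad
\operatorname{Corr}(b_{ij}, b_{ij'}) =
  \frac{\sigma_u^2}{\sigma_u^2 + \sigma_v^2}
\quad (j \neq j')
```

So every level of F1 gets the same variance and every pair of levels the same non-negative correlation (a compound-symmetric structure), whereas (1+F1|participants) lets each variance and each correlation differ.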

Douglas Bates

Fri, May 13, 2011 1:51 PM

On Fri, May 13, 2011 at 3:32 PM, Petar Milin <pmilin at ff.uns.ac.rs> wrote:

You are correct.  However, the model with vector-valued random effects
is not supported by the data, in the sense that it converges to a
singular variance-covariance matrix.  When you have 5 random effects
associated with each level of participant and you allow a general 5 by 5
positive semi-definite variance-covariance matrix, you are attempting
to estimate 15 variance-covariance parameters (5 variances and 10
covariances) for that one term.  You need a lot of data to be able to
do that.
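The count of 15 follows from the number of free elements in a symmetric k by k matrix, here with k = 5 random effects per participant:

```latex
\frac{k(k+1)}{2} = \frac{5 \cdot 6}{2} = 15
  = \underbrace{5}_{\text{variances}} + \underbrace{10}_{\text{covariances}}
```

By comparison, the two scalar terms (1|participants) + (1|F1:participants) use only 2 variance parameters for the same grouping factor.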

Ben Bolker

Sat, May 14, 2011 5:48 AM

On 11-05-16 02:09 PM, Petar Milin wrote:

I am reading around, trying to understand and cope with this
properly. Bottom line: would using vector-valued random effects in the
above case, with a perfect correlation between random adjustments,
be a case of overfitting?

I think so.

If you wanted a justification for dropping back to the homoscedastic
model, you could compare the likelihoods of the heteroscedastic and
homoscedastic model fits, which you can probably establish are a pair of
nested models (and whose likelihoods may actually be identical).
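A minimal sketch of such a comparison follows. It uses the `cake` data shipped with lme4 purely as a stand-in (a three-level factor `recipe` within `replicate`), not the poster's data; the model formulas are assumptions chosen to mirror the structure discussed in the thread.

```r
library(lme4)

# Heteroscedastic model: general 3 x 3 covariance matrix per replicate
m_het <- lmer(angle ~ recipe + temp + (1 + recipe | replicate),
              data = cake, REML = FALSE)

# Homoscedastic model: two scalar terms (compound-symmetric structure)
m_hom <- lmer(angle ~ recipe + temp + (1 | replicate) +
                (1 | recipe:replicate), data = cake, REML = FALSE)

# Difference in number of estimated parameters (6 vs 2 covariance parameters)
df_diff <- attr(logLik(m_het), "df") - attr(logLik(m_hom), "df")

# Likelihood ratio comparison of the nested fits
print(anova(m_hom, m_het))
```

Because the homoscedastic model sits inside the heteroscedastic one, a small and non-significant change in log-likelihood supports the simpler structure. The chi-square reference distribution tends to be conservative when variance parameters lie on the boundary, so the p-value is best read as a rough guide.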

Petar Milin

Mon, May 16, 2011 11:09 AM

On Fri, May 13, 2011 at 10:51 PM, Douglas Bates <bates at stat.wisc.edu> wrote:

I am reading around, trying to understand and cope with this
properly. Bottom line: would using vector-valued random effects in the
above case, with a perfect correlation between random adjustments,
be a case of overfitting?

Thanks!
Petar

Douglas Bates

Mon, May 16, 2011 11:39 AM

On Mon, May 16, 2011 at 1:09 PM, Petar Milin <pmilin at ff.uns.ac.rs> wrote:

Yes - at least I would interpret the results that way.