
High correlation among random effects for longitudinal model

3 messages · John Fox, Joshua Rosenberg

Dear Joshua,

I'm chiming in late, so it's possible that someone already pointed this out and I didn't notice. A better way to specify a polynomial in R is to use poly() in the model formula. By default, this produces orthogonal polynomial regressors (at least in the fixed effects) while yielding the same fit to the data. For example:
> (P <- poly(1:5, 2))
           1          2
[1,] -0.6324555  0.5345225
[2,] -0.3162278 -0.2672612
[3,]  0.0000000 -0.5345225
[4,]  0.3162278 -0.2672612
[5,]  0.6324555  0.5345225
attr(,"coefs")
attr(,"coefs")$alpha
[1] 3 3

attr(,"coefs")$norm2
[1]  1  5 10 14

attr(,"degree")
[1] 1 2
attr(,"class")
[1] "poly"   "matrix"

> colSums(P)  # the columns are centered at 0
           1            2 
0.000000e+00 1.110223e-16 

> crossprod(P)  # and are orthonormal
              1             2
1  1.000000e+00 -1.110223e-16
2 -1.110223e-16  1.000000e+00

My guess is that this will also reduce the correlations among the random effects. If you really must have raw polynomials, then poly(time, 2, raw=TRUE) offers the advantage that model-structure-aware functions can understand that the linear and quadratic regressors are part of the same term in the model.
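For instance, an lme() call might look like this (a sketch only; y, time, id, and dat are placeholder names, not from your model):

library(nlme)

## raw polynomial, but specified as a single model term rather than
## as separate time and I(time^2) regressors
m.raw <- lme(y ~ poly(time, 2, raw = TRUE),
             random = ~ poly(time, 2, raw = TRUE) | id,
             data = dat)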

As to whether high correlations among the random effects are really a problem, my guess is that they aren't, because lme() uses a log-Cholesky factorization of the random-effects covariance matrix anyway. In some contexts, high correlations might produce numerical instability, but, as I said, probably not here. Ben would know.
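Roughly, the point of the log-Cholesky parametrization is that any unconstrained parameter vector maps to a valid positive-definite covariance matrix, so the optimizer never has to enforce constraints on the correlations directly. A toy illustration (not nlme's internal code):

## log-Cholesky parametrization of a 2 x 2 covariance matrix
theta <- c(0.5, -0.3, 0.1)              # any real-valued parameter vector
U <- matrix(0, 2, 2)
U[upper.tri(U, diag = TRUE)] <- theta   # fill the upper triangle (column-major)
diag(U) <- exp(diag(U))                 # exponentiate the logged diagonal
Sigma <- crossprod(U)                   # t(U) %*% U is positive definite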

I hope this helps,
 John

-----------------------------
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
Web: socialsciences.mcmaster.ca/jfox/
3 days later
Dear John,

Thank you so much -- I will use poly() in the future (even in cases in which I might *not* use orthogonal polynomials), as it appears that the summary() function returns helpful output (i.e., correlations of the fixed effects), in addition to its other benefits.

When I use the poly function, the random effects correlations are lower:

Random effects:
 Formula: ~+poly(wave, 2) | student_ID
 Structure: General positive-definite, Log-Cholesky parametrization
               StdDev     Corr
(Intercept)      2.178734 (Intr) p(,2)1
poly(wave, 2)1 279.521839 0.834
poly(wave, 2)2 281.979199 0.751  0.987
Residual        16.292629

As are the fixed effects:

Fixed effects: stwm ~ +poly(wave, 2)
                   Value Std.Error   DF   t-value p-value
(Intercept)      5.30155  0.231096 7070 22.940957       0
poly(wave, 2)1 196.33516 23.481935 7070  8.361115       0
poly(wave, 2)2 113.69005 23.591988 7070  4.819011       0
 Correlation:
               (Intr) p(,2)1
poly(wave, 2)1 0.344
poly(wave, 2)2 0.311  0.516

However, when I calculate individual-specific (i.e., group-level) predicted values (i.e., BLUPs, using the predict() method with level = 1), they are (very) highly correlated:

# A tibble: 3 x 4
  rowname   intercept linear quadratic
  <chr>         <dbl>  <dbl>     <dbl>
1 intercept    NA      0.960     0.960
2 linear        0.960 NA         1.00
3 quadratic     0.960  1.00     NA

When I calculate the same individual-specific predicted values using the non-orthogonal (raw) polynomials, the correlations are very nearly as high. At this point, I'm curious how and why these predictions are so highly correlated.

Thank you again for your help, John, and for yours, Ben, Stuart, Jorg, and
Thierry. Sorry if this is a beginner or otherwise ignorant question as I
learn to work with these longitudinal / polynomial models.
josh
On Tue, Apr 3, 2018 at 1:32 PM, Fox, John <jfox at mcmaster.ca> wrote:

John et al., please ignore my question: (a) I was confusing two things (I wasn't using the predict() method, but rather ranef()), and (b) the correlations are high, but not as high as I thought (closer to .95 than 1), due to an error in how I was calculating the individual-specific predictions using ranef().
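For the record, the distinction I had muddled, sketched with m standing for the fitted lme object (a placeholder name, not my actual code):

ranef(m)               # per-group deviations from the fixed effects
coef(m)                # fixed effects + deviations: group-specific coefficients
predict(m, level = 1)  # fitted values based on the group-specific coefficients
cor(ranef(m))          # correlations among the estimated random effects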

Thanks again.
Josh
On Fri, Apr 6, 2018 at 4:53 PM Joshua Rosenberg <jrosen at msu.edu> wrote: