correlation of fixed effects coefficients all close to +/-1

Mon, May 25, 2020 10:12 AM

UPDATE

Dear Phillip and list

As you can see from the attached graph, one of the categories of the
predictor variable ("madera") has only one observation.
I removed this observation and ran the model again; this is the
correlation matrix I get:

Correlation of Fixed Effects:
            (Intr) Tp_rsdOr Tp_rsdOt Tp_Pyc Tp_rsP Tp_rsR
Tp_rsdOrgnc -0.725
Tip_rsdOtrs -0.747  0.593
Tp_rsdPplyc -0.575  0.458    0.470
Tp_rsdPlstc -0.659  0.526    0.542    0.419
Tipo_resdRd -0.445  0.356    0.367    0.282  0.328
Tipo_rsdVdr -0.747  0.593    0.612    0.470  0.542  0.367

I am aware that modifying a dataset is not acceptable, but I think this
showed that the source of the problem was a lack of observations. Am I
correct? Is there a better way to deal with this? I would rather not delete
a row of my dataset, even though it is a very uncommon observation for
which I do not aim to get predictions.
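A minimal sketch, in Python/NumPy, of why a one-observation category produces this pattern. It assumes R's default treatment contrasts (the first factor level, apparently "madera" here, is the reference level absorbed into the intercept) and ignores the boatID random effect, so its numbers will not match the lmer output exactly; the group sizes are hypothetical. Under treatment coding the intercept estimates the reference-level mean and every other coefficient estimates (level mean - reference mean), so all coefficients share the reference level's uncertainty.

```python
import numpy as np


def treatment_design(counts):
    """Treatment-coded design matrix for a one-factor model.

    counts[i] is the number of observations in level i; level 0 is the
    reference level, absorbed into the intercept column.
    """
    n = sum(counts)
    X = np.zeros((n, len(counts)))
    X[:, 0] = 1.0  # intercept column
    row = 0
    for level, n_level in enumerate(counts):
        if level > 0:
            X[row:row + n_level, level] = 1.0  # dummy column for this level
        row += n_level
    return X


def coef_correlation(X):
    """Correlation matrix of OLS coefficient estimates, i.e. cov2cor of
    sigma^2 (X'X)^-1; sigma^2 cancels, so it is left out."""
    cov = np.linalg.inv(X.T @ X)
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


# Hypothetical group sizes; only the size of the reference level differs.
for n_ref in (1, 30):
    counts = [n_ref, 40, 35, 50, 20, 45, 10, 38]
    corr = coef_correlation(treatment_design(counts))
    print(f"n_ref = {n_ref:2d}: "
          f"corr(Intercept, level 1) = {corr[1, 0]:+.3f}, "
          f"corr(level 1, level 2) = {corr[2, 1]:+.3f}")
```

In this simplified setting the correlations have closed forms, corr(Intercept, level k) = -sqrt(n_k / (n_k + n_ref)) and corr(level j, level k) = (1/n_ref) / sqrt((1/n_j + 1/n_ref)(1/n_k + 1/n_ref)), and both approach +/-1 as n_ref shrinks. This suggests that making a well-populated category the reference level (for example with relevel() in R) would lower these correlations without deleting any data; it changes only the parameterization, not the fitted values.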

Thank you again for your advice


On Mon, May 25, 2020 at 10:52 AM Alessandra Bielli <bielli.alessandra at gmail.com> wrote:

Hi Phillip

Thank you so much for your explanation.

I have a couple more questions:

1. In my model, the regression coefficients for the categories of my
predictor are highly correlated, but I have only one categorical predictor.
With collinearity I would usually drop one of the predictors, but here I
have only one, and my goal is to use the model to predict the dependent
variable. What is the procedure here?

2. Is there a test or visual way to determine if I have enough data to get
good estimates?

3. A couple of days ago I came across this post on Cross Validated, which
states that the "Correlation of Fixed Effects" part of the output is only
useful in special cases:
https://stats.stackexchange.com/questions/57240/how-do-i-interpret-the-correlations-of-fixed-effects-in-my-glmer-output
The post references the book
http://www.sfs.uni-tuebingen.de/~hbaayen/publications/baayenCUPstats.pdf,
page 268:

"The summary concludes with a table listing the correlations of the fixed
effects. The numbers listed here can be used to construct confidence
ellipses for pairs of fixed-effects parameters, and should not be confused
with the normal correlation obtained by applying cor() to pairs of
predictor vectors in the input data. Since constructing confidence ellipses
is beyond the scope of the book we will often suppress this table".

My understanding is that the correlation matrix is useful for predicting
future values, which is also my goal, but I am not entirely sure I am
interpreting this correctly.
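A minimal sketch, in Python/NumPy, of why the covariances (and hence the correlations) matter for prediction: the variance of a predicted mean x'b is x' V x, where V is the covariance matrix of the fixed-effects estimates (for an lmer fit, vcov(fit) returns it, and the printed correlation table is that matrix rescaled to correlations). The 2x2 matrix below is hypothetical: it is what OLS with treatment coding gives, in units of the residual variance, for a reference level with one observation and a second level with 40, ignoring random effects.

```python
import numpy as np

# Hypothetical covariance of (Intercept, level-k coefficient) in units of the
# residual variance: treatment coding, n_ref = 1, n_k = 40, no random effects.
n_ref, n_k = 1, 40
V = np.array([
    [1 / n_ref, -1 / n_ref],
    [-1 / n_ref, 1 / n_k + 1 / n_ref],
])

x_new = np.array([1.0, 1.0])  # mean of level k = Intercept + level-k coefficient

var_with_cov = x_new @ V @ x_new                  # includes 2 * Cov(Intercept, level k)
var_no_cov = x_new @ np.diag(np.diag(V)) @ x_new  # wrongly assumes zero covariance
corr = V[0, 1] / np.sqrt(V[0, 0] * V[1, 1])

print(f"corr(Intercept, level k):         {corr:+.3f}")
print(f"Var(prediction), with covariance: {var_with_cov:.4f}")
print(f"Var(prediction), ignoring it:     {var_no_cov:.4f}")
```

With the covariance included, the variance equals 1/n_k, the same as for a plain mean of those 40 observations, even though each coefficient on its own is very uncertain: the strong negative correlation cancels the shared uncertainty. In this sketch, a correlation near -1 does not by itself make predictions for the well-populated categories unreliable.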

I really appreciate your advice!

Alessandra

On Sun, May 24, 2020 at 3:15 PM Phillip Alday <phillip.alday at mpi.nl> wrote:

Hi,

Very high correlations of the fixed-effects estimates can indicate two
problems (which are actually just different manifestations of the same
deeper problem):

1. Multicollinearity -- this is the same as multicollinearity in
classical (non-mixed-effects) regression. Basically, this means that some
of your variables are expressing the same thing, so there are redundancies
that could be eliminated. Perfect multicollinearity leads to a
rank-deficient model matrix, which R will detect and handle (by dropping
the redundant columns), but near multicollinearity may not be caught.

2. You don't have enough data to get good estimates of all your
coefficients.
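A small Python/NumPy illustration of the difference between perfect and near multicollinearity, using simulated data (all variables are hypothetical). A numerical rank check flags the perfect case, which is the situation R reports as rank deficiency; in the near case the matrix is still full rank, but a large condition number signals that the columns are almost redundant.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Perfect collinearity: the last column is an exact combination of x1 and x2.
X_perfect = np.column_stack([np.ones(n), x1, x2, x1 + 2 * x2])
# Near collinearity: the same combination plus a tiny amount of noise.
X_near = np.column_stack(
    [np.ones(n), x1, x2, x1 + 2 * x2 + rng.normal(scale=1e-3, size=n)]
)

for name, X in [("perfect", X_perfect), ("near", X_near)]:
    rank = np.linalg.matrix_rank(X)  # numerical rank via SVD
    cond = np.linalg.cond(X)         # ratio of largest to smallest singular value
    print(f"{name:8s} rank = {rank} of {X.shape[1]} columns, "
          f"condition number = {cond:.3g}")
```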

The bigger problem for your inference is that both of these problems
will inflate your standard errors. In both cases, there isn't enough
information to fully tease apart the contributions of the different
variables, which means that there is a lot of variability in your
estimates and thus large standard errors.
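A minimal Python/NumPy sketch of this inflation for the simplest case, assuming OLS with two centered, standardized predictors whose sample correlation is r (mixed models are not covered). The sampling variance of each slope is proportional to the diagonal of the inverse of the predictors' correlation matrix, 1 / (1 - r^2) (the variance inflation factor), and the correlation between the two slope estimates is -r.

```python
import numpy as np


def slope_covariance_factor(r):
    """Inverse of the 2x2 correlation matrix of two standardized predictors.

    Up to the factor sigma^2 / (n - 1), this is the covariance matrix of
    the two OLS slope estimates.
    """
    R = np.array([[1.0, r], [r, 1.0]])
    return np.linalg.inv(R)


for r in (0.0, 0.5, 0.9, 0.99):
    C = slope_covariance_factor(r)
    vif = C[0, 0]                                    # equals 1 / (1 - r**2)
    corr_est = C[0, 1] / np.sqrt(C[0, 0] * C[1, 1])  # equals -r
    print(f"r = {r:4.2f}: SE inflated x{np.sqrt(vif):5.2f}, "
          f"corr(slope 1, slope 2) = {corr_est:+.2f}")
```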

Note that some correlation between estimates is expected. If you think
of a very simple case with the intercept and one slope/predictor then
you'll see that if you change the intercept, then you have to change the
slope a bit to get the line to stay close to the observed data.
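A minimal Python/NumPy sketch of the intercept/slope case, assuming OLS for y ~ 1 + x and a hypothetical predictor whose values lie far from zero. The correlation between the two estimates depends only on the design, -mean(x) / sqrt(mean(x^2)), so it is close to -1 when x is far from zero and vanishes when x is centered; centering changes the meaning of the intercept, not the fit.

```python
import numpy as np


def intercept_slope_correlation(x):
    """Correlation between the OLS intercept and slope estimates for y ~ 1 + x.

    The estimates' covariance is sigma^2 (X'X)^-1; sigma^2 cancels in the
    correlation, so only the design matrix is needed.
    """
    X = np.column_stack([np.ones_like(x), x])
    cov = np.linalg.inv(X.T @ X)
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])


x = np.linspace(10.0, 20.0, 21)  # hypothetical predictor values far from zero
print(f"raw x:      {intercept_slope_correlation(x):+.3f}")
print(f"centered x: {intercept_slope_correlation(x - x.mean()):+.3f}")
```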

(Once again, I worry that I've oversimplified and said something
horribly infelicitous, but I'm always happy to be corrected and learn
something myself!)

Best,

Phillip

On 11/5/20 11:42 pm, Alessandra Bielli wrote:

Dear list,

I am fitting the mixed-effects model:

 > lmer(log(percapita_day) ~ Type_residuo + (1|boatID), data=all)

where percapita_day is a non-negative continuous response variable
(log-transformed so that the residuals are normally distributed),
Type_residuo is a categorical explanatory variable, and boatID is a random
effect with 4 levels.

I have found values very close to +/-1 in the correlation of fixed effects
matrix below, and after some research I learnt that these values are not
the correlations between the variables but the expected correlations
between the regression coefficients.

Correlation of Fixed Effects:
            (Intr) Tp_rsM Tp_rsdOr Tp_rsdOt Tp_Pyc Tp_rsP Tp_rsR
Type_rsdMtl -0.944
Tp_rsdOrgnc -0.951  0.945
Typ_rsdOtrs -0.959  0.953  0.959
Tp_rsdPplyc -0.926  0.919  0.925    0.933
Tp_rsdPlstc -0.951  0.945  0.951    0.958    0.925
Type_resdRd -0.870  0.867  0.873    0.878    0.850  0.872
Type_rsdVdr -0.954  0.949  0.955    0.962    0.928  0.954  0.876

However, I still can't explain why all the correlations are so close to
+/-1, and I was wondering whether this indicates that something is wrong
with my model. Is it due to the presence of outliers in the response
variable (see attached)?

Thanks,

Alessandra

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Rplot01.pdf
Type: application/pdf
Size: 11255 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20200525/5e845415/attachment-0001.pdf>
