Skip to content

strange model fit- help

3 messages · marKo, Thierry Onkelinx

#
Hi.

I have fitted a relatively complicated model to electrodermal data (a 
simple resting and stimulus situation). The data summary follows.

 > summary(data)
        id             sc              t              stim
  g1_1   :  49   Min.   :26798   Min.   :  1.0   before  :3201
  g1_12  :  49   1st Qu.:32299   1st Qu.:123.0   after   :1543
  g1_13  :  49   Median :32486   Median :245.0
  g1_14  :  49   Mean   :32253   Mean   :244.9
  g1_15  :  49   3rd Qu.:32587   3rd Qu.:367.0
  g1_2   :  49   Max.   :32761   Max.   :489.0
  (Other):4450

id (person), and stim are factors, t is time (in s) and sc is skin 
conductance level. Sc distribution is quite negatively asymmetrical at 
the dataset level, although not that bad at the id level. As the 
stimulus occur at a specified time, those two variables are correlated 
(0.81).

The model follows.

m1<-lmer(sc~1+t+I(t^2)+stim+stim:t+stim:I(t^2)+(1+t+I(t^2)+stim+stim:t+stim:I(t^2)|id), 
data=data)

Here goes the summary.

 > summary(m1)
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: sc ~ 1 + t + I(t^2) + stim + stim:t + stim:I(t^2) + (1 + t + 
I(t^2) + stim + stim:t + stim:I(t^2) | id)
    Data: data

      AIC      BIC   logLik deviance df.resid
  62325.9  62506.9 -31134.9  62269.9     4716

Scaled residuals:
      Min       1Q   Median       3Q      Max
-24.3783  -0.1551  -0.0074   0.1392  12.5288

Random effects:
  Groups   Name          Variance  Std.Dev.  Corr
  id       (Intercept)   4.681e+04 2.164e+02
           t             7.925e+00 2.815e+00  1.00
           I(t^2)        2.559e-05 5.059e-03 -0.87 -0.87
           stim.L        1.591e+05 3.989e+02 -0.12 -0.12  0.17
           t:stim.L      1.105e+00 1.051e+00 -0.58 -0.58  0.78  0.33
           I(t^2):stim.L 2.367e-05 4.865e-03  0.06  0.06 -0.21 -0.76 -0.45
  Residual               2.049e+04 1.432e+02
Number of obs: 4744, groups:  id, 97

Fixed effects:
                 Estimate Std. Error t value
(Intercept)    2.960e+04  1.637e+02  180.85
t              1.291e+01  8.493e-01   15.20
I(t^2)        -1.579e-02  1.110e-03  -14.21
stim.L        -3.956e+03  2.329e+02  -16.98
t:stim.L       2.047e+01  1.136e+00   18.01
I(t^2):stim.L -2.477e-02  1.478e-03  -16.76

Correlation of Fixed Effects:
             (Intr) t      I(t^2) stim.L t:st.L
t           -0.886
I(t^2)       0.811 -0.966
stim.L       0.972 -0.930  0.868
t:stim.L    -0.989  0.911 -0.826 -0.973
I(t^2):st.L  0.917 -0.858  0.761  0.870 -0.947

The fit of the model is quite good (pseudo r2 is 0.96), but have some 
problems:
1: quite ?extreme? residuals (-24.3783,  12.5288)
2: quite high correlations among random effects
3: lousy qqplot (apart from the perfect fit on the  from -2 to +2 std 
normal quantiles)

Help please? What is wrong with the model (something is, I?m sure).
#
Dear Marko,

Keep in mind that squating time in seconds lead to large numbers (489 ^ 2 =
239121). This forces the parameters estimates to be small. You can solve
this either by using orthogonal polynomial (poly(t, 2)) or by rescaling t
(e.g. in minutes rather than seconds: 489 s = 8.15 min, 8.15 ^ 2 = 66.4225)
If you go for rescaling, then create 2 variables: t_min and t_min2 (= t_min
^ 2). That will make your formula more reable.

It looks like you coded stim as an ordered factor. That not required since
you have only two levels. Use a default factor with before as reference.

The problem with the strong correlations between t and t^2 random effects
is that they are highly correlated themselves. cor(0:489, (0:489) ^ 2) =
0.986 Note that is isn't solved by rescaling. Only centering works .eg
centering at 4 minutes yields cor(0:489 - 4 * 60 / 60, (0:489 - 4 * 60) ^
2) = 0.071 Orthogonal polynomials have the benefit that they are
uncorrelated by definition. cor(poly(0:489, 2))

Bottomline: always scale and center polynomials. I prefer to scale and
center to revelant values, e.g. scale to a different unit and center to an
important point near the middle of the domain. That keeps your parameters
interpretable without the need to recalculate them (as you would when
scaling by the standard deviation and center to the mean).

Best regards,

Thierry


ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////

<https://www.inbo.be>


Op za 14 mrt. 2020 om 17:10 schreef marKo via R-sig-mixed-models <
r-sig-mixed-models at r-project.org>:

  
  
11 days later
#
Thank a lot for the hints.
Rescaling into minutes helped a little. I would like to retain both the 
raw polynomial terms, both the non-centered time (for ease of 
readability). The model was written explicitly (not poly(time, 2, 
raw=TRUE) for the same reasons.

The random part is quite complex but i will retain it as it is, because 
of the almost perfect fit.

Thanks,

Marko
On 15. 03. 2020. 13:14, Thierry Onkelinx wrote: