Skip to content

standard error and statistical significance in lmer versus lm

2 messages · Vincenzo Lagani, Ben Bolker

#
Dear all,

I am modelling gene expression data with mixed models using lme4. My 
goal is to assess whether gene expression globally decreases or 
increases with time.

Specifically, the data consist in whole expression profiles measured at 
five different time points in two different strains of mice. For each 
time point and each strain there are three replicates. The data are not 
longitudinal, i.e., different mice are used at each time point. We had 
to remove one profile from a time point because it was not matching our 
quality criteria, so the data became not properly balanced.

On these data I am fitting the following model:

lme4Model.full <- lmer(values ~ week * genotype + (week | probesets), 
data = dataset, REML = FALSE)

where 'values' stands for the gene expression, 'week' is a scaled 
numeric reporting the age of the mice in weeks, and 'genotype' is a 
factor with two levels representing the two different genetic 
backgrounds. Each single gene is let having its own random intercept and 
slope.

What puzzles me is that when I compare this model with its simpler, 
non-mixed version:

lmModel.full <- lm(values ~ week * genotype, data = dataset)

I obtain the same coefficients but different standard errors (see 
below). Furthermore, while the interaction coefficient is not 
significant in the simple linear model, it becomes highly significant in 
the mixed model, at least according to these ANOVA tests:

lmModel <- lm(values ~ week + genotype, data = dataset)
lme4Model <- lmer(values ~ week + genotype + (week | probesets), data = 
dataset, REML = FALSE)
Analysis of Variance Table

Model 1: values ~ week * genotype
Model 2: values ~ week + genotype
   Res.Df     RSS Df Sum of Sq      F Pr(>F)
1 414341 2162664
2 414342 2162666 -1   -2.4697 0.4732 0.4915
Data: dataset
Models:
lme4Model: values ~ week + genotype + (week | probesets)
lme4Model.full: values ~ week * genotype + (week | probesets)
                Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)
lme4Model       7 72376 72452 -36181    72362
lme4Model.full  8 72325 72413 -36155    72309 52.472      1  4.364e-13 ***



Assessing whether the interaction coefficient is significant is actually 
the aim of my study, and having two totally different answers confuses 
me. My understanding is that the mixed model better catches the variance 
structure of the data and thus it is able to better estimate the 
standard errors and p-values of the coefficients. Is this correct? In 
other words, can I confidently claim that the p-values obtained from the 
mixed models are "the correct ones" and that the interaction term is 
actually significant?

My apologies if this issue has been already posted on this list. Despite 
having seen multiple posts here on similar topics, I have not been able 
to find an answer to these questions.

Thanks in advance for your help. Any suggestion is very welcome.

Regards,

Vincenzo
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: values ~ week * genotype + (week | probesets)
    Data: dataset

      AIC      BIC   logLik deviance df.resid
  72325.3  72412.8 -36154.7  72309.3   414337

Scaled residuals:
      Min       1Q   Median       3Q      Max
-13.2622  -0.5246   0.0128   0.5270  23.3456

Random effects:
  Groups    Name        Variance Std.Dev. Corr
  probesets (Intercept) 5.167338 2.27318
            week        0.005029 0.07092  -0.23
  Residual              0.047063 0.21694
Number of obs: 414345, groups:  probesets, 18015

Fixed effects:
                    Estimate Std. Error t value
(Intercept)       6.5727983  0.0169414   388.0
week             -0.0151336  0.0006823   -22.2
genotypeCSB       0.2405300  0.0007121   337.8
week:genotypeCSB  0.0050430  0.0006962     7.2

Correlation of Fixed Effects:
             (Intr) week   gntCSB
week        -0.177
genotypeCSB -0.015 -0.028
wk:gntypCSB -0.001 -0.392 -0.054
Call:
lm(formula = values ~ week * genotype, data = dataset)

Residuals:
     Min      1Q  Median      3Q     Max
-5.7560 -1.9881  0.0083  1.6823  7.6407

Coefficients:
                   Estimate Std. Error  t value Pr(>|t|)
(Intercept)       6.572798   0.004407 1491.384  < 2e-16 ***
week             -0.015134   0.004547   -3.329 0.000873 ***
genotypeCSB       0.240530   0.007500   32.072  < 2e-16 ***
week:genotypeCSB  0.005043   0.007331    0.688 0.491537
---
Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1

Residual standard error: 2.285 on 414341 degrees of freedom
Multiple R-squared:  0.002491,	Adjusted R-squared:  0.002484
F-statistic: 344.9 on 3 and 414341 DF,  p-value: < 2.2e-16
#
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 14-12-26 02:11 PM, Vincenzo Lagani wrote:
As far as I can tell from what you've posted, the result given by
lme4 is indeed (or certainly could be/I have no reason to believe it
is not) correct. I'm not sure of the best way to explain the result to
you, though.  Adding the variation among probesets to the model does
indeed explain a lot of variation that would otherwise end up being
modeled as error and filtering into the standard errors.  It would be
nice to understand the differences by visualizing the different
models, but with such a large data set it could be challenging ...
One thing that might be interesting would be plotting the residuals from
a model of only probeset variation (i.e., values ~ (week|probeset))
and seeing how the week*genotype pattern was (hopefully) clarified.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iQEcBAEBAgAGBQJUniiHAAoJEOCV5YRblxUH5GEH/R57rvnCi7OYj0HEV+1dQ9sv
awXYPd9/q7t2ZPVyrJ8OdVL2+ntVe7KYKFy28D2uRa5eyyH6/jaoy9nSlGI4Gvd0
dslGQYlIpSw9LmOHY1BPcQYZuqEoJoHlbonbX+00AwgANdanP0CpSWNzNVRmbcUL
ftPAErAaJecb7yu56+I2Yz5ugN3NYrqNdWvTV/HYxt5emjx45gQdQd4cQRTfKw3n
JkcM8DMRrOjvN6w2H8Pgps/yE3W+nx5VsgBaSdmagwTIfke6aZ90+55jMhjbDXNJ
xsbFzPc3CUA5SUO7qQHwoteMz8QlfRp3D8SgfmdfaPixskWCdMHjxAws+0ePAhw=
=0jkf
-----END PGP SIGNATURE-----