Mixed model (with interaction) for gene expression and iteration

Thu, Jun 4, 2009 2:38 AM

Sorry for replying to my own emails, but I found partial solution to my 
problems in Pinheiro and Bates at:

2.4.1 - Likelihood Ratio Tests (p.83)
to calculate logLik and p.values for random effects

and

2.4.2 - Hypothesis tests for Fixed-Effect Terms (p.87)
that is self-explanatory. Although I still don't understand how to 
calculate the "denominator degrees of freedom".

Best,
paolo

PS. Off-topic quick question: is there currently a "out-of-the-box" 
solution to get credible intervals for random effects in mcmcsamp()?
HPDinterval (from lme4 package) doesn't seem to work for that?

Paolo Innocenti wrote:

Dear Douglas, Rolf, Juan and list,

thank you very much for your replies.
I now got a good working model, and the use of refit and VarCorr will 
definitely help.

I had a go with mcmcsamp(), and I must confirm that this approach is not 
feasible, both computationally and because if you get a "false 
convergence" for, say, 1 gene out of 20, it becomes impossible to go 
back and fix all the errors.

So, the alternative approach seems more promising. If I understand 
correctly, you suggest to calculate a p-value for random effects out of 
the LRT (Likelihood ratio test), and use approximated DFs to calculate 
standard p-values for the fixed effects.

I neved used this approach, so I appreciate if you can point me in the 
right direction.

Random effects:
I'd need to compare the this three model:
m1a <- lmer(Y1 ~ sex + (1|line) + (1|sex:line))
m1b <- lmer(Y1 ~ sex + (1|line))
m1c <- lmer(Y1 ~ sex)

and get the effect of the interaction from m1a-m1b,
and the effect of "line" from m1b-m1c.
It that correct?
Can anyone point me to any kind of documentation/examples to sort out 
the details?

Fixed effects:
I don't really know where to get the approximated degrees of freedom. 
Can you point me to an example?

Thanks again for all the help. Eventually, I'll be really happy to share 
my experience/code when everything is sorted, even if I doubt I can add 
anything helpful.

Best,
paolo

PS. I don't really understand what you mean by
sobj<-summary(result)
what object is your "result" here?




Juan Pedro Steibel wrote:

Hello Paolo,
We are also starting to use lmer for gene expression analysis 
(genetical genomics too) so here are my thoughts.

Having two equivalent parameterizations, I would go with the 
computationally fastest one (a couple more miliseconds per model, 
easily add up to many hours when analyzing highly dimensional 
datasets). You can time the analysis for, say, 100 transcripts and go 
from there.

~note: other users commented on your models not being equivalent~

For p-values:

You could use LRT to test for variance components.

This is standard practice in genetic epidemiology: fit a model with 
and without the random effect in question, then compare the log 
(residual) likelihood ratio to a chi-square statistic. FDR can be used 
on top of that to account for multiple tests. Of course, now you have 
to fit three models (one null models for each VC), so your cpu  time 
has just multiplied by almost 3. Definitely using refit and update 
will help with compute time when having so many models.

I use sobj<-summary(result) as an intermediate step to get the info 
(although this may add to the computational burden and other 
suggestions you got may be faster and more efficient),

then ask for slots:
@REmat
@coefs
...for getting estimates of variance components and fixed effects.

@coefs gives you a t-statistic for fixed effects too... you could take 
a stab at an approximated df method and compute an (approximated) 
p-value.
I know that doing so can attract a lot of criticism in this list, but 
when you are fitting a several million models (10000s of transcripts 
and 1000s of genomic positions as in my case), the mcmc approximation 
is (unfortunately) not computationally feasible.


Hope this helps!
JP



Paolo Innocenti wrote:

Dear Douglas and list,

I am thinking about fitting a mixed model for a microarray experiment 
using lme4, since other specific software seems not suitable for my 
purposes. I'll briefly describe the model and kindly ask for 
suggestions on the model and the workflow I can use to get useful 
results.

My response variable Y is gene expression levels for a given gene 
(say g_i) from 120 samples.
The factor I want to include are:

Sex: fixed, two levels, M/F.
Line: 15 randomly picked genotypes from a large outbred population.

I am interested in:
- if the gene is differentially expressed in the 2 sexes (effect of 
sex), in the 15 lines (effect of line) and the interaction of the two.

- the variance component of line = how much of the variance is due to 
the genotype

- the variance component of the interaction = the genetic variation 
for sexual dimorphism.

Reading a bit of this mailing list, I came up with these three models:

m1 <- lmer(Y1 ~ sex + (1|line) + (1|sex:line))

or

m2 <- lmer(Y1 ~ sex + (sex|line))

or

m3 <- lmer(Y1 ~ sex + (0 + sex|line))

Which should all be the same model (and indeed they have all the same 
residuals) but different parametrization (see self-contained example 
below).

Now, in the light of my needs (see above), which model makes it 
easier to extract the components I need? Also, do they make different 
assumptions - as different levels of independency among levels of 
random factors?

I will need to be able to extract the variance component values in an 
iterative process (i have 18.000 genes): is VarCorr() the way to go?

VarCorr(m1)$'sex:line'[1]
VarCorr(m1)$'line'[1]

Last two question: what is the easier way to assess, in an iterative 
process, normality of residuals, and what is a sensible way to assess 
significant differential expression of genes (since I guess I can't 
get p-values and then apply FDR correction?)

Thanks a lot for reading so far and I'll be grateful for any kind of 
help.
paolo

Self-contained example:

Y1 <- as.numeric(
c("11.6625","11.3243","11.7819","11.5032","11.7578","11.9379","11.8491",
"11.9035","11.2042","11.0344","11.5137","11.1995","11.6327","11.7392",
"11.9869","11.6955","11.5631","11.7159","11.8435","11.5814","12.0756",
"12.3428","12.3342","11.9883","11.6067","11.6102","11.6517","11.4444",
"11.9567","12.0478","11.9683","11.8207","11.5860","11.6028","11.6522",
"11.6775","12.3570","12.2266","12.2444","12.1369","11.2573","11.4577",
"11.4432","11.2994","11.8486","11.9258","11.9864","11.9964","11.2806",
"11.2527","11.3672","11.0791","11.9501","11.7223","11.9825","11.8114",
"11.6116","11.4284","11.3731","11.6942","12.2153","12.0101","12.2150",
"12.1932","11.5617","11.3761","11.4813","11.7503","11.9889","12.1530",
"12.3299","12.4436","11.4566","11.4614","11.5527","11.3513","11.9764",
"11.8810","12.0999","11.9083","11.4870","11.6764","11.3973","11.4507",
"12.1141","11.9906","12.1118","11.9728","11.3382","11.4146","11.4590",
"11.2527","12.1101","12.0448","12.2191","11.8317","11.3982","11.3555",
"11.3897","11.7731","11.9749","11.8666","12.1984","12.0350","11.4642",
"11.4509","11.5552","11.4346","12.0714","11.7136","11.9019","11.8158",
"11.3132","11.3121","11.1612","11.2073","11.6658","11.7879","11.7847",
"11.5300"))
sex <- factor(rep(c("F","M"), 15, each=4))
line <- factor(rep(1:15, each=8))
m1 <- lmer(Y1 ~ sex + (1|line) + (1|sex:line))
m2 <- lmer(Y1 ~ sex + (sex|line))
m3 <- lmer(Y1 ~ sex + (0 + sex|line))
VarCorr(m1)$'sex:line'[1]
VarCorr(m1)$'line'[1]

Output:

m1

Linear mixed model fit by REML Formula: Y1 ~ sex + (1 | line) + (1 | 
sex:line)     AIC   BIC logLik deviance REMLdev
 -91.13 -77.2  50.57   -111.1  -101.1
Random effects:
 Groups   Name        Variance  Std.Dev.
 sex:line (Intercept) 0.0023237 0.048205
 line     (Intercept) 0.0169393 0.130151
 Residual             0.0168238 0.129707
Number of obs: 120, groups: sex:line, 30; line, 15

Fixed effects:
            Estimate Std. Error t value
(Intercept) 11.45977    0.03955  289.72
sexM         0.52992    0.02951   17.96

Correlation of Fixed Effects:
     (Intr)
sexM -0.373

m2

Linear mixed model fit by REML Formula: Y1 ~ sex + (sex | line) 
 AIC    BIC logLik deviance REMLdev
 -90 -73.27     51   -112.1    -102
Random effects:
 Groups   Name        Variance  Std.Dev. Corr   line     (Intercept) 
0.0152993 0.123691                 sexM        0.0046474 0.068172 
0.194  Residual             0.0168238 0.129707       Number of obs: 
120, groups: line, 15

Fixed effects:
            Estimate Std. Error t value
(Intercept) 11.45977    0.03606   317.8
sexM         0.52992    0.02951    18.0

Correlation of Fixed Effects:
     (Intr)
sexM -0.161

m3

Linear mixed model fit by REML Formula: Y1 ~ sex + (0 + sex | line) 
 AIC    BIC logLik deviance REMLdev
 -90 -73.27     51   -112.1    -102
Random effects:
 Groups   Name Variance Std.Dev. Corr   line     sexF 0.015299 
0.12369                  sexM 0.023227 0.15240  0.899  Residual      
0.016824 0.12971        Number of obs: 120, groups: line, 15

Fixed effects:
            Estimate Std. Error t value
(Intercept) 11.45977    0.03606   317.8
sexM         0.52991    0.02951    18.0

Correlation of Fixed Effects:
     (Intr)
sexM -0.161

Paolo Innocenti
Department of Animal Ecology, EBC
Uppsala University
Norbyv?gen 18D
75236 Uppsala, Sweden

Mixed model (with interaction) for gene expression and iteration

Thread (9 messages)