Mixed model (with interaction) for gene expression and iteration

On Tue, Jun 2, 2009 at 7:53 AM, Paolo Innocenti
Dear Douglas and list,
I am thinking about fitting a mixed model for a microarray experiment using
lme4, since other specific software seems not suitable for my purposes. I'll
briefly describe the model and kindly ask for suggestions on the model and
the workflow I can use to get useful results.
My response variable Y is gene expression levels for a given gene (say g_i)
from 120 samples.
The factor I want to include are:
Sex: fixed, two levels, M/F.
Line: 15 randomly picked genotypes from a large outbred population.
I am interested in:
- if the gene is differentially expressed in the 2 sexes (effect of sex), in
the 15 lines (effect of line) and the interaction of the two.
- the variance component of line = how much of the variance is due to the
genotype
- the variance component of the interaction = the genetic variation for
sexual dimorphism.
Reading a bit of this mailing list, I came up with these three models:
m1 <- lmer(Y1 ~ sex + (1|line) + (1|sex:line))
or
m2 <- lmer(Y1 ~ sex + (sex|line))
or
m3 <- lmer(Y1 ~ sex + (0 + sex|line))
Which should all be the same model (and indeed they have all the same
residuals) but different parametrization (see self-contained example below).
Not really.  m2 and m3 are different parameterizations of the same
model but m1 is different.  In models m2 and m3 the random effects for
F and for M are allowed to be correlated within a line.  Thus there
are three parameters determining the variance-covariance of the random
effects.  In model m1 the unconditional distribution of the random
effects for the sex:line interaction is independent of the
distribution of the random effects for line.
Now, in the light of my needs (see above), which model makes it easier to
extract the components I need? Also, do they make different assumptions - as
different levels of independency among levels of random factors?
I would use either m1 or m3 because I find the correlation structure
in m3 easier to understand than that of m2.
I will need to be able to extract the variance component values in an
iterative process (i have 18.000 genes): is VarCorr() the way to go?
I think so.  I would definitely look into using refit to update the
fitted model rather than starting from scratch with each of the genes.

I would use the idiom
unlist(VarCorr(m1))
sex:line        line
0.002323722 0.016939242

to get the estimated variances.
VarCorr(m1)$'sex:line'[1]
VarCorr(m1)$'line'[1]

Last two question: what is the easier way to assess, in an iterative
process, normality of residuals, and what is a sensible way to assess
significant differential expression of genes (since I guess I can't get
p-values and then apply FDR correction?)

Thanks a lot for reading so far and I'll be grateful for any kind of help.
paolo

Self-contained example:

Y1 <- as.numeric(
c("11.6625","11.3243","11.7819","11.5032","11.7578","11.9379","11.8491",
"11.9035","11.2042","11.0344","11.5137","11.1995","11.6327","11.7392",
"11.9869","11.6955","11.5631","11.7159","11.8435","11.5814","12.0756",
"12.3428","12.3342","11.9883","11.6067","11.6102","11.6517","11.4444",
"11.9567","12.0478","11.9683","11.8207","11.5860","11.6028","11.6522",
"11.6775","12.3570","12.2266","12.2444","12.1369","11.2573","11.4577",
"11.4432","11.2994","11.8486","11.9258","11.9864","11.9964","11.2806",
"11.2527","11.3672","11.0791","11.9501","11.7223","11.9825","11.8114",
"11.6116","11.4284","11.3731","11.6942","12.2153","12.0101","12.2150",
"12.1932","11.5617","11.3761","11.4813","11.7503","11.9889","12.1530",
"12.3299","12.4436","11.4566","11.4614","11.5527","11.3513","11.9764",
"11.8810","12.0999","11.9083","11.4870","11.6764","11.3973","11.4507",
"12.1141","11.9906","12.1118","11.9728","11.3382","11.4146","11.4590",
"11.2527","12.1101","12.0448","12.2191","11.8317","11.3982","11.3555",
"11.3897","11.7731","11.9749","11.8666","12.1984","12.0350","11.4642",
"11.4509","11.5552","11.4346","12.0714","11.7136","11.9019","11.8158",
"11.3132","11.3121","11.1612","11.2073","11.6658","11.7879","11.7847",
"11.5300"))
sex <- factor(rep(c("F","M"), 15, each=4))
line <- factor(rep(1:15, each=8))
m1 <- lmer(Y1 ~ sex + (1|line) + (1|sex:line))
m2 <- lmer(Y1 ~ sex + (sex|line))
m3 <- lmer(Y1 ~ sex + (0 + sex|line))
VarCorr(m1)$'sex:line'[1]
VarCorr(m1)$'line'[1]

Output:

m1
Linear mixed model fit by REML Formula: Y1 ~ sex + (1 | line) + (1 |
sex:line) ? ?AIC ? BIC logLik deviance REMLdev
?-91.13 -77.2 ?50.57 ? -111.1 ?-101.1
Random effects:
?Groups ? Name ? ? ? ?Variance ?Std.Dev.
?sex:line (Intercept) 0.0023237 0.048205
?line ? ? (Intercept) 0.0169393 0.130151
?Residual ? ? ? ? ? ? 0.0168238 0.129707
Number of obs: 120, groups: sex:line, 30; line, 15

Fixed effects:
? ? ? ? ? ?Estimate Std. Error t value
(Intercept) 11.45977 ? ?0.03955 ?289.72
sexM ? ? ? ? 0.52992 ? ?0.02951 ? 17.96

Correlation of Fixed Effects:
? ? (Intr)
sexM -0.373
m2
Linear mixed model fit by REML Formula: Y1 ~ sex + (sex | line) ?AIC
?BIC logLik deviance REMLdev
?-90 -73.27 ? ? 51 ? -112.1 ? ?-102
Random effects:
?Groups ? Name ? ? ? ?Variance ?Std.Dev. Corr ??line ? ? (Intercept)
0.0152993 0.123691 ? ? ? ? ? ? ? ?sexM ? ? ? ?0.0046474 0.068172 0.194
?Residual ? ? ? ? ? ? 0.0168238 0.129707 ? ? ? Number of obs: 120, groups:
line, 15

Fixed effects:
? ? ? ? ? ?Estimate Std. Error t value
(Intercept) 11.45977 ? ?0.03606 ? 317.8
sexM ? ? ? ? 0.52992 ? ?0.02951 ? ?18.0

Correlation of Fixed Effects:
? ? (Intr)
sexM -0.161
m3
Linear mixed model fit by REML Formula: Y1 ~ sex + (0 + sex | line) ?AIC
?BIC logLik deviance REMLdev
?-90 -73.27 ? ? 51 ? -112.1 ? ?-102
Random effects:
?Groups ? Name Variance Std.Dev. Corr ??line ? ? sexF 0.015299 0.12369
? ? ? ? ? ? ?sexM 0.023227 0.15240 ?0.899 ?Residual ? ? ?0.016824 0.12971
? ? ?Number of obs: 120, groups: line, 15

Fixed effects:
? ? ? ? ? ?Estimate Std. Error t value
(Intercept) 11.45977 ? ?0.03606 ? 317.8
sexM ? ? ? ? 0.52991 ? ?0.02951 ? ?18.0

Correlation of Fixed Effects:
? ? (Intr)
sexM -0.161

--
Paolo Innocenti
Department of Animal Ecology, EBC
Uppsala University
Norbyv?gen 18D
75236 Uppsala, Sweden

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

Mixed model (with interaction) for gene expression and iteration

Thread (9 messages)