Mixed model (with interaction) for gene expression and iteration
On Tue, Jun 2, 2009 at 7:53 AM, Paolo Innocenti
<paolo.innocenti at ebc.uu.se> wrote:
Dear Douglas and list,
I am thinking about fitting a mixed model for a microarray experiment using lme4, since other specific software seems not suitable for my purposes. I'll briefly describe the model and kindly ask for suggestions on the model and the workflow I can use to get useful results.
My response variable Y is gene expression levels for a given gene (say g_i) from 120 samples. The factor I want to include are:
Sex: fixed, two levels, M/F. Line: 15 randomly picked genotypes from a large outbred population.
I am interested in: - if the gene is differentially expressed in the 2 sexes (effect of sex), in the 15 lines (effect of line) and the interaction of the two.
- the variance component of line = how much of the variance is due to the genotype
- the variance component of the interaction = the genetic variation for sexual dimorphism.
Reading a bit of this mailing list, I came up with these three models:
m1 <- lmer(Y1 ~ sex + (1|line) + (1|sex:line))
or
m2 <- lmer(Y1 ~ sex + (sex|line))
or
m3 <- lmer(Y1 ~ sex + (0 + sex|line))
Which should all be the same model (and indeed they have all the same residuals) but different parametrization (see self-contained example below).
Not really. m2 and m3 are different parameterizations of the same model but m1 is different. In models m2 and m3 the random effects for F and for M are allowed to be correlated within a line. Thus there are three parameters determining the variance-covariance of the random effects. In model m1 the unconditional distribution of the random effects for the sex:line interaction is independent of the distribution of the random effects for line.
Now, in the light of my needs (see above), which model makes it easier to extract the components I need? Also, do they make different assumptions - as different levels of independency among levels of random factors?
I would use either m1 or m3 because I find the correlation structure in m3 easier to understand than that of m2.
I will need to be able to extract the variance component values in an iterative process (i have 18.000 genes): is VarCorr() the way to go?
I think so. I would definitely look into using refit to update the fitted model rather than starting from scratch with each of the genes. I would use the idiom
unlist(VarCorr(m1))
sex:line line 0.002323722 0.016939242 to get the estimated variances.
VarCorr(m1)$'sex:line'[1]
VarCorr(m1)$'line'[1]
Last two question: what is the easier way to assess, in an iterative
process, normality of residuals, and what is a sensible way to assess
significant differential expression of genes (since I guess I can't get
p-values and then apply FDR correction?)
Thanks a lot for reading so far and I'll be grateful for any kind of help.
paolo
Self-contained example:
Y1 <- as.numeric(
c("11.6625","11.3243","11.7819","11.5032","11.7578","11.9379","11.8491",
"11.9035","11.2042","11.0344","11.5137","11.1995","11.6327","11.7392",
"11.9869","11.6955","11.5631","11.7159","11.8435","11.5814","12.0756",
"12.3428","12.3342","11.9883","11.6067","11.6102","11.6517","11.4444",
"11.9567","12.0478","11.9683","11.8207","11.5860","11.6028","11.6522",
"11.6775","12.3570","12.2266","12.2444","12.1369","11.2573","11.4577",
"11.4432","11.2994","11.8486","11.9258","11.9864","11.9964","11.2806",
"11.2527","11.3672","11.0791","11.9501","11.7223","11.9825","11.8114",
"11.6116","11.4284","11.3731","11.6942","12.2153","12.0101","12.2150",
"12.1932","11.5617","11.3761","11.4813","11.7503","11.9889","12.1530",
"12.3299","12.4436","11.4566","11.4614","11.5527","11.3513","11.9764",
"11.8810","12.0999","11.9083","11.4870","11.6764","11.3973","11.4507",
"12.1141","11.9906","12.1118","11.9728","11.3382","11.4146","11.4590",
"11.2527","12.1101","12.0448","12.2191","11.8317","11.3982","11.3555",
"11.3897","11.7731","11.9749","11.8666","12.1984","12.0350","11.4642",
"11.4509","11.5552","11.4346","12.0714","11.7136","11.9019","11.8158",
"11.3132","11.3121","11.1612","11.2073","11.6658","11.7879","11.7847",
"11.5300"))
sex <- factor(rep(c("F","M"), 15, each=4))
line <- factor(rep(1:15, each=8))
m1 <- lmer(Y1 ~ sex + (1|line) + (1|sex:line))
m2 <- lmer(Y1 ~ sex + (sex|line))
m3 <- lmer(Y1 ~ sex + (0 + sex|line))
VarCorr(m1)$'sex:line'[1]
VarCorr(m1)$'line'[1]
Output:
m1
Linear mixed model fit by REML Formula: Y1 ~ sex + (1 | line) + (1 | sex:line) ? ?AIC ? BIC logLik deviance REMLdev ?-91.13 -77.2 ?50.57 ? -111.1 ?-101.1 Random effects: ?Groups ? Name ? ? ? ?Variance ?Std.Dev. ?sex:line (Intercept) 0.0023237 0.048205 ?line ? ? (Intercept) 0.0169393 0.130151 ?Residual ? ? ? ? ? ? 0.0168238 0.129707 Number of obs: 120, groups: sex:line, 30; line, 15 Fixed effects: ? ? ? ? ? ?Estimate Std. Error t value (Intercept) 11.45977 ? ?0.03955 ?289.72 sexM ? ? ? ? 0.52992 ? ?0.02951 ? 17.96 Correlation of Fixed Effects: ? ? (Intr) sexM -0.373
m2
Linear mixed model fit by REML Formula: Y1 ~ sex + (sex | line) ?AIC ?BIC logLik deviance REMLdev ?-90 -73.27 ? ? 51 ? -112.1 ? ?-102 Random effects: ?Groups ? Name ? ? ? ?Variance ?Std.Dev. Corr ??line ? ? (Intercept) 0.0152993 0.123691 ? ? ? ? ? ? ? ?sexM ? ? ? ?0.0046474 0.068172 0.194 ?Residual ? ? ? ? ? ? 0.0168238 0.129707 ? ? ? Number of obs: 120, groups: line, 15 Fixed effects: ? ? ? ? ? ?Estimate Std. Error t value (Intercept) 11.45977 ? ?0.03606 ? 317.8 sexM ? ? ? ? 0.52992 ? ?0.02951 ? ?18.0 Correlation of Fixed Effects: ? ? (Intr) sexM -0.161
m3
Linear mixed model fit by REML Formula: Y1 ~ sex + (0 + sex | line) ?AIC ?BIC logLik deviance REMLdev ?-90 -73.27 ? ? 51 ? -112.1 ? ?-102 Random effects: ?Groups ? Name Variance Std.Dev. Corr ??line ? ? sexF 0.015299 0.12369 ? ? ? ? ? ? ?sexM 0.023227 0.15240 ?0.899 ?Residual ? ? ?0.016824 0.12971 ? ? ?Number of obs: 120, groups: line, 15 Fixed effects: ? ? ? ? ? ?Estimate Std. Error t value (Intercept) 11.45977 ? ?0.03606 ? 317.8 sexM ? ? ? ? 0.52991 ? ?0.02951 ? ?18.0 Correlation of Fixed Effects: ? ? (Intr) sexM -0.161
-- Paolo Innocenti Department of Animal Ecology, EBC Uppsala University Norbyv?gen 18D 75236 Uppsala, Sweden
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models