Back to formatted view
Raw Message

Message-ID: <40e66e0b0906021254ta8463a2p71e03f05dfa8d453@mail.gmail.com>
Date: 2009-06-02T19:54:31Z
From: Douglas Bates
Subject: Mixed model (with interaction) for gene expression and iteration
In-Reply-To: <4A2520D6.6080108@ebc.uu.se>

On Tue, Jun 2, 2009 at 7:53 AM, Paolo Innocenti
<paolo.innocenti at ebc.uu.se> wrote:
> Dear Douglas and list,

> I am thinking about fitting a mixed model for a microarray experiment using
> lme4, since other specific software seems not suitable for my purposes. I'll
> briefly describe the model and kindly ask for suggestions on the model and
> the workflow I can use to get useful results.

> My response variable Y is gene expression levels for a given gene (say g_i)
> from 120 samples.
> The factor I want to include are:

> Sex: fixed, two levels, M/F.
> Line: 15 randomly picked genotypes from a large outbred population.

> I am interested in:
> - if the gene is differentially expressed in the 2 sexes (effect of sex), in
> the 15 lines (effect of line) and the interaction of the two.

> - the variance component of line = how much of the variance is due to the
> genotype

> - the variance component of the interaction = the genetic variation for
> sexual dimorphism.

> Reading a bit of this mailing list, I came up with these three models:

> m1 <- lmer(Y1 ~ sex + (1|line) + (1|sex:line))

> or

> m2 <- lmer(Y1 ~ sex + (sex|line))

> or

> m3 <- lmer(Y1 ~ sex + (0 + sex|line))

> Which should all be the same model (and indeed they have all the same
> residuals) but different parametrization (see self-contained example below).

Not really.  m2 and m3 are different parameterizations of the same
model but m1 is different.  In models m2 and m3 the random effects for
F and for M are allowed to be correlated within a line.  Thus there
are three parameters determining the variance-covariance of the random
effects.  In model m1 the unconditional distribution of the random
effects for the sex:line interaction is independent of the
distribution of the random effects for line.

> Now, in the light of my needs (see above), which model makes it easier to
> extract the components I need? Also, do they make different assumptions - as
> different levels of independency among levels of random factors?

I would use either m1 or m3 because I find the correlation structure
in m3 easier to understand than that of m2.

> I will need to be able to extract the variance component values in an
> iterative process (i have 18.000 genes): is VarCorr() the way to go?

I think so.  I would definitely look into using refit to update the
fitted model rather than starting from scratch with each of the genes.

I would use the idiom

> unlist(VarCorr(m1))
   sex:line        line
0.002323722 0.016939242

to get the estimated variances.

> VarCorr(m1)$'sex:line'[1]
> VarCorr(m1)$'line'[1]
>
> Last two question: what is the easier way to assess, in an iterative
> process, normality of residuals, and what is a sensible way to assess
> significant differential expression of genes (since I guess I can't get
> p-values and then apply FDR correction?)
>
> Thanks a lot for reading so far and I'll be grateful for any kind of help.
> paolo
>
> Self-contained example:
>
> Y1 <- as.numeric(
> c("11.6625","11.3243","11.7819","11.5032","11.7578","11.9379","11.8491",
> "11.9035","11.2042","11.0344","11.5137","11.1995","11.6327","11.7392",
> "11.9869","11.6955","11.5631","11.7159","11.8435","11.5814","12.0756",
> "12.3428","12.3342","11.9883","11.6067","11.6102","11.6517","11.4444",
> "11.9567","12.0478","11.9683","11.8207","11.5860","11.6028","11.6522",
> "11.6775","12.3570","12.2266","12.2444","12.1369","11.2573","11.4577",
> "11.4432","11.2994","11.8486","11.9258","11.9864","11.9964","11.2806",
> "11.2527","11.3672","11.0791","11.9501","11.7223","11.9825","11.8114",
> "11.6116","11.4284","11.3731","11.6942","12.2153","12.0101","12.2150",
> "12.1932","11.5617","11.3761","11.4813","11.7503","11.9889","12.1530",
> "12.3299","12.4436","11.4566","11.4614","11.5527","11.3513","11.9764",
> "11.8810","12.0999","11.9083","11.4870","11.6764","11.3973","11.4507",
> "12.1141","11.9906","12.1118","11.9728","11.3382","11.4146","11.4590",
> "11.2527","12.1101","12.0448","12.2191","11.8317","11.3982","11.3555",
> "11.3897","11.7731","11.9749","11.8666","12.1984","12.0350","11.4642",
> "11.4509","11.5552","11.4346","12.0714","11.7136","11.9019","11.8158",
> "11.3132","11.3121","11.1612","11.2073","11.6658","11.7879","11.7847",
> "11.5300"))
> sex <- factor(rep(c("F","M"), 15, each=4))
> line <- factor(rep(1:15, each=8))
> m1 <- lmer(Y1 ~ sex + (1|line) + (1|sex:line))
> m2 <- lmer(Y1 ~ sex + (sex|line))
> m3 <- lmer(Y1 ~ sex + (0 + sex|line))
> VarCorr(m1)$'sex:line'[1]
> VarCorr(m1)$'line'[1]
>
> Output:
>
>>> m1
>>
>> Linear mixed model fit by REML Formula: Y1 ~ sex + (1 | line) + (1 |
>> sex:line) ? ?AIC ? BIC logLik deviance REMLdev
>> ?-91.13 -77.2 ?50.57 ? -111.1 ?-101.1
>> Random effects:
>> ?Groups ? Name ? ? ? ?Variance ?Std.Dev.
>> ?sex:line (Intercept) 0.0023237 0.048205
>> ?line ? ? (Intercept) 0.0169393 0.130151
>> ?Residual ? ? ? ? ? ? 0.0168238 0.129707
>> Number of obs: 120, groups: sex:line, 30; line, 15
>>
>> Fixed effects:
>> ? ? ? ? ? ?Estimate Std. Error t value
>> (Intercept) 11.45977 ? ?0.03955 ?289.72
>> sexM ? ? ? ? 0.52992 ? ?0.02951 ? 17.96
>>
>> Correlation of Fixed Effects:
>> ? ? (Intr)
>> sexM -0.373
>>>
>>> m2
>>
>> Linear mixed model fit by REML Formula: Y1 ~ sex + (sex | line) ?AIC
>> ?BIC logLik deviance REMLdev
>> ?-90 -73.27 ? ? 51 ? -112.1 ? ?-102
>> Random effects:
>> ?Groups ? Name ? ? ? ?Variance ?Std.Dev. Corr ??line ? ? (Intercept)
>> 0.0152993 0.123691 ? ? ? ? ? ? ? ?sexM ? ? ? ?0.0046474 0.068172 0.194
>> ?Residual ? ? ? ? ? ? 0.0168238 0.129707 ? ? ? Number of obs: 120, groups:
>> line, 15
>>
>> Fixed effects:
>> ? ? ? ? ? ?Estimate Std. Error t value
>> (Intercept) 11.45977 ? ?0.03606 ? 317.8
>> sexM ? ? ? ? 0.52992 ? ?0.02951 ? ?18.0
>>
>> Correlation of Fixed Effects:
>> ? ? (Intr)
>> sexM -0.161
>>>
>>> m3
>>
>> Linear mixed model fit by REML Formula: Y1 ~ sex + (0 + sex | line) ?AIC
>> ?BIC logLik deviance REMLdev
>> ?-90 -73.27 ? ? 51 ? -112.1 ? ?-102
>> Random effects:
>> ?Groups ? Name Variance Std.Dev. Corr ??line ? ? sexF 0.015299 0.12369
>> ? ? ? ? ? ? ?sexM 0.023227 0.15240 ?0.899 ?Residual ? ? ?0.016824 0.12971
>> ? ? ?Number of obs: 120, groups: line, 15
>>
>> Fixed effects:
>> ? ? ? ? ? ?Estimate Std. Error t value
>> (Intercept) 11.45977 ? ?0.03606 ? 317.8
>> sexM ? ? ? ? 0.52991 ? ?0.02951 ? ?18.0
>>
>> Correlation of Fixed Effects:
>> ? ? (Intr)
>> sexM -0.161
>
>
>
> --
> Paolo Innocenti
> Department of Animal Ecology, EBC
> Uppsala University
> Norbyv?gen 18D
> 75236 Uppsala, Sweden
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>