Greetings DESeq2ers,
I've been playing around with different ways to encode a 2x2 factorial
design I'm working with: 4 cell types, 4 treatments and exploring
what's cooking in there.
I thought that the nested interaction formula used in the limma user's
guide (section 8.5.3) would be an easy way to explore some obvious
questions since differential expression from different treatments
within each cell type (among other things) shake out quite cleanly
from the results of running the wald test since these end up as
columns in the design matrix, eg:
~ cell + cell:treatment
as long as the control treatment is the first level of the treatment
factor, we got columns for all cell:treatment effects, eg.
cellA:treatment1, cellA:treatment2, ..., cellD:treatment4)
Yay!
... almost ...
The problem is that the "renaming column" stuff in the fitNbinomGLMs
function seems to assume that we'll have columns for all main effects
in the design matrix, so the following line triggers an error:
modelMatrixNames[match(convertNames$from, modelMatrixNames)] <- convertNames$to
since you will find elements like "treatment2, ..., treatment4" among
the elements in `covertNames$from`. The call to match then returns NA
for these, and *boom*.
An easy fix would be to add the following line after the call to
`renameModelMatrixColumns`:
convertNames <- subset(convertNames, from %in% modelMatrixNames)
So the "if (renameCols)" block (line 1066 in core.R) now becomes:
if (renameCols) {
names(modelMatrixNames) <- modelMatrixNames
convertNames <- renameModelMatrixColumns(modelMatrixNames,
as.data.frame(colData(object)),
modelFormula)
convertNames <- subset(convertNames, from %in% modelMatrixNames)
modelMatrixNames[match(convertNames$from, modelMatrixNames)] <-
convertNames$to
}
--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech
[Bioc-devel] renameModelMatrixColumns mishap + patch for DESeq2
8 messages · Steve Lianoglou, Kasper Daniel Hansen, Michael Love
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/bioc-devel/attachments/20130709/8e315b6d/attachment.pl>
Hi Kasper, On Tue, Jul 9, 2013 at 5:51 PM, Kasper Daniel Hansen
<kasperdanielhansen at gmail.com> wrote:
This is side-stepping the question, but I am not aware that it ever makes sense to include the "A:B" term in a design matrix without also including the main effects of A and B (here I include obvious extensions such as A + A:B + C where B is a coarser factor than C, so here the main effects of B are essentially included). Of course, the A+A:B+C example seems to also fail in the DESeq2 code you quote, but these examples are rare in comp bio. If you're just fitting a model like A + A:B you're almost certainly doing something wrong from a statistical point of view.
Thanks for the input/guidance -- I'm always happy to get some linear-modeling-schooling. If that's the case, does that mean that I'm interpreting the advice from the limma user's guide incorrectly? Page 44 in the "Nested Interaction Formula" section: http://bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf Aren't the coefficients extracted for cellA:treatment1, cellA:treatment2, ..., cellD:treatment4 that you get when modeling this way exactly the fold changes for the effect of the treatment within each cell type? Thanks, -steve -- Steve Lianoglou Computational Biologist Bioinformatics and Computational Biology Genentech
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/bioc-devel/attachments/20130709/d7ca6150/attachment.pl>
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/bioc-devel/attachments/20130710/856c4e09/attachment.pl>
1 day later
Hey Mike, On Tue, Jul 9, 2013 at 11:52 PM, Michael Love
<michaelisaiahlove at gmail.com> wrote:
Hi Steve, Thanks for pointing out this bug in the renaming code. I will fix this as soon as possible.
If you decide to go the `subset(convertNames, ...)` route that I suggested: (1) You might want to put this in the renameModelMatrixColumns itself instead of where I suggested; and (2) be sure to guard the `subset` against the case when renameModelMatrixColumns return a 0 row data.frame, which is the case when you set design(dds) <- ~ 1. I just got bit by this when I was using the default params of the various variance stabilizing transforms. Anyway, I'm sure you'll find a better way to handle this stuff, but just wanted to give you advanced warning. HTH, -steve -- Steve Lianoglou Computational Biologist Bioinformatics and Computational Biology Genentech
3 days later
hi Steve, I have fixed this bug in version 1.0.18 and 1.1.22, and I added a unit test for such a design formula in the devel version. best, Mike On Wed, Jul 10, 2013 at 8:52 AM, Michael Love
<michaelisaiahlove at gmail.com> wrote:
Hi Steve, Thanks for pointing out this bug in the renaming code. I will fix this as soon as possible. Best, Mike On Jul 10, 2013 1:00 AM, "Steve Lianoglou" <lianoglou.steve at gene.com> wrote:
Greetings DESeq2ers,
I've been playing around with different ways to encode a 2x2 factorial
design I'm working with: 4 cell types, 4 treatments and exploring
what's cooking in there.
I thought that the nested interaction formula used in the limma user's
guide (section 8.5.3) would be an easy way to explore some obvious
questions since differential expression from different treatments
within each cell type (among other things) shake out quite cleanly
from the results of running the wald test since these end up as
columns in the design matrix, eg:
~ cell + cell:treatment
as long as the control treatment is the first level of the treatment
factor, we got columns for all cell:treatment effects, eg.
cellA:treatment1, cellA:treatment2, ..., cellD:treatment4)
Yay!
... almost ...
The problem is that the "renaming column" stuff in the fitNbinomGLMs
function seems to assume that we'll have columns for all main effects
in the design matrix, so the following line triggers an error:
modelMatrixNames[match(convertNames$from, modelMatrixNames)] <-
convertNames$to
since you will find elements like "treatment2, ..., treatment4" among
the elements in `covertNames$from`. The call to match then returns NA
for these, and *boom*.
An easy fix would be to add the following line after the call to
`renameModelMatrixColumns`:
convertNames <- subset(convertNames, from %in% modelMatrixNames)
So the "if (renameCols)" block (line 1066 in core.R) now becomes:
if (renameCols) {
names(modelMatrixNames) <- modelMatrixNames
convertNames <- renameModelMatrixColumns(modelMatrixNames,
as.data.frame(colData(object)),
modelFormula)
convertNames <- subset(convertNames, from %in% modelMatrixNames)
modelMatrixNames[match(convertNames$from, modelMatrixNames)] <-
convertNames$to
}
--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech
Thanks, Mike! On Mon, Jul 15, 2013 at 8:04 AM, Michael Love
<michaelisaiahlove at gmail.com> wrote:
hi Steve, I have fixed this bug in version 1.0.18 and 1.1.22, and I added a unit test for such a design formula in the devel version. best, Mike On Wed, Jul 10, 2013 at 8:52 AM, Michael Love <michaelisaiahlove at gmail.com> wrote:
Hi Steve, Thanks for pointing out this bug in the renaming code. I will fix this as soon as possible. Best, Mike On Jul 10, 2013 1:00 AM, "Steve Lianoglou" <lianoglou.steve at gene.com> wrote:
Greetings DESeq2ers,
I've been playing around with different ways to encode a 2x2 factorial
design I'm working with: 4 cell types, 4 treatments and exploring
what's cooking in there.
I thought that the nested interaction formula used in the limma user's
guide (section 8.5.3) would be an easy way to explore some obvious
questions since differential expression from different treatments
within each cell type (among other things) shake out quite cleanly
from the results of running the wald test since these end up as
columns in the design matrix, eg:
~ cell + cell:treatment
as long as the control treatment is the first level of the treatment
factor, we got columns for all cell:treatment effects, eg.
cellA:treatment1, cellA:treatment2, ..., cellD:treatment4)
Yay!
... almost ...
The problem is that the "renaming column" stuff in the fitNbinomGLMs
function seems to assume that we'll have columns for all main effects
in the design matrix, so the following line triggers an error:
modelMatrixNames[match(convertNames$from, modelMatrixNames)] <-
convertNames$to
since you will find elements like "treatment2, ..., treatment4" among
the elements in `covertNames$from`. The call to match then returns NA
for these, and *boom*.
An easy fix would be to add the following line after the call to
`renameModelMatrixColumns`:
convertNames <- subset(convertNames, from %in% modelMatrixNames)
So the "if (renameCols)" block (line 1066 in core.R) now becomes:
if (renameCols) {
names(modelMatrixNames) <- modelMatrixNames
convertNames <- renameModelMatrixColumns(modelMatrixNames,
as.data.frame(colData(object)),
modelFormula)
convertNames <- subset(convertNames, from %in% modelMatrixNames)
modelMatrixNames[match(convertNames$from, modelMatrixNames)] <-
convertNames$to
}
--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
-- Steve Lianoglou Computational Biologist Bioinformatics and Computational Biology Genentech