Hi all, I was wondering if there is a class for results of differential gene expression analysis. I couldn't find anything generic. Perhaps it's too similar to a simple data frame, but I like the idea of having a consistent format. I would imagine something that holds gene names, statistics (logFC, p-value, adjusted p-value), plus optional information, e.g. the percent of cells expressing a gene (in the context of scRNA-seq). This could then be attached to an SCE object ("metadata" slot) to keep all results together. I'm probably making things too complicated and should just use a simple data frame but wanted to be sure that I'm not missing any existing solution. I'd appreciate if you share your advice. Thank you! Cheers, Roman
[Bioc-devel] Class for differentially expressed genes?
5 messages · Roman Hillje, Shepherd, Lori, Constantin Ahlmann-Eltze +1 more
I would imagine a SummarizedExperiment would be the best option https://bioconductor.org/packages/release/bioc/html/SummarizedExperiment.html Lori Shepherd Bioconductor Core Team Roswell Park Comprehensive Cancer Center Department of Biostatistics & Bioinformatics Elm & Carlton Streets Buffalo, New York 14263
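To make the SummarizedExperiment suggestion concrete, here is a minimal sketch of how per-gene DE statistics could sit in `rowData()` while free-form information about the analysis goes into `metadata()`. The counts, statistics, and the `de_info` entry are made up for illustration; only the constructor and the two accessors come from the SummarizedExperiment package.

```r
library(SummarizedExperiment)

# Toy count matrix: 4 genes x 3 cells (made-up values).
counts <- matrix(
  c(5, 0, 12, 7, 3, 9, 0, 1, 4, 8, 2, 6),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3))
)
se <- SummarizedExperiment(assays = list(counts = counts))

# Per-gene statistics live next to the features they describe.
rowData(se)$logFC  <- c(1.2, -0.4, 0.1, 2.3)
rowData(se)$pvalue <- c(0.001, 0.30, 0.80, 0.0004)
rowData(se)$padj   <- p.adjust(rowData(se)$pvalue, method = "BH")

# Information about the analysis itself (hypothetical fields).
metadata(se)$de_info <- list(method = "hypothetical test", contrast = "B_vs_A")

rowData(se)
```

One consequence of this layout is that the statistics stay aligned with the genes when the object is subset by row.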
From: Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of Roman Hillje via Bioc-devel <bioc-devel at r-project.org>
Sent: Monday, March 9, 2020 6:48 AM
To: bioc-devel at r-project.org <bioc-devel at r-project.org>
Subject: [Bioc-devel] Class for differentially expressed genes?
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Hi Roman, I think it is probably also helpful to check out how DESeq2 and edgeR (two popular Bioconductor packages for differential expression analysis) have solved that problem: In DESeq2, for example, the `nbinomWaldTest()` function calculates the differential expression and stores the results in the `rowData()` of the DESeqDataSet / SummarizedExperiment. The `results()` function then extracts a `DESeqResults` object, a `DataFrame` subclass, with all the columns that you mentioned. In edgeR, the `glmLRT()` function calculates differential expression with the likelihood ratio test and returns a `DGELRT` object, from which `topTags()` extracts a table (a `data.frame` inside the returned `TopTags` object) with the mentioned columns. Best Regards, Constantin
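Since each package names the shared statistics differently, a small adapter is one way to bring their tables into a common shape. The sketch below uses base R only; `column_maps` and `harmonise_de()` are hypothetical names, and the input table is a hand-made stand-in for an edgeR `topTags()` table with default settings, not real output. The column names in the map are the ones the three packages use by default.

```r
# Default column names used by each tool for the shared statistics.
column_maps <- list(
  DESeq2 = c(logFC = "log2FoldChange", pvalue = "pvalue",  padj = "padj"),
  edgeR  = c(logFC = "logFC",          pvalue = "PValue",  padj = "FDR"),
  limma  = c(logFC = "logFC",          pvalue = "P.Value", padj = "adj.P.Val")
)

# Hypothetical helper: rename tool-specific columns to a common schema.
harmonise_de <- function(tab, tool) {
  map <- column_maps[[tool]]
  if (is.null(map)) stop("unknown tool: ", tool)
  tab <- as.data.frame(tab)
  absent <- setdiff(map, colnames(tab))
  if (length(absent) > 0) {
    stop("missing columns: ", paste(absent, collapse = ", "))
  }
  out <- data.frame(gene = rownames(tab), tab[, map, drop = FALSE],
                    row.names = NULL)
  colnames(out) <- c("gene", names(map))
  out
}

# Made-up table shaped like an edgeR result.
edger_like <- data.frame(
  logFC = c(1.1, -2.0), logCPM = c(5.2, 3.3), LR = c(9.8, 15.1),
  PValue = c(0.0017, 0.0001), FDR = c(0.0017, 0.0002),
  row.names = c("geneA", "geneB")
)
harmonised <- harmonise_de(edger_like, "edgeR")
harmonised
```

Tool-specific extras such as `logCPM` are dropped here; a fuller version could carry them along as optional columns.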
IMHO it is always profitable to consider the methods desired before designing a class. My sense of how this has proceeded to date starts with limma: data + design -> lmFit -> eBayes -> topTable(contrast). Schematically, this has been an adequate approach to generating and working with DE statistics for some time. For DESeq2, a function called results() is used to acquire statistics on DE. There are aspects of asking for and interpreting the results of DE analyses that could be made more systematic and perhaps shared among packages, so that users have a more consistent and informative experience in this domain. Sketching out the key actions may be the place to start. Just my 2c.
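The limma chain mentioned above can be sketched as follows. The expression matrix is random noise generated for illustration, so no gene is expected to be truly differential; the sample names, group labels, and two-group design are assumptions, while `lmFit()`, `eBayes()`, and `topTable()` are the limma functions named in the message.

```r
library(limma)

set.seed(1)
# Simulated log-expression values: 100 genes x 6 samples (pure noise).
expr <- matrix(
  rnorm(100 * 6), nrow = 100,
  dimnames = list(paste0("gene", 1:100), paste0("s", 1:6))
)

# Assumed design: three control and three treated samples.
group  <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~ group)

fit <- lmFit(expr, design)   # gene-wise linear models
fit <- eBayes(fit)           # moderated statistics

# Full table for the treatment coefficient, one row per gene.
tt <- topTable(fit, coef = "grouptrt", number = Inf)
head(tt)
```

The returned data frame carries `logFC`, `P.Value`, and `adj.P.Val`, which correspond to the statistics listed in the original question.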
Thank you for the responses so far! The idea behind this is that I don't want to limit users to a certain DGE toolkit, be it DESeq2, edgeR, or frameworks specifically developed for single-cell data such as muscat. I'd like to have a common structure that users can pour their results into (most variables are probably generated by all methods), which ensures that they match a certain format. Then, I can build the visualisation in a Shiny app around that format. I could just make my own format, but I want to avoid complicated explanations of what the data frame must look like to be in the correct format. Also, knowing myself, if I have control over the format, I might get tempted to change it in the future, resulting in compatibility issues... Regarding the SummarizedExperiment class: Would you suggest leaving most of the object empty and using the feature info slot (accessible through `rowData()`)? From the vignette it looks like that's just a normal data frame. I'll keep exploring. Best, Roman
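One way to enforce such a format without a long written specification is a small S4 class whose validity function rejects malformed tables. This is only a sketch in base R: the class name `DEResults`, its slots, and the required column names are hypothetical choices, not an existing Bioconductor class.

```r
library(methods)

# Hypothetical container; class, slot, and column names are illustrative.
setClass(
  "DEResults",
  slots = c(table = "data.frame", method = "character"),
  validity = function(object) {
    required <- c("gene", "logFC", "pvalue", "padj")
    absent <- setdiff(required, colnames(object@table))
    if (length(absent) > 0) {
      return(paste("missing required columns:", paste(absent, collapse = ", ")))
    }
    if (anyDuplicated(object@table$gene) > 0) {
      return("gene names must be unique")
    }
    p <- object@table$pvalue
    if (any(!is.na(p) & (p < 0 | p > 1))) {
      return("p-values must lie in [0, 1]")
    }
    TRUE
  }
)

# Constructor; new() runs the validity check when slots are supplied.
DEResults <- function(table, method = NA_character_) {
  new("DEResults", table = table, method = method)
}

# Made-up results, including an optional scRNA-seq column.
res <- DEResults(
  data.frame(
    gene = c("A", "B"), logFC = c(1.5, -0.7),
    pvalue = c(0.001, 0.2), padj = c(0.002, 0.2),
    pct_expressed = c(80, 35)
  ),
  method = "edgeR"
)
```

Extra columns such as `pct_expressed` pass through untouched, so optional information does not need to be part of the required schema. An object like `res` could then be stored in the `metadata()` list of an SCE.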