[Bioc-devel] Class for differentially expressed genes?

5 messages · Roman Hillje, Shepherd, Lori, Constantin Ahlmann-Eltze +1 more

#
Hi all,

I was wondering if there is a class for results of differential gene expression analysis. I couldn't find anything generic. Perhaps it's too similar to a simple data frame, but I like the idea of having a consistent format. I would imagine something that holds gene names, statistics (logFC, p-value, adjusted p-value), plus optional information, e.g. the percent of cells expressing a gene (in the context of scRNA-seq). This could then be attached to an SCE object ("metadata" slot) to keep all results together. I'm probably making things too complicated and should just use a simple data frame, but I wanted to be sure that I'm not missing any existing solution. I'd appreciate it if you could share your advice. Thank you!

Cheers,
Roman
#
I would imagine a SummarizedExperiment would be the best option
https://bioconductor.org/packages/release/bioc/html/SummarizedExperiment.html




Lori Shepherd

Bioconductor Core Team

Roswell Park Comprehensive Cancer Center

Department of Biostatistics & Bioinformatics

Elm & Carlton Streets

Buffalo, New York 14263
#
Hi Roman,

I think it is probably also helpful to check out how DESeq2 and edgeR 
(two popular Bioconductor packages for differential expression analysis) 
have solved that problem:

In DESeq2, for example, the `nbinomWaldTest()` function calculates the 
differential expression and stores the results in the `rowData()` of the 
DESeqDataSet / SummarizedExperiment. The `results()` function then extracts 
a `DESeqResults` object, a `DataFrame` subclass that `as.data.frame()` 
turns into a standard `data.frame` with all the columns that you mentioned.

In edgeR, the `glmLRT()` function calculates differential expression with 
the likelihood ratio test and returns a `DGELRT` object; `topTags()` then 
extracts a table (a `data.frame` in its `$table` component) with the 
mentioned columns.
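
The result tables of the two packages carry the same kind of statistics under different column names: a DESeq2 results table uses `log2FoldChange`, `pvalue` and `padj`, while an edgeR `topTags()` table from `glmLRT()` uses `logFC`, `PValue` and `FDR`. Below is a minimal base-R sketch of mapping both onto one set of names. The function `harmonise_de()`, the lookup `column_map` and the target names (`logFC`, `p_value`, `adj_p_value`) are hypothetical, and the input table is mocked by hand rather than produced by either package.

```r
# Hypothetical lookup: tool-specific column name -> common column name.
column_map <- list(
  deseq2 = c(log2FoldChange = "logFC", pvalue = "p_value", padj = "adj_p_value"),
  edger  = c(logFC = "logFC", PValue = "p_value", FDR = "adj_p_value")
)

# Rename the tool-specific columns and move the gene IDs from the
# row names into a regular column. Other columns are dropped.
harmonise_de <- function(tab, tool) {
  map <- column_map[[tool]]
  if (is.null(map)) stop("unknown tool: ", tool)
  missing_cols <- setdiff(names(map), colnames(tab))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  out <- data.frame(gene = rownames(tab), stringsAsFactors = FALSE)
  for (from in names(map)) {
    out[[map[[from]]]] <- tab[[from]]
  }
  out
}

# Mock table with the column names of a DESeq2 results table
# (values are made up for illustration).
mock_deseq2 <- data.frame(
  baseMean = c(120, 15), log2FoldChange = c(1.8, -0.2),
  lfcSE = c(0.3, 0.4), stat = c(6.0, -0.5),
  pvalue = c(2e-9, 0.62), padj = c(4e-9, 0.62),
  row.names = c("geneA", "geneB")
)
harmonised <- harmonise_de(mock_deseq2, "deseq2")
```

With real data, the input would be something like `as.data.frame(results(dds))` for DESeq2 or `topTags(lrt, n = Inf)$table` for edgeR.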

Best Regards,
Constantin

On 09.03.20 at 12:44, Shepherd, Lori wrote:

  
  
#
On Mon, Mar 9, 2020 at 6:58 AM Roman Hillje via Bioc-devel <
bioc-devel at r-project.org> wrote:

            
IMHO it is always profitable to consider the methods desired before
designing a class.  My sense of how this has proceeded to date starts
with limma: data+design -> lmFit -> eBayes -> topTable(contrast) ...
schematically, this has been an adequate approach to generating and
working with DE statistics for some time.  For DESeq2 a function called
results() is used to acquire statistics on DE.

There are aspects of asking for and interpreting the results of DE
analyses that could be made more systematic and perhaps shared among
packages so that users have a more consistent and informative experience
in this domain ... sketching out the key actions may be the place to
start.  Just my 2c.

  
    
#
Thank you for the responses so far!

The idea behind this is that I don't want to limit users to a certain DGE toolkit, be it DESeq2, edgeR, or frameworks specifically developed for single-cell data such as muscat. I'd like to have a common structure that users can pour their results into (most variables are probably generated by all methods), which ensures that the results match a certain format. Then I can build the visualisation in a Shiny app around that format. I could just make my own format, but I want to avoid complicated explanations of what the data frame must look like to be in the correct format. Also, knowing myself, if I have control over the format, I might get tempted to change it in the future, resulting in compatibility issues...
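
To make the "matches a certain format" part concrete, here is a rough base-R sketch of what such a check could look like. Everything in it is an assumption for illustration: the function name `validate_de_table()`, the required column names and the rule that p-values lie in [0, 1] are hypothetical, not an existing Bioconductor convention.

```r
# Hypothetical required columns, each paired with a type check.
required_cols <- list(
  gene        = is.character,
  logFC       = is.numeric,
  p_value     = is.numeric,
  adj_p_value = is.numeric
)

# Returns a character vector of problems; character(0) means the
# table conforms. Extra (optional) columns are left alone.
validate_de_table <- function(tab) {
  if (!is.data.frame(tab)) return("input is not a data frame")
  problems <- character(0)
  for (col in names(required_cols)) {
    if (!col %in% colnames(tab)) {
      problems <- c(problems, paste0("missing column: ", col))
    } else if (!required_cols[[col]](tab[[col]])) {
      problems <- c(problems, paste0("wrong type in column: ", col))
    }
  }
  # p-values, where present and numeric, must lie in [0, 1] (NA allowed).
  for (col in intersect(c("p_value", "adj_p_value"), colnames(tab))) {
    p <- tab[[col]]
    if (is.numeric(p) && any(p < 0 | p > 1, na.rm = TRUE)) {
      problems <- c(problems, paste0("values outside [0, 1] in column: ", col))
    }
  }
  problems
}

# Made-up example tables: one conforming (with an optional extra
# column), one with a missing column and an invalid p-value.
good <- data.frame(
  gene = c("geneA", "geneB"), logFC = c(1.8, -0.2),
  p_value = c(2e-9, 0.62), adj_p_value = c(4e-9, 0.62),
  pct_cells = c(0.4, 0.1), stringsAsFactors = FALSE
)
bad <- good[, c("gene", "logFC")]
bad$p_value <- c(0.5, 1.2)

problems_good <- validate_de_table(good)
problems_bad  <- validate_de_table(bad)
```

A table that passes the check could then be stored in the "metadata" slot of the SCE object, as in my first message, and the Shiny app would only need to know about the required columns.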

Regarding the SummarizedExperiment class: Would you suggest leaving most of the object empty and using the feature info slot (accessible through "rowData()")? From the vignette it looks like that's just a normal data frame.

I'll keep exploring.

Best,
Roman