[Bioc-devel] A geneSet data class for facilitating GSEA
Dear bioc-developers,

Would it be useful to introduce an additional slot for the direction and/or magnitude of the expression change of each gene in the gene set? It seems that GSEA and GSEA-like methods use sets of genes that are homogeneously down- or up-regulated (correct me if I am wrong; I am far from up to date on GSEA methods). This seems to be reflected in the example in the PGSEA vignette, where the target genes of Ras and Myc are separated into 'UP' and 'DN' regulated genes. However, (alternative?) methods could instead use the quantitative information about expression changes to score each gene set. Adding a corresponding slot to the geneSet class would allow us to accommodate such methods.

Best,
Alexandre

-----Original Message-----
From: bioc-devel-bounces at stat.math.ethz.ch [mailto:bioc-devel-bounces at stat.math.ethz.ch] On Behalf Of Dykema, Karl
Sent: Wednesday, 14 March 2007 16:15
To: bioc-devel at stat.math.ethz.ch
Subject: Re: [Bioc-devel] A geneSet data class for facilitating GSEA

Sorry, I forgot to attach the str() output:

 $ 15-delta prostaglandin J2 10 uM DOWN : list()
  ..- attr(*, "reference")= chr "15-delta prostaglandin J2 10 uM DOWN "
  ..- attr(*, "desc")= chr "DOWN "
  ..- attr(*, "source")= chr "PubMed"
  ..- attr(*, "design")= chr "????"
  ..- attr(*, "identifier")= chr "17008526"
  ..- attr(*, "species")= chr "human"
  ..- attr(*, "data")= chr "raw"
  ..- attr(*, "private")= chr "no"
  ..- attr(*, "creator")= chr "Karl Dykema <karl.dykema at vai.org>"
  ..- attr(*, "ids")= chr [1:75] "171392" "5680" "2149" "54557" ...
  ..- attr(*, "class")= atomic [1:1] smc
  .. ..- attr(*, "package")= chr "PGSEA"

This closely mirrors the proposed geneSet, and we would be happy to adopt a consensus structure. The only significant difference is a "creator" field that tells people who curated the gene list; this may help when groups collaborate to collect gene sets.
-------------------------------
Karl Dykema
Bioinformatics Programmer/Analyst
Laboratory of Computational Biology
Van Andel Research Institute
333 Bostwick Ave. NE
Grand Rapids, MI 49503
(616) 234-5554

-----Original Message-----
From: Vincent Carey 525-2265 <stvjc at channing.harvard.edu>
Date: Wed, 14 Mar 2007 10:19:36 -0400 (EDT)
To: Sean Davis <sdavis2 at mail.nih.gov>
Cc: <bioc-devel at stat.math.ethz.ch>, Ross Lazarus <rerla at channing.harvard.edu>
Subject: Re: [Bioc-devel] A geneSet data class for facilitating GSEA

I like this idea in principle. The RGenetics folks may have done something in this direction. You might want to have geneList as an abstract class and then extend it to EntrezGeneList, RefseqGeneList, and so forth, so that method dispatch can work on the class itself without looking into the idType field. A version or date field might also be important.

---
Vince Carey, PhD
Assoc. Prof Med (Biostatistics)
Harvard Medical School
Channing Laboratory - ph 6175252265 fa 6177311541
181 Longwood Ave Boston MA 02115 USA
stvjc at channing.harvard.edu
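Vince's suggestion can be sketched in S4 roughly as follows. This is a minimal illustration under assumptions, not an existing Bioconductor class: the slot names ids, version, and date and the generic idNamespace() are hypothetical, and the IDs are placeholders. The point is that a virtual base class lets shared methods be written once, while the subclass, rather than an idType string, selects the ID-specific behavior.

```r
library(methods)

## Hypothetical sketch: "geneList" as a virtual (abstract) S4 class.
## Slot names are illustrative; 'version' and 'date' follow the suggestion
## of a version or date field (the date is kept as an ISO string here).
setClass("geneList",
         representation("VIRTUAL",
                        ids     = "character",
                        version = "character",
                        date    = "character"))

## Concrete subclasses carry the identifier type in their class,
## so no idType slot has to be inspected.
setClass("EntrezGeneList", contains = "geneList")
setClass("RefseqGeneList", contains = "geneList")

## A method defined once on the virtual class is inherited by every subclass.
setMethod("length", "geneList", function(x) length(x@ids))

## Hypothetical generic whose behavior is chosen by S4 dispatch on the subclass.
setGeneric("idNamespace", function(object) standardGeneric("idNamespace"))
setMethod("idNamespace", "EntrezGeneList", function(object) "Entrez Gene")
setMethod("idNamespace", "RefseqGeneList", function(object) "RefSeq")

## Example with placeholder IDs.
eg <- new("EntrezGeneList", ids = c("id1", "id2", "id3"),
          version = "1", date = "2007-03-14")
idNamespace(eg)   # [1] "Entrez Gene"
length(eg)        # [1] 3
```

Because geneList is declared virtual, new("geneList") fails, so every concrete list must commit to an ID type through its class.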
On Wed, 14 Mar 2007, Sean Davis wrote:
GSEA, both the specific method and the general concept, is becoming more prevalent and important in data analysis. There have been several mentions of including various "gene lists" for use with Category or other methods. Is there interest in making a generic geneSet class for storing such information? (Or does it already exist and I just haven't seen it?) I bring this up because I think it could be quite useful to have a general solution for the community (as the eSet class has become). A class could range from something as simple as a vector of Entrez Gene IDs to something more complicated (but perhaps more useful for general consumption), like:
  identifier:  an identifier for the set (perhaps from a public database like MSigDB)
  title:       one-line title
  description: free-text description
  species:     the species to which the dataset applies
  URL:         where the data were derived from
  MIAME:       object of class "MIAME"
  protocol:    (could also be in MIAME) description of the methods used to produce the gene list from the raw data source
  idType:      what type of ID is stored (Entrez, RefSeq, Ensembl, etc.)
  geneList:    vector of IDs

A simple wrapper data structure (even just a list) could then be used to distribute the geneSets, and methods could be defined for converting them to an incidence matrix for use by Category, etc. But I think the most important points above are 1) maintaining some metadata about the gene lists and 2) standardization to reduce duplicated work. Individual groups would then instantiate the geneSets using whatever means they see fit (parsing MSigDB, IPI files, etc.).
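A minimal S4 sketch of the slot list above, plus a conversion to an incidence matrix, might look like this. Assumptions: the class and slot names follow the list; the MIAME slot is omitted so the example runs without Biobase; creator (as in Karl's PGSEA structure) and change (a signed per-gene expression change, per Alexandre's direction/magnitude suggestion) are optional illustrative slots; toIncidence() is a hypothetical helper, not a Category function; the identifiers and values are toy placeholders.

```r
library(methods)

## Hypothetical "geneSet" class following the proposed slot list. The MIAME
## slot is omitted so this runs without Biobase. 'creator' and 'change' are
## optional illustrative slots: a curator (as in PGSEA) and a per-gene
## signed expression change (positive = up, negative = down).
setClass("geneSet",
         representation(identifier  = "character",
                        title       = "character",
                        description = "character",
                        species     = "character",
                        URL         = "character",
                        protocol    = "character",
                        idType      = "character",
                        geneList    = "character",
                        creator     = "character",
                        change      = "numeric"),
         validity = function(object) {
           ## 'change' is optional, but if present it needs one value per gene
           n <- length(object@change)
           if (n != 0L && n != length(object@geneList))
             "'change' must be empty or as long as 'geneList'"
           else TRUE
         })

## Hypothetical helper (not part of Category): build a 0/1 incidence matrix
## with one row per gene set and one column per distinct gene ID.
toIncidence <- function(sets) {
  genes <- sort(unique(unlist(lapply(sets, function(s) s@geneList))))
  ids <- vapply(sets, function(s) s@identifier, character(1))
  m <- matrix(0L, nrow = length(sets), ncol = length(genes),
              dimnames = list(ids, genes))
  for (i in seq_along(sets))
    m[i, sets[[i]]@geneList] <- 1L
  m
}

## Toy example with placeholder identifiers and made-up values.
a <- new("geneSet", identifier = "SET_A", idType = "Entrez",
         geneList = c("g1", "g2"), change = c(1.5, -0.8))
b <- new("geneSet", identifier = "SET_B", idType = "Entrez",
         geneList = c("g2", "g3"))
toIncidence(list(a, b))
```

Here SET_A gets 1s in columns g1 and g2 and SET_B in g2 and g3. Keeping change optional lets up/down-only sets and quantitatively scored sets share one class, with the validity check catching a change vector that does not line up with geneList.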
Any thoughts?

Sean
_______________________________________________ Bioc-devel at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel