[Bioc-devel] A geneSet data class for facilitating GSEA (Robert Gentleman)
Hi,
Just an update on this, and a request for anyone who has done a bit more thinking to speak up in the next few days. We are planning to make the development of this class one of the exercises at the Lausanne developer conference - the schedule is still evolving, but is at:
http://wiki.fhcrc.org/bioc/Lausanne_Dev_Meeting_2007_plans
I think we should try to create a GSEAClasses package that could be used by anyone that is developing/working in the area that would be in BioC devel after this release. I doubt we will get consensus so quickly as to make it into the release (about 1 month away now).
best wishes
Robert
Tarca, Adi wrote:
Hi,
I wonder whether the direction of change will be of any use here. Firstly, a gene set should be independent of any particular experiment. Secondly, the two groups being compared can be defined in either order, so "UP" and "DOWN" will be confusing.
Adi Tarca
________________________________
From: bioc-devel-bounces at stat.math.ethz.ch on behalf of bioc-devel-request at stat.math.ethz.ch
Sent: Sat 3/17/2007 7:00 AM
To: bioc-devel at stat.math.ethz.ch
Subject: Bioc-devel Digest, Vol 36, Issue 12
Date: Fri, 16 Mar 2007 06:18:43 -0700
From: Robert Gentleman <rgentlem at fhcrc.org>
Subject: Re: [Bioc-devel] A geneSet data class for facilitating GSEA
To: Vincent Carey 525-2265 <stvjc at channing.harvard.edu>
Cc: bioc-devel at stat.math.ethz.ch
Message-ID: <45FA9933.5030108 at fhcrc.org>
Hi,
Vincent Carey 525-2265 wrote:
Dear bioc-developers,
would it be useful to introduce an additional slot for the direction and/or
magnitude of expression change of each gene in the gene set?
My understanding is that we are currently trying to get a
structure that identifies a group of genes in a coherent way.
Connecting a group of genes to a specific experimental result is outside
the scope of this task.
That is a good question, but I would like to point out that it is
almost surely the case that notions of direction of change and
magnitude are relative to a comparison of phenotypes (e.g. disease vs.
healthy, or stage I vs. stage IV) and hence are not properties of the
gene set.
While that information is important and useful in a particular
analysis, it should not be stored with the gene set, in my opinion. We
will need some easy way for users to specify it and use it in practice,
but as Vince has said, it is probably not what we want here.
Designing an extension of the group class that incorporates
qualitative or quantitative information on gene behaviors under
certain conditions seems worthwhile but should be kept separate
from the original design problem -- I think.
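A minimal sketch of the separation described above, assuming S4 classes: the core class only identifies a group of genes, and a separate extension carries experiment-specific values. The class names geneGroup and geneGroupWithChange, the slot names, and all values are hypothetical and were not proposed in the thread.

```r
library(methods)

# Hypothetical core class: identifies a group of genes, nothing more.
setClass("geneGroup",
         representation(identifier = "character", geneList = "character"))

# Hypothetical extension: adds per-gene values tied to one named comparison.
setClass("geneGroupWithChange",
         contains = "geneGroup",
         representation(comparison = "character", change = "numeric"),
         validity = function(object) {
           if (length(object@change) != length(object@geneList))
             return("change must have one value per gene")
           TRUE
         })

# Illustrative values only; they do not come from any real experiment.
x <- new("geneGroupWithChange",
         identifier = "example_set",
         geneList   = c("5680", "2149"),
         comparison = "disease vs healthy",
         change     = c(1.2, -0.8))

# The experiment-independent part can always be recovered.
core <- as(x, "geneGroup")
```

Because the extension contains the core class, any method written for geneGroup also accepts the extended object, while the core definition stays free of experiment-specific slots.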
It seems that GSEA and GSEA-like methods use sets of genes that are
homogeneously down- or upregulated (correct me if I am wrong, I am far from
being up to date on GSEA methods).
This seems to be reflected in the example presented in the PGSEA vignette
where target genes of Ras and Myc are separated into 'UP' and 'DN' regulated
genes.
Hopefully we will use UP and DOWN; the savings from using
abbreviations are almost never worth it, especially when for many users
English is not their first language.
best wishes
Robert
However, (alternative?) methods could actually use the quantitative
information about expression changes to score each gene set. Adding a
corresponding slot in the geneSet class would make it possible to
accommodate such methods.
Best,
Alexandre
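To illustrate how quantitative information could be used without storing it in the gene set, here is a rough sketch in which per-gene statistics live in a separate named vector and are combined with the set at analysis time. The function scoreSet and the numbers are hypothetical, and the score is a plain mean chosen for illustration, not the GSEA statistic.

```r
# Hypothetical helper: average an externally supplied per-gene statistic
# over the members of a gene set. Genes without a statistic are ignored.
scoreSet <- function(ids, stats) {
  present <- intersect(ids, names(stats))
  if (length(present) == 0L) return(NA_real_)
  mean(stats[present])
}

# Made-up per-gene statistics from one comparison, keyed by gene ID.
stats <- c("5680" = 1.5, "2149" = -0.5, "54557" = 2.0)

# "171392" has no statistic here, so only two genes contribute.
score <- scoreSet(c("5680", "2149", "171392"), stats)
```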
-----Original Message-----
From: bioc-devel-bounces at stat.math.ethz.ch
[mailto:bioc-devel-bounces at stat.math.ethz.ch] On Behalf Of Dykema, Karl
Sent: Wednesday, 14 March 2007 16:15
To: bioc-devel at stat.math.ethz.ch
Subject: Re: [Bioc-devel] A geneSet data class for facilitating GSEA
Sorry, I forgot to attach the str() output:
$ 15-delta prostaglandin J2 10 uM DOWN : list()
..- attr(*, "reference")= chr "15-delta prostaglandin J2 10 uM DOWN "
..- attr(*, "desc")= chr "DOWN "
..- attr(*, "source")= chr "PubMed"
..- attr(*, "design")= chr "????"
..- attr(*, "identifier")= chr "17008526"
..- attr(*, "species")= chr "human"
..- attr(*, "data")= chr "raw"
..- attr(*, "private")= chr "no"
..- attr(*, "creator")= chr "Karl Dykema <karl.dykema at vai.org>"
..- attr(*, "ids")= chr [1:75] "171392" "5680" "2149" "54557" ...
..- attr(*, "class")= atomic [1:1] smc
.. ..- attr(*, "package")= chr "PGSEA"
This closely mirrors the geneSet proposed and we will be happy to adopt
a consensus structure.
The only significant difference is a "creator" attribute to let folk know who
curated the gene list... This may help if groups are collaborating to
collect gene sets.
-------------------------------
Karl Dykema
Bioinformatics Programmer/Analyst
Laboratory of Computational Biology
Van Andel Research Institute
333 Bostwick Ave. NE
Grand Rapids, MI 49503
(616) 234-5554
-----Original Message-----
From: Vincent Carey 525-2265 <stvjc at channing.harvard.edu>
Date: Wed, 14 Mar 2007 10:19:36 -0400 (EDT)
To: Sean Davis <sdavis2 at mail.nih.gov>
Cc: <bioc-devel at stat.math.ethz.ch>, Ross Lazarus
<rerla at channing.harvard.edu>
Subject: Re: [Bioc-devel] A geneSet data class for facilitating GSEA
i like this idea in principle. the RGenetics folks may have done
something in this direction.
you might want to have geneList as an abstract class, and then extend to
EntrezGeneList, RefseqGeneList and so forth so that dispatch could work
without looking into the idType ...
a version or date field might also be important
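The suggestion above could be sketched roughly as follows, assuming S4: a virtual base class with one subclass per identifier type, so that methods dispatch on the class rather than inspecting an idType slot. The generic idLabel, the slot names, and the version string are hypothetical.

```r
library(methods)

# Virtual base class: cannot be instantiated directly.
setClass("GeneList",
         representation("VIRTUAL", ids = "character", version = "character"))

# One concrete subclass per identifier type.
setClass("EntrezGeneList", contains = "GeneList")
setClass("RefseqGeneList", contains = "GeneList")

# Hypothetical generic; dispatch picks the method from the object's class.
setGeneric("idLabel", function(x) standardGeneric("idLabel"))
setMethod("idLabel", "EntrezGeneList", function(x) "Entrez Gene")
setMethod("idLabel", "RefseqGeneList", function(x) "RefSeq")

eg <- new("EntrezGeneList", ids = c("5680", "2149"), version = "2007-03-14")
label <- idLabel(eg)
```

The version slot stands in for the version or date field mentioned above; whether it should be a string or a Date was not discussed.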
---
Vince Carey, PhD
Assoc. Prof Med (Biostatistics)
Harvard Medical School
Channing Laboratory - ph 6175252265 fa 6177311541
181 Longwood Ave Boston MA 02115 USA
stvjc at channing.harvard.edu
On Wed, 14 Mar 2007, Sean Davis wrote:
GSEA, both the specific method and the general concept, is becoming
more prevalent and important in data analysis. There have been
several mentions of including various "gene lists" for use with
Category or other methods. Is there interest in making a generic
geneSet class for storing such information? (Or does it already exist
and I just haven't seen it?) I bring this up because I think it could
be quite useful to have a general solution for the community (like the
eSet class has become). A class could range from something as simple as a
vector of Entrez Gene IDs to something more complicated (but perhaps a bit
more useful for general consumption) like:
identifier: an identifier for the set (perhaps from a public database like MSigDB)
title: One line title
description: free text description
species: The species to which the dataset applies
URL: from where the data were derived
MIAME: class "MIAME" object
protocol: (could be in MIAME, also) description of methods to produce
genelist from raw data source
idType: What type of ID is stored (Entrez, Refseq, Ensembl, etc)?
geneList: vector of IDs
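As a rough sketch only, the slots listed above might map onto an S4 class like the one below. The MIAME slot is left out so the example does not depend on Biobase, and the validity rules (single identifier, single idType, no duplicated IDs) are assumptions rather than part of the proposal.

```r
library(methods)

# Hypothetical class; slot names follow the list above (MIAME omitted).
setClass("geneSet",
         representation(identifier  = "character",
                        title       = "character",
                        description = "character",
                        species     = "character",
                        URL         = "character",
                        protocol    = "character",
                        idType      = "character",
                        geneList    = "character"),
         validity = function(object) {
           if (length(object@identifier) != 1L)
             return("identifier must be a single string")
           if (length(object@idType) != 1L)
             return("idType must be a single string")
           if (anyDuplicated(object@geneList))
             return("geneList must not contain duplicated IDs")
           TRUE
         })

# Illustrative instance; the title and identifier are made up.
gs <- new("geneSet",
          identifier = "example_set_1",
          title      = "Illustrative set",
          species    = "human",
          idType     = "EntrezGene",
          geneList   = c("171392", "5680", "2149", "54557"))
```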
A simple wrapper data structure (even just a list) could then be used
to distribute the geneSets. Some methods could then be defined for
converting to an incidence matrix for use by Category, etc. But I
think the most important points from above are 1) maintaining some
metadata about the genelists and 2) standardization to reduce
duplicated work. Individual groups would then instantiate the
geneSets using whatever means they see fit (parsing MSigDB, IPI files,
etc.).
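The conversion to an incidence matrix mentioned above could look something like this sketch, which takes a plain named list of ID vectors as the wrapper structure. The function name incidenceMatrix and the orientation (one row per set, one column per gene ID) are assumptions; the orientation Category expects should be checked against its documentation.

```r
# Hypothetical helper: build a 0/1 matrix with one row per gene set and
# one column per gene ID appearing in any set.
incidenceMatrix <- function(sets) {
  stopifnot(is.list(sets), !is.null(names(sets)))
  ids <- sort(unique(unlist(sets, use.names = FALSE)))
  m <- matrix(0L, nrow = length(sets), ncol = length(ids),
              dimnames = list(names(sets), ids))
  for (s in names(sets)) {
    m[s, sets[[s]]] <- 1L  # mark the members of set s
  }
  m
}

# Two small made-up sets sharing one gene.
sets <- list(setA = c("5680", "2149"), setB = c("2149", "54557"))
inc <- incidenceMatrix(sets)
```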
Any thoughts?
Sean
_______________________________________________
Bioc-devel at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org