[Bioc-devel] A geneSet data class for facilitating GSEA

Hi,
I also think this is a good idea, and it is something we (BioC Seattle
group) are willing to help with.

It looks like the class defined in the soon-to-be-in-devel PGSEA
package is very close to what is wanted.  From a brief look at PGSEA,
it appears to define a delimited format for reading/writing gene set
objects.

Since the gene sets on the Broad's website__ already provide a simple
XML format, I think it would be nice to be able to read and write that
format.  And we should make sure we have corresponding slots for the
fields they use:

    Standard name                   # name of set
    LSID                            # ID of set
    Brief description               
    Collection                      # collection ID
    Full description or Abstract	
    Publication URL
    External links
    Organism
    Contributed by
    Source platform
    Genes	

__ http://www.broad.mit.edu/gsea/msigdb/cards/chr16q24.html
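
To make the slot list above concrete, here is a rough sketch of a
container with one field per card entry.  It is written in Python only
for illustration (the real thing would be an R class), and the class
name, field names, defaults, and example values are all my assumptions,
not an agreed definition:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GeneSet:
    """Hypothetical container with one field per Broad gene set card entry."""

    standard_name: str                       # name of set
    lsid: str                                # ID of set
    genes: List[str] = field(default_factory=list)
    brief_description: str = ""
    collection: Optional[str] = None         # collection ID
    full_description: str = ""               # full description or abstract
    publication_url: Optional[str] = None
    external_links: List[str] = field(default_factory=list)
    organism: Optional[str] = None
    contributed_by: Optional[str] = None
    source_platform: Optional[str] = None

    def __len__(self) -> int:
        # Size of the set is the number of member genes.
        return len(self.genes)


# Placeholder values throughout; only the set name comes from the card URL.
example = GeneSet(
    standard_name="chr16q24",
    lsid="placeholder-lsid",
    genes=["geneA", "geneB", "geneC"],
    collection="placeholder-collection",
)
```

Only the name, ID, and gene list are treated as required here; whether
the other fields should be optional is one of the things to decide.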

I think the collection ID makes a lot of sense, since some gene sets
are really sets of gene sets, like GO and cytogenetic bands.

One concern with this approach is that for sets of gene sets (again,
GO or cytogenetic bands) we will have a fair amount of duplication.
But I'm not sure it will be a problem.

I'm not sure yet whether ID-type specific subclasses will make things
easier or not.  I am certain that we will be able to add some smarts
to how the annotation is dealt with to allow at least some basic
translation between IDs such as Entrez and gene symbol.
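
As a rough idea of what basic translation could look like, here is a
minimal sketch (Python, for illustration only).  The function names are
hypothetical and the two-entry lookup table is a toy stand-in for
whatever the annotation data would supply; IDs with no mapping come
back as None rather than being dropped:

```python
from typing import Dict, List, Optional

# Toy lookup table for illustration; a real one would come from annotation data.
ENTREZ_TO_SYMBOL: Dict[str, str] = {"7157": "TP53", "672": "BRCA1"}


def translate_ids(ids: List[str], mapping: Dict[str, str]) -> List[Optional[str]]:
    """Map each ID through the table, keeping order; unmapped IDs become None."""
    return [mapping.get(gene_id) for gene_id in ids]


def invert(mapping: Dict[str, str]) -> Dict[str, str]:
    """Reverse a table. Assumes it is one-to-one, which real ID maps often are not."""
    return {value: key for key, value in mapping.items()}


symbols = translate_ids(["7157", "0", "672"], ENTREZ_TO_SYMBOL)
entrez = translate_ids(["BRCA1"], invert(ENTREZ_TO_SYMBOL))
```

The one-to-one assumption in invert() is the weak point: many-to-many
mappings between ID types are exactly where the "smarts" would be needed.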

Perhaps we should start a wiki page to hammer out a class definition?

Best Wishes,

+ seth
Message-ID: <m2r6rrg3qq.fsf@ziti.local>
In-Reply-To: <CEA39A213F7F2E44A0DED9210BCD352F023D0C64@VAIEXCH04.vai.org> (Karl Dykema's message of "Wed, 14 Mar 2007 11:14:49 -0400")