
[Bioc-devel] A geneSet data class for facilitating GSEA

On Wednesday 14 March 2007 12:59, Seth Falcon wrote:
I agree that these are all close.  I was thinking of keeping the collections 
as a separate higher-level data structure.  However, an email off-list I got 
suggested that a geneSet could be composed of a set of IDs OR another set of 
geneSets.  A collection would then be a set of geneSets that are related in 
some way.  The interpretation is straightforward: a geneSet becomes the union 
of all unique IDs in the contained geneSets.  So a maintainer could choose to 
code chr16q as a combination of all the geneSets for the bands of 16q, or 
simply make one large vector of IDs.  Either would work for downstream 
processing.  What is more problematic is an API for getting at individual 
geneSets embedded in a higher-level set in such a setup (I want 16q24, but 
how do I get there if I need to go through chr16 and 16q?).  
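To make the nested proposal concrete, here is a minimal sketch in Python (an illustration only, not Bioconductor code; the class and method names are hypothetical). A geneSet holds either IDs or other geneSets, `ids()` returns the union of all unique IDs, and `find()` is one possible answer to the 16q24 question: search the hierarchy by name instead of walking an explicit path. It assumes set names are unique within a hierarchy and that there are no cycles.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Set, Union


@dataclass
class GeneSet:
    """Hypothetical geneSet whose members are gene IDs or nested GeneSets."""
    name: str
    members: List[Union[str, GeneSet]] = field(default_factory=list)

    def ids(self) -> Set[str]:
        # Union of all unique IDs, recursing into any nested geneSets.
        result: Set[str] = set()
        for m in self.members:
            if isinstance(m, GeneSet):
                result |= m.ids()
            else:
                result.add(m)
        return result

    def find(self, name: str) -> Optional[GeneSet]:
        # Depth-first search by name, so "16q24" can be retrieved without
        # the caller spelling out the chr16 -> 16q -> 16q24 path.
        # Assumes names are unique and the hierarchy has no cycles.
        if self.name == name:
            return self
        for m in self.members:
            if isinstance(m, GeneSet):
                hit = m.find(name)
                if hit is not None:
                    return hit
        return None


# Usage with placeholder IDs: 16q built from two bands, plus a loose ID on chr16.
q23 = GeneSet("16q23", ["id2", "id3"])
q24 = GeneSet("16q24", ["id1", "id2"])
q = GeneSet("16q", [q23, q24])
chr16 = GeneSet("chr16", [q, "id4"])

print(sorted(chr16.ids()))
print(chr16.find("16q24") is q24)
```

A flat vector of IDs and a tree of band-level sets give the same `ids()` result, so downstream processing is unaffected; the trade-off is only in how sub-sets are addressed, and name lookup breaks down if two nested sets share a name.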

I'm inclined to think that hierarchical geneSets might be too complicated to 
deal with, but Seth and the Bioc folks would know best.
I agree.  The one point that Vince's email makes, though, is that it would be 
necessary to standardize the nomenclature for the various gene ID types if 
there is any hope of introducing "smarts" in dealing with ID translation.  One 
way is to subclass; the other is to validate any idType slot against a set of 
agreed-upon types.
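A minimal sketch of the second option (validating the idType slot rather than subclassing), again in Python for illustration; in R with S4 classes the same check would typically live in a validity method registered with `setValidity`. The vocabulary in `KNOWN_ID_TYPES` and the class name `TypedGeneSet` are hypothetical placeholders, not an agreed standard.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Hypothetical agreed-upon vocabulary; the real list was not settled here.
KNOWN_ID_TYPES: FrozenSet[str] = frozenset(
    {"EntrezGene", "Ensembl", "RefSeq", "Symbol"}
)


@dataclass(frozen=True)
class TypedGeneSet:
    """Hypothetical geneSet carrying an idType that must come from the vocabulary."""
    name: str
    ids: Tuple[str, ...]
    id_type: str

    def __post_init__(self) -> None:
        # Reject any idType outside the shared vocabulary at construction time,
        # so downstream ID-translation code can rely on the name it sees.
        if self.id_type not in KNOWN_ID_TYPES:
            raise ValueError(
                f"unknown idType {self.id_type!r}; "
                f"expected one of {sorted(KNOWN_ID_TYPES)}"
            )


ok = TypedGeneSet("16q24", ("id1", "id2"), "EntrezGene")
try:
    TypedGeneSet("16q24", ("id1",), "entrez")  # misspelled type is rejected
except ValueError as err:
    print(err)
```

Subclassing would instead encode each ID type as its own class; the vocabulary approach keeps a single class and puts the agreement into one shared list that translation code can key on.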
Sounds great.

Sean