Skip to content
Prev 3247 / 21307 Next

[Bioc-devel] GSEABase::mapIdentifiers() stopped mapping identifiers

hi, since we're approaching release and a few days happened since my last
email exchange about this, let me bring again what the problem is that
is breaking the GSVA vignette and thus, its build on R-devel with a
specific example reproducing the problem:

library(Biobase)
library(GSEABase)
library(GSVAdata)

data(c2BroadSets)
data(leukemia)
class(leukemia_eset)
[1] "ExpressionSet"
attr(,"package")
[1] "Biobase"
annotation(leukemia_eset)
[1] "hgu95a"


gsc <- mapIdentifiers(c2BroadSets,
                      AnnotationIdentifier(annotation(leukemia_eset)))

head(lapply(geneIds(gsc), head))
$NAKAMURA_CANCER_MICROENVIRONMENT_UP
[1] "5167"      "100288400" "338328"    "388"       "10631"     "440387"   

$NAKAMURA_CANCER_MICROENVIRONMENT_DN
[1] "55215" "9319"  "81610" "9455"  "64759" "8767" 

$WEST_ADRENOCORTICAL_TUMOR_MARKERS_UP
[1] "5142"   "6781"   "580"    "6713"   "112950" "11182" 

$WEST_ADRENOCORTICAL_TUMOR_MARKERS_DN
[1] "125"  "2619" "5919" "4856" "5156" "4046"

$WINTER_HYPOXIA_UP
[1] "7022"   "404550" "5738"   "9456"   "5230"   "10856" 

$WINTER_HYPOXIA_DN
[1] "5168"  "9452"  "3112"  "91526" "55843" "9459" 

note that these identifiers are still the Entrez Gene Identifiers from
the GeneSetCollection object 'c2BroadSets' while they should have been
translated into hgu95a probeset IDs since these are the kind of identifiers
in the ExpressionSet object 'leukemia_eset'.

the vignette in GSVA breaks because no gene in these "mapped" gene sets
matches any probeset ID in the ExpressionSet. I think this could potentially
break the code in the vignette of other package using the function mapIdentifiers().

the most straightforward solution is to revert back to GSEABase 1.17.2 and then
find later a workaround for the original issue of idempotent maps between
GeneSetCollection and ExpressionSet objects
(see https://mailman.stat.ethz.ch/pipermail/bioc-devel/2012-February/003151.html)

sessionInfo()
R Under development (unstable) (2012-01-31 r58242)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] GSVAdata_0.99.3       hgu95a.db_2.6.3       org.Hs.eg.db_2.6.4   
 [4] RSQLite_0.11.1        DBI_0.2-5             GSEABase_1.17.3      
 [7] graph_1.33.1          annotate_1.33.2       AnnotationDbi_1.17.27
[10] Biobase_2.15.4        BiocGenerics_0.1.14  

loaded via a namespace (and not attached):
[1] IRanges_1.13.27 stats4_2.15.0   tools_2.15.0    XML_3.9-4      
[5] xtable_1.7-0   

i'm pasting below our last communication about this issue.

thanks!
robert.

============================================================
hi Martin,
On Wed, 2012-03-14 at 15:58 -0700, Martin Morgan wrote:
[...]
so if you revert now to the former behavior of GSEABase, how can i
programmatically derive whether the outcome of

annotation(eset)

should be the input to

AnnotationIdentifer()

or to

EntrezIdentifier()

or to any other among the *Identifier() functions described in
class?GeneIdentifierType


??


thanks,
robert.

_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel