Hi guys,
this is Luigi. I am developing a package which heavily relies on
AnnotationDbi, annotate, and the metadata packages from Bioconductor.
I noticed a weird behavior of the get() function for the
mgug4122a.db package when moving from R 2.9.1 to later versions (2.10.0 and
2.11.0, the one I am running).
Basically, an ID pointing to an 'NA' is not properly handled in the
newer versions, see below.
The same ID returns 'NA' in one case and an error in the other.
I also checked whether this happens with another package (hgu133a.db),
and the behavior is different: one ID that used to map to a symbol now
returns NA, and one ID that returned NA before still returns NA.
I give you the details below. Do you have any idea of what is going on?
Luigi
########################################################
###FOR R-2.9.1:
########################################################
> sessionInfo()
R version 2.9.1 (2009-06-26)
i386-apple-darwin9.7.0
locale:
en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets
[6] methods base
other attached packages:
[1] hgu133a.db_2.2.12 mgug4122a.db_2.2.11
[3] RSQLite_0.7-3 DBI_0.2-4
[5] AnnotationDbi_1.6.1 Biobase_2.4.1
> get("A_52_P71146", mgug4122aSYMBOL)
[1] NA
> get("201265_at", hgu133aSYMBOL)
[1] NA
> get("200080_s_at", hgu133aSYMBOL)
[1] "H3F3A"
########################################################
###FOR R-2.10.0:
########################################################
> sessionInfo()
R version 2.10.0 (2009-10-26)
i386-apple-darwin9.8.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets
[6] methods base
other attached packages:
[1] hgu133a.db_2.3.5 org.Hs.eg.db_2.3.6
[3] mgug4122a.db_2.3.5 org.Mm.eg.db_2.3.6
[5] RSQLite_0.7-3 DBI_0.2-4
[7] AnnotationDbi_1.8.0 Biobase_2.6.0
> get("A_52_P71146", mgug4122aSYMBOL)
Error in .checkKeys(value, Lkeys(x), x@ifnotfound) :
value for "A_52_P71146" not found
> get("201265_at", hgu133aSYMBOL)
[1] NA
> get("200080_s_at", hgu133aSYMBOL)
[1] NA
--
Ulisse: "Considerate la vostra semenza:
fatti non foste a viver come bruti,
ma per seguir virtute e canoscenza".
(Dante, Divina Commedia, Canto XXVI)
--
G-C
T---A Luigi Marchionni, M.D., Ph.D.
C----G The Sidney Kimmel Comprehensive Cancer Center
G-------C Johns Hopkins University - School of Medicine
A------T 1550 Orleans St., CRB2, Rm 554
C----G Baltimore, MD, 21231, USA
G--C Tel: (001) 410-502-8179
C-G Fax: (001) 410-502-5742
T---A e-mail: marchion at jhmi.edu
G-----C URL: http://astor.som.jhmi.edu/~marchion/
A-------T
[Bioc-devel] Different behavior of get() between 2.9.0 and later versions
7 messages · Luigi Marchionni, Robert Castelo, Seth Falcon +1 more
Hi Luigi,
On 11/3/09 8:37 PM, Luigi Marchionni wrote:
this is Luigi. I am developing a package which heavily relies on AnnotationDbi, annotate, and the metadata packages from Bioconductor. I noticed a weird behavior of the get() function for the mgug4122a.db package when moving from R 2.9.1 to later versions (2.10.0 and 2.11.0, the one I am running). Basically, an ID pointing to an 'NA' is not properly handled in the newer versions, see below. The same ID returns 'NA' in one case and an error in the other. I also checked whether this happens with another package (hgu133a.db), and the behavior is different: one ID that used to map to a symbol now returns NA, and one ID that returned NA before still returns NA. I give you the details below. Do you have any idea of what is going on?
Partial details...
When you call get() or otherwise retrieve a value from an annotation
package object using a key, like a probe ID, there are three situations:
1. The probe ID is valid and maps to a value in the given object.
2. The probe ID is valid, but does not map to a value so NA is returned.
3. The probe ID is not valid, an error is raised.
So using a part of your example I see:
> get("A_52_P71146", mgug4122aSYMBOL)
Error in .checkKeys(value, Lkeys(x), x@ifnotfound) :
value for "A_52_P71146" not found
> "A_52_P71146" %in% keys(mgug4122aSYMBOL)
[1] FALSE
I'll let someone else take a crack at why the key sets would have changed.
+ seth
> sessionInfo()
R version 2.11.0 Under development (unstable) (--)
x86_64-unknown-linux-gnu
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] mgug4122a.db_2.3.5 org.Mm.eg.db_2.3.6 RSQLite_0.7-3
[4] DBI_0.2-4 AnnotationDbi_1.9.0 Biobase_2.5.8
loaded via a namespace (and not attached):
[1] tools_2.11.0
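The three situations listed above can be reproduced without any annotation package. The sketch below is only an illustration: it uses a plain R environment as a stand-in for an annotation map (it is not a real AnnotationDbi object), and `toy_map` and `lookup()` are hypothetical names. The two mapped IDs and their values are copied from the R 2.9.1 output in this thread.

```r
# Stand-in for an annotation map: a plain environment, NOT a real Bimap.
toy_map <- new.env()
assign("200080_s_at", "H3F3A", envir = toy_map)      # case 1: valid key, mapped
assign("201265_at", NA_character_, envir = toy_map)  # case 2: valid key, no value

# Hypothetical helper that tells the three situations apart.
lookup <- function(key, map) {
  # case 3: the key is unknown; report it instead of letting get() raise an error
  if (!exists(key, envir = map, inherits = FALSE)) {
    return("invalid key")
  }
  value <- get(key, envir = map, inherits = FALSE)
  if (is.na(value)) "valid key, unmapped (NA)" else value
}

lookup("200080_s_at", toy_map)   # mapped value
lookup("201265_at", toy_map)     # valid key without a value
lookup("A_52_P71146", toy_map)   # key missing from the map

# mget() with ifnotfound = NA returns NA for a missing key instead of an error
res <- mget(c("201265_at", "A_52_P71146"), envir = toy_map, ifnotfound = NA)
```

With a real annotation package, the membership check from the message above, `"A_52_P71146" %in% keys(mgug4122aSYMBOL)`, plays the role of `exists()` here.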
_______________________________________________ Bioc-devel at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Thanks. I'll try to see whether this still happens if I rebuild the mgug4122a.db package myself. Maybe the problem is in org.Mm.eg.db.
Luigi
On Nov 4, 2009, at 12:54 AM, Seth Falcon wrote:
I'll let someone else take a crack at why the key sets would have changed.
Maybe you have all discussed this already, but stringsAsFactors=TRUE as the global default in R can be a problem in genomics.
Clearly this is not the cause of the problem I raised (or maybe it is: in my .Rprofile I set it to FALSE, I'll check this). However, for annotation purposes factors (together with cbind, data.frame, and the like) do not work.
I ask you to make a major philosophical decision here: either everybody sets the option to FALSE in their code, or the base setting changes. Or something else, I do not know. Or maybe there is a shortcut I am not aware of.
You cannot really anticipate what the end user will do, but if stringsAsFactors=TRUE is the scenario, one must keep it in mind.
Am I making this too complicated? Is it irrelevant?
Luigi
PS: I know I am abusing your patience.
hi Seth and rest of the list, never thought about this till i saw your email and in particular this clarification:
When you call get() or otherwise retrieve a value from an annotation
package object using a key, like a probe ID, there are three situations:
1. The probe ID is valid and maps to a value in the given object.
2. The probe ID is valid, but does not map to a value so NA is returned.
3. The probe ID is not valid, an error is raised.
i've built an annotation package for a custom array to which i've added a few new sql tables to provide additional mappings to various non-standard annotations on the probes (following section 3, "how to add extra data into your packages", from the SQLForge vignette of AnnotationDbi).
the way in which i add these data is by creating a flat file with the "records" and importing them into the SQL database of the package through the unix shell with
    sqlite3 dbName << EOF
    .import newdata.txt newtable
    .exit
    EOF
where dbName should be the .sqlite file created by popXXXCHIPDB(), newdata.txt is the flat file with the data of this new mapping and newtable is the SQL table i've specifically created on the .sqlite file to support the mapping in my annotation package.
however, in this way i don't know how to implement the second situation you describe. i tried to associate NA's to valid keys by having lines
    whateverkey|NA
in the flat file that is imported later, but then this NA is not interpreted as an NA value but as the string "NA". so i concluded that the way to do it was to remove those lines and have the user specify the parameter ifnotfound=NA in their get/mget commands.
so now my question would be (either for you or for whoever in the list knows about this): how do i introduce a new mapping into my annotation package such that a key is valid but does not map to a value, so that NA is returned?
thanks!
robert.
On 11/3/09 11:07 PM, Luigi Marchionni wrote:
Maybe you have all discussed this already, but stringsAsFactors=TRUE as the global default in R can be a problem in genomics.
Yes, in general global configuration that changes the behavior of many R functions poses a potential problem. I would generally recommend not modifying such global options. The Bioconductor project, for example, tests all software using default options (at least as far as I know).
Clearly this is not the cause of the problem I raised (or maybe it is: in my .Rprofile I set it to FALSE, I'll check this). However, for annotation purposes factors (together with cbind, data.frame, and the like) do not work. I ask you to make a major philosophical decision here: either everybody sets the option to FALSE in their code, or the base setting changes. Or something else, I do not know. Or maybe there is a shortcut I am not aware of. You cannot really anticipate what the end user will do, but if stringsAsFactors=TRUE is the scenario, one must keep it in mind. Am I making this too complicated?
You have not described very clearly a specific problem you are having so it is difficult to provide any further assistance. + seth
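One way to make code independent of the global option discussed above is to state the setting on each call, and to convert any factor to character before using it as a lookup key. This is a minimal sketch, not a recommendation from the thread: the data frame and the variable names are hypothetical, and only the probe IDs come from the messages above.

```r
# Probe IDs taken from the thread; everything else is illustrative.
ids <- c("200080_s_at", "201265_at")

# Explicit per-call setting: the result does not depend on the global option
annot <- data.frame(probe = ids, stringsAsFactors = FALSE)
is.character(annot$probe)

# If a factor arrives from someone else's code, convert it before using it as a key
as_factor <- factor(ids)
keys_chr <- as.character(as_factor)
identical(keys_chr, ids)
```

Since R 4.0.0 the default for stringsAsFactors is FALSE, but the explicit argument keeps the behavior the same on older installations.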
Hi Luigi and Robert,
For Luigi: The inconsistency appears to originate from missing probes in the mgug4122a probes table. This only seems to be happening for certain "corner case" probe packages. In the meantime, note that this is not a problem with get() or with the org packages. I will get to the bottom of what caused this and apply a fix ASAP wherever it is a problem (so far very few things seem to be affected). Thank you for pointing this out!
And to answer Robert's question: To get an NA back (in R) from one of these accessory tables you are adding, you should only need null values in the relevant fields after the import. There is no need to put NA strings into your input files, as that will result in NA strings being stored in the DB. Just leaving those portions of the input table blank should result in null values in your database table, which should give you the results you want when you look at those from R using AnnotationDbi (meaning NAs). So basically your database table should look like this when you query it:
    whateverkey|
Please let me know if I need to clarify that.
Marc
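The difference between a stored "NA" string and a real SQL NULL can be checked directly from R. The sketch below assumes the DBI and RSQLite packages are installed and uses an in-memory database; the table and column names (`newtable`, `probe_id`, `value`) and the keys are hypothetical. It only shows what a plain DBI query returns, not what AnnotationDbi does internally. The final UPDATE is a cautious cleanup step for the case where an import has left placeholder strings (a literal "NA" or an empty string) instead of NULLs.

```r
library(DBI)  # assumes the DBI and RSQLite packages are installed

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# 'newtable', 'probe_id' and 'value' are hypothetical names for this sketch
dbExecute(con, "CREATE TABLE newtable (probe_id TEXT, value TEXT)")
dbExecute(con, "INSERT INTO newtable VALUES ('key1', 'GENE1')")  # mapped value
dbExecute(con, "INSERT INTO newtable VALUES ('key2', 'NA')")     # literal string 'NA'
dbExecute(con, "INSERT INTO newtable VALUES ('key3', NULL)")     # real SQL NULL

# Only the NULL row comes back as NA; the string 'NA' stays a string
before <- dbGetQuery(con, "SELECT * FROM newtable ORDER BY probe_id")
is.na(before$value)

# Cleanup: turn placeholder strings into real NULLs after an import
dbExecute(con, "UPDATE newtable SET value = NULL WHERE value IN ('NA', '')")
after <- dbGetQuery(con, "SELECT * FROM newtable ORDER BY probe_id")
is.na(after$value)

dbDisconnect(con)
```

Running a query like the first SELECT right after an import is a quick way to confirm whether the blank fields ended up as NULL or as strings.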