[Bioc-devel] Different behavior of get() between 2.9.0 and later versions

7 messages · Luigi Marchionni, Robert Castelo, Seth Falcon +1 more

#
Hi guys,
Luigi here. I am developing a package that relies heavily on
AnnotationDbi, annotate, and the metadata packages from Bioconductor.
I noticed a weird behavior of the get() function with the
mgug4122a.db package when moving from R 2.9.1 to later versions (2.10.0
and 2.11.0, the one I am running).
Basically, an ID pointing to an 'NA' is not properly handled in the
newer versions, see below.
The same ID returns 'NA' in one case and an error in the other.
I also checked whether this happens with another package (hgu133a.db),
and the behavior there is different, both for an ID that used to map to
a value and now does not, and for an ID that had no value before and
still has none.
I give you the details below. Do you have any idea of what is going on?

Luigi

########################################################
###FOR R-2.9.1:
########################################################
 > sessionInfo()
R version 2.9.1 (2009-06-26)
i386-apple-darwin9.7.0

locale:
en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets
[6] methods   base

other attached packages:
[1] hgu133a.db_2.2.12   mgug4122a.db_2.2.11
[3] RSQLite_0.7-3       DBI_0.2-4
[5] AnnotationDbi_1.6.1 Biobase_2.4.1

 > get("A_52_P71146", mgug4122aSYMBOL)
[1] NA

 >  get("201265_at", hgu133aSYMBOL)
[1] NA

 > get("200080_s_at", hgu133aSYMBOL)
[1] "H3F3A"

########################################################
###FOR R-2.10.0:
########################################################
 > sessionInfo()
R version 2.10.0 (2009-10-26)
i386-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets
[6] methods   base

other attached packages:
[1] hgu133a.db_2.3.5    org.Hs.eg.db_2.3.6
[3] mgug4122a.db_2.3.5  org.Mm.eg.db_2.3.6
[5] RSQLite_0.7-3       DBI_0.2-4
[7] AnnotationDbi_1.8.0 Biobase_2.6.0

 > get("A_52_P71146", mgug4122aSYMBOL)
Error in .checkKeys(value, Lkeys(x), x@ifnotfound) :
   value for "A_52_P71146" not found

 > get("201265_at", hgu133aSYMBOL)
[1] NA

 > get("200080_s_at", hgu133aSYMBOL)
[1] NA



--
Ulisse: "Considerate la vostra semenza:
fatti non foste a viver come bruti,
ma per seguir virtute e canoscenza".
(Dante, Divina Commedia, Canto XXVI)
--
      G-C
     T---A    Luigi Marchionni, M.D., Ph.D.
    C----G    The Sidney Kimmel Comprehensive Cancer Center
G-------C    Johns Hopkins University -  School of Medicine
   A------T     1550 Orleans St., CRB2, Rm 554
     C----G    Baltimore, MD, 21231, USA
     G--C    Tel: (001) 410-502-8179
      C-G    Fax: (001) 410-502-5742
     T---A    e-mail: marchion at jhmi.edu
    G-----C    URL: http://astor.som.jhmi.edu/~marchion/
   A-------T
#
Hi Luigi,
On 11/3/09 8:37 PM, Luigi Marchionni wrote:
Partial details...

When you call get() or otherwise retrieve a value from an annotation 
package object using a key, like a probe ID, there are three situations:

1. The probe ID is valid and maps to a value in the given object.
2. The probe ID is valid, but does not map to a value so NA is returned.
3. The probe ID is not valid, so an error is raised.

So using a part of your example I see:

 > get("A_52_P71146", mgug4122aSYMBOL)
Error in .checkKeys(value, Lkeys(x), x@ifnotfound) :
   value for "A_52_P71146" not found
 > "A_52_P71146" %in% keys(mgug4122aSYMBOL)
[1] FALSE

I'll let someone else take a crack at why the key sets would have changed.

+ seth

 > sessionInfo()
R version 2.11.0 Under development (unstable) (--)
x86_64-unknown-linux-gnu

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
[1] mgug4122a.db_2.3.5  org.Mm.eg.db_2.3.6  RSQLite_0.7-3
[4] DBI_0.2-4           AnnotationDbi_1.9.0 Biobase_2.5.8

loaded via a namespace (and not attached):
[1] tools_2.11.0
#
Thanks.
I'll check whether this also happens if I build my mgug4122a.db package.
Maybe the problem is in org.Mm.eg.db.
Luigi
On Nov 4, 2009, at 12:54 AM, Seth Falcon wrote:

            
#
Maybe you have all discussed this already, but having
stringsAsFactors=TRUE as the global default in R can be a problem in
genomics.
Clearly this is not the cause of the problem I raised (or maybe it
is... in my .Rprofile I set it to FALSE, I'll check this). However,
for annotation purposes factors (and cbind, data.frame, and the like)
do not work.
I am asking you to make a major philosophical decision here...
Either everybody sets the option to FALSE in their code, or the base
setting changes, or... I do not know.
Or maybe there is a shortcut I am not aware of.
You cannot really anticipate what the end user will do,
but if stringsAsFactors=TRUE is the scenario, one must keep
it in mind.
Am I making this too complicated?
Is it irrelevant?
Luigi
PS: I know I am abusing your patience.
On Nov 4, 2009, at 12:54 AM, Seth Falcon wrote:

            
#
hi Seth and rest of the list,

never thought about this till i saw your email and in particular this
clarification:
i've built an annotation package for a custom array to which i've added
a few new sql tables to provide additional mappings to various
non-standard annotations on the probes (following section 3 -how to add
extra data into your packages- from vignette SQLForge of AnnotationDbi).
the way in which i add these data is by creating a flat file with the
"records" and importing them into the SQL database of the package
through the unix shell with

sqlite3 dbName << EOF
.import newdata.txt newtable
.exit
EOF

where dbName should be the .sqlite file created by popXXXCHIPDB(),
newdata.txt is the flat file with the data of this new mapping and
newtable is the SQL table i've specifically created on the .sqlite file
to support the mapping in my annotation package.

however, in this way i don't know how to implement the second situation
you describe. i tried to associate NA's to valid keys having lines

..
whateverkey|NA
..

in the flat file that is imported later but then this NA is not
interpreted as an NA value but as a string "NA". then i concluded that
the way i had to do it was to remove those lines and have the user
specify the parameter ifnotfound=NA in their get/mget commands.


so now my question would be (either for you or for whoever in the list
knows about this). how do i introduce a new mapping into my annotation
package such that a key is valid but it does not map to a value so that
NA is returned?


thanks!
robert.
#
On 11/3/09 11:07 PM, Luigi Marchionni wrote:
Yes, in general global configuration that changes the behavior of many R 
functions poses a potential problem.  I would generally recommend not 
modifying such global options.  The Bioconductor project, for example, 
tests all software using default options (at least as far as I know).
You have not described very clearly a specific problem you are having so 
it is difficult to provide any further assistance.

+ seth
#
Hi Luigi and Robert,

For Luigi:  The inconsistency appears to be originating from missing
probes in the mgug4122a probes table.  This only seems to be happening
for certain "corner case" probe packages.  But in the meantime, this is
not a problem with get() or with the org packages.  I will get to the
bottom of what caused this and apply a fix ASAP wherever it is a
problem (so far very few things seem to be affected).  Thank you for
pointing this out!

And to answer Robert's question:  To get an NA back (in R) from one of
these accessory tables you are adding, you should only have to have null
values in the relevant fields after the import.  No need to put NA
strings into your input files as that will result in NA strings being
stored in the DB.  Just leaving those portions of the input table blank
should result in null values in your database table, which should give
you the results you want when you look at those from R using
AnnotationDbi (meaning NAs). 

So basically your database table should look like this when you query it:

whateverkey|
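
A minimal shell sketch of the import described above, under stated assumptions: the names demo.sqlite, newdata.txt, newtable, probe_id and value are hypothetical and not taken from the thread. Depending on the sqlite3 version, .import may store a blank field as an empty string rather than NULL, so the sketch converts blanks explicitly and then counts the rows that ended up NULL.

```shell
#!/bin/sh
# Sketch only: all file, table, and column names below are hypothetical.
set -e
workdir=$(mktemp -d)
db="$workdir/demo.sqlite"

# Two records: one key with a value, one key whose value is left blank
# (no literal "NA" text in the flat file).
printf 'probe_1|GENE1\nprobe_2|\n' > "$workdir/newdata.txt"

sqlite3 "$db" <<EOF
CREATE TABLE newtable (probe_id TEXT NOT NULL, value TEXT);
.separator "|"
.import $workdir/newdata.txt newtable
-- Turn blank fields into real NULLs in case .import stored them as ''.
UPDATE newtable SET value = NULL WHERE value = '';
EOF

# Count all rows and the rows whose value is a real NULL.
total_rows=$(sqlite3 "$db" "SELECT count(*) FROM newtable;")
null_rows=$(sqlite3 "$db" "SELECT count(*) FROM newtable WHERE value IS NULL;")
echo "rows: $total_rows, rows with NULL value: $null_rows"
```

Checking with IS NULL, rather than by eye in the query output, distinguishes a real NULL from an empty string; both print as nothing after the separator.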


Please let me know if I need to clarify that.


  Marc
Robert Castelo wrote: