
[Bioc-devel] Behavior of select function in AnnotationDbi

I would rather mapIds() continue to operate on a single column, return a named character vector, and by default provide a 1:1 relationship between input and output. multiVals=CharacterList does actually return a 1:many mapping in a way that retains parallel structure (I guess, maybe modulo the limitations noted below, and others).
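To make the "parallel structure" point concrete, here is a minimal base-R sketch of a 1:many lookup that returns one list element per input key, in input order. It is not AnnotationDbi code: the toy table `map`, its values, and the helper `lookup_many()` are all made up for illustration.

```r
# Toy 1:many table (hypothetical data, not taken from any annotation package).
map <- data.frame(
    key   = c("geneA", "geneA", "geneB"),
    value = c("v1", "v2", "v3"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: returns a list parallel to 'keys', i.e. one element
# per input key, in input order, with duplicates kept.
lookup_many <- function(keys, map) {
    hits <- split(map$value, map$key)              # values grouped by key
    out <- unname(hits[match(keys, names(hits))])  # NULL where a key has no match
    out[lengths(out) == 0L] <- list(NA_character_) # unmatched keys map to NA
    names(out) <- keys
    out
}

res <- lookup_many(c("geneB", "geneA", "geneB", "geneZ"), map)
```

Because `length(res)` always equals the number of input keys, the result can be attached as a column alongside the keys without any re-alignment.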

Personally, I do not favor the message() associated with select(); select() is behaving as documented. A warning seems unnecessary -- "warning! I'm doing what I'm supposed to do!". If there is to be a message of some sort, I'd rather it consistently presented information. I also prefer message() to a warning _because_ it's presented when the issue arises, rather than out of context, maybe for the same reasons I find top-posting in email responses [sic] so irritating. If the documented behavior of select() is fundamentally unsatisfactory, then yes, we should change the documented behavior rather than emit warnings.

select() could be updated to accept the equivalent of the multiVals argument. select() could also be updated to always return a DataFrame or to return a DataFrame when multiVals is specified, though one does like a function to return a consistent data type. The original choice to use data.frame was from informal observation that the classes enabled by DataFrame (e.g., CharacterList) pose problems for less-experienced [by this I mean a broad swath of Bioc] users, and the annotation resources should be as accessible as possible.

I see no benefit in NOT ordering the return values in the same order as the input keys. Likewise, I see no value in dropping support for duplicate or NA keys.
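As a rough illustration of what I mean, here is a base-R sketch of a 1:1 lookup that keeps the input order, keeps duplicate keys, and maps NA keys to NA. It is not how mapIds() is implemented: the helper `lookup_one()` and the "geneB" entry are hypothetical, and only the BRCA1 value is the one shown in this thread.

```r
# Toy 1:1 table; "geneB" / "id2" are made-up placeholders.
symbol2id <- c(BRCA1 = "672", geneB = "id2")

# Hypothetical helper: named character vector, same length and order as 'keys'.
lookup_one <- function(keys, table) {
    out <- unname(table)[match(keys, names(table))]  # NA where there is no match
    names(out) <- keys                               # duplicate and NA names are kept
    out
}

res <- lookup_one(c("geneB", NA, "BRCA1", "geneB"), symbol2id)
```

match() does the work: it returns one index per input key, so ordering, duplicates, and NA handling come for free.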

I view both of the following treatments of NA in mapIds() as bugs; they should return named character vectors mapping <NA> to NA.
'select()' returned 1:1 mapping between keys and columns
$BRCA1
[1] "672"

$<NA>
NULL

Error in .testForValidKeys(x, keys, keytype, fks) : 
  None of the keys entered are valid keys for 'SYMBOL'. Please use the keys method to see a listing of valid arguments.

Reporting the type of mapping when mapIds() is invoked, with or without the multiVals= argument, also seems unnecessary.
'select()' returned 1:many mapping between keys and columns
       BRCA1 
"GO:0000151" 

Martin