[Bioc-devel] mapping vector of gene ids to gene symbols

5 messages · Michael Lawrence, Robert Castelo, Marc Carlson

#
hi Michael,

this sounds as if you had a list of variants annotated with their 
Entrez Gene IDs, which are sometimes NA because those variants do not 
overlap any gene, and sometimes repeated because two or more of those 
variants overlap the same gene :)

at least that is the situation i had when programming the 
VariantFiltering package. i also could not find a one-liner solution, 
but you might want to look at what i ended up doing there, in case it 
is also useful for you.

you'll find it in the method "annotateVariants" that dispatches on 
"OrgDb" objects (i.e., gene-centric annotation packages), within 
VariantFiltering/R/annotationEngine.R

if you take a look at it, do not hesitate to comment if you have any 
suggestions to improve it. i also look forward to the annotation gurus' 
feedback on this question :)

cheers,

robert.
On 06/18/2014 03:03 PM, Michael Lawrence wrote:

  
    
#
hi, thanks for the compliments on the package, i'm happy to hear you 
liked it! i must acknowledge that part of the design of the package is 
the result of conversations i had with Martin, Marc and especially 
Valerie during the review process.

i only got to know about VRanges once the package was nearly finished in 
its current form, and i have it in mind to adapt it to that data 
structure for the reasons you mention. i haven't explored 
'ReportingTools' but i'll take a look at it.

i'll also be very glad if you use it for teaching at Brixen, but
beware that this is the first version that entered the release, so it 
may have bugs that haven't been discovered yet (a list of known 
shortcomings is at the end of the vignette). do not hesitate to report 
any problem or feature request that may help in using it during the 
course and i'll try to fix or add it asap. i'm coming to BioC in Boston, 
where we can discuss further directions for a better integration of 
VariantFiltering with the rest of the BioC infrastructure.


cheers,
robert.
On 06/18/2014 03:43 PM, Michael Lawrence wrote:

  
    
#
Hi Michael,

The fact that duplicate keys are not being allowed in OrganismDb objects 
is an inconsistency that I will be looking into.  That is, if you do the 
same thing with OrgDb or TranscriptDb objects, duplicates are allowed 
and passed along.  So I plan to fix this. I have checked in a 
preliminary fix to the devel branch.

The NAs, on the other hand, have always been dropped from select 
methods, since it doesn't usually make sense to include them in the 
output.  I can explore the possibility of doing something with an extra 
argument to support a much stricter interpretation of the keys, but 
it would have to error out if there are one-to-many mappings.
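
As a rough illustration of what such a strict interpretation of the keys 
could mean, here is a small Python sketch of the semantics only. It is 
not AnnotationDbi code: the function name `map_keys_strict` and the toy 
lookup table are assumptions made up for this example, and the key 
"dup" is a hypothetical ID that maps to two placeholder symbols. The 
idea is that the output has exactly one entry per input key, in input 
order, so NAs (None here) and duplicated keys are passed along, while a 
one-to-many mapping raises an error.

```python
from typing import Dict, List, Optional, Sequence

# Toy lookup table (an assumption for illustration): key -> list of symbols.
# "dup" is a hypothetical key with a one-to-many mapping.
TOY_TABLE: Dict[str, List[str]] = {
    "7157": ["TP53"],
    "672": ["BRCA1"],
    "dup": ["SYM_A", "SYM_B"],
}


def map_keys_strict(
    keys: Sequence[Optional[str]],
    table: Dict[str, List[str]],
) -> List[Optional[str]]:
    """Return one symbol per input key, preserving order, NAs and duplicates.

    Missing (None) or unknown keys yield None; a key that maps to more
    than one symbol raises ValueError instead of expanding the output.
    """
    result: List[Optional[str]] = []
    for key in keys:
        if key is None:
            # Keep the NA in place rather than dropping it.
            result.append(None)
            continue
        symbols = table.get(key, [])
        if len(symbols) > 1:
            raise ValueError(f"key {key!r} maps to multiple symbols: {symbols}")
        result.append(symbols[0] if symbols else None)
    return result


if __name__ == "__main__":
    gene_ids = ["7157", None, "672", "7157"]
    print(map_keys_strict(gene_ids, TOY_TABLE))
```

Because the result is parallel to the input, it can be attached directly 
as a column next to the original vector of gene IDs.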


  Marc
On 06/18/2014 06:03 AM, Michael Lawrence wrote: