[Bioc-devel] Gene annotation: TxDb vs ENSEMBL/NCBI inconsistency
Ludwig,

If you run this search in the UCSC Genome Browser (which this annotation package is built from), you will see that what is shown is the longest variant:

http://genome.ucsc.edu/cgi-bin/hgTracks?clade=mammal&org=Human&db=hg38&position=brca1&hgt.positionInput=brca1&hgt.suggestTrack=knownGene&Submit=submit&hgsid=429339723_8sd4QD2jSAnAsa6cVCevtoOy4GAz&pix=1885

If you query "transcripts" instead of "genes", you will see 20 different transcripts for this gene, including the one listed by NCBI.

I haven't tried it yet (I haven't upgraded R or Bioconductor to the latest version), but there is now an Ensembl-based annotation package as well, which may work better:

http://bioconductor.org/packages/release/data/annotation/html/EnsDb.Hsapiens.v79.html

-Robert

On Wed, Jun 3, 2015 at 7:04 AM Ludwig Geistlinger <Ludwig.Geistlinger at bio.ifi.lmu.de> wrote:
Dear Bioc annotation team,

Querying TxDb.Hsapiens.UCSC.hg38.knownGene for gene coordinates, e.g. for BRCA1 (ENSG00000012048; entrez:672), via
genes(TxDb.Hsapiens.UCSC.hg38.knownGene, vals=list(gene_id="672"))
gives me:
GRanges object with 1 range and 1 metadata column:
      seqnames               ranges strand |     gene_id
         <Rle>            <IRanges>  <Rle> | <character>
  672    chr17 [43044295, 43170403]      - |         672
  -------
  seqinfo: 455 sequences (1 circular) from hg38 genome
However, querying Ensembl and NCBI Gene
http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000012048
http://www.ncbi.nlm.nih.gov/gene/672
the gene is located at (note the difference in the end position)
Chromosome 17: 43,044,295-43,125,483 reverse strand
How is the inconsistency explained, and how can I extract an annotation
that is consistent with Ensembl/NCBI from the TxDb object?
(I am aware of biomaRt, but I want to explicitly use the Bioc annotation
functionality.)
Thanks!
Ludwig
--
Dipl.-Bioinf. Ludwig Geistlinger
Lehr- und Forschungseinheit für Bioinformatik
Institut für Informatik
Ludwig-Maximilians-Universität München
Amalienstrasse 17, 2. Stock, Büro A201
80333 München
Tel.: 089-2180-4067
eMail: Ludwig.Geistlinger at bio.ifi.lmu.de
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
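
The point made in the reply, that the gene-level range differs from the range of any single transcript, can be checked directly against the TxDb object. The following is a minimal sketch, not a definitive recipe. It assumes that GenomicFeatures and TxDb.Hsapiens.UCSC.hg38.knownGene are installed, and the variable names (tx_by_gene, brca1_tx, brca1_span, end_positions) are illustrative only. Coordinates depend on the version of the annotation package, so no expected output is shown.

```r
# Sketch: compare the gene-level range with the underlying transcript ranges.
# Assumption: both packages below are installed from Bioconductor.
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene

# Group all transcripts by gene. For this UCSC-based package the list
# names are Entrez gene IDs.
tx_by_gene <- transcriptsBy(txdb, by = "gene")

# Transcripts annotated for BRCA1 (Entrez ID 672), one range per transcript.
brca1_tx <- tx_by_gene[["672"]]
print(brca1_tx)

# Smallest range(s) covering all of these transcripts.
brca1_span <- range(brca1_tx)
print(brca1_span)

# Distinct transcript end coordinates, for comparison with the end
# position reported by Ensembl and NCBI Gene.
end_positions <- sort(unique(end(brca1_tx)))
print(end_positions)
```

If one of the values in end_positions matches the Ensembl/NCBI end coordinate, the corresponding transcript in brca1_tx is a candidate for the annotation those resources report. Which transcript to treat as representative is a judgment call that this sketch does not make.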