Message-ID: <dd0f1459113444eec4d30b6ff9f04411.squirrel@imap.ifi.lmu.de>
Date: 2015-06-03T11:03:33Z
From: Ludwig Geistlinger
Subject: [Bioc-devel] Gene annotation: TxDb vs ENSEMBL/NCBI inconsistency
Dear Bioc annotation team,
Querying TxDb.Hsapiens.UCSC.hg38.knownGene for gene coordinates, e.g. for
BRCA1; ENSG00000012048; entrez:672
via
> genes(TxDb.Hsapiens.UCSC.hg38.knownGene, vals=list(gene_id="672"))
gives me:
GRanges object with 1 range and 1 metadata column:
      seqnames               ranges strand |     gene_id
         <Rle>            <IRanges>  <Rle> | <character>
  672    chr17 [43044295, 43170403]      - |         672
  -------
  seqinfo: 455 sequences (1 circular) from hg38 genome
However, querying Ensembl and NCBI Gene
http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000012048
http://www.ncbi.nlm.nih.gov/gene/672
the gene is located at (note the difference in the end position)
Chromosome 17: 43,044,295-43,125,483 reverse strand
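To put a number on the difference, here is a small base-R sketch that uses only the coordinates quoted above (the variable names are my own and purely illustrative; nothing here queries the TxDb or any external resource):

```r
# Coordinates copied from the genes() output and from the Ensembl/NCBI pages
gene_start  <- 43044295L  # start position, identical in both sources
txdb_end    <- 43170403L  # end position reported by the TxDb
ensembl_end <- 43125483L  # end position reported by Ensembl / NCBI Gene

# Widths use 1-based, inclusive coordinates
txdb_width    <- txdb_end - gene_start + 1L
ensembl_width <- ensembl_end - gene_start + 1L

# How far the TxDb range extends beyond the Ensembl/NCBI range
end_diff <- txdb_end - ensembl_end

cat("TxDb width:         ", txdb_width, "bp\n")
cat("Ensembl/NCBI width: ", ensembl_width, "bp\n")
cat("Difference in end:  ", end_diff, "bp\n")
```

So the TxDb range extends 44,920 bp beyond the end position given by Ensembl and NCBI.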
How can this inconsistency be explained, and how can I extract an
annotation that conforms to ENSEMBL/NCBI from the TxDb object?
(I am aware of biomaRt, but I explicitly want to use the Bioc annotation
functionality.)
Thanks!
Ludwig
--
Dipl.-Bioinf. Ludwig Geistlinger
Lehr- und Forschungseinheit für Bioinformatik
Institut für Informatik
Ludwig-Maximilians-Universität München
Amalienstrasse 17, 2. Stock, Büro A201
80333 München
Tel.: 089-2180-4067
eMail: Ludwig.Geistlinger at bio.ifi.lmu.de