Dear Robert and Ludwig,
The EnsDb packages provide the gene, transcript, etc. annotations for all
genes defined in the Ensembl database (for a given species and Ensembl
release). Apart from the "entrezid" column/attribute stored in the
internal database, however, there is no link to NCBI or UCSC annotations.
So, basically: if you want "pure" Ensembl-based annotations, use EnsDb;
if you want the UCSC annotations, use the TxDb packages.
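
As a minimal sketch of the EnsDb route for the BRCA1 case, something like the following might work. It assumes the EnsDb.Hsapiens.v79 package is installed and that the installed ensembldb accepts formula-style filters (older releases used filter objects such as GeneidFilter() instead); the variable names are only illustrative.

```r
# Assumes EnsDb.Hsapiens.v79 is installed, e.g. via
# BiocManager::install("EnsDb.Hsapiens.v79"); it depends on ensembldb,
# so loading it makes genes() for EnsDb objects available.
library(EnsDb.Hsapiens.v79)

edb <- EnsDb.Hsapiens.v79

# Look the gene up by its Ensembl gene ID. The formula-style filter is an
# assumption about the installed ensembldb version (see note above).
brca1 <- genes(edb, filter = ~ gene_id == "ENSG00000012048")

# Ensembl-defined gene coordinates. Note that EnsDb uses Ensembl-style
# chromosome names ("17" rather than the UCSC-style "chr17").
brca1

# The only link to NCBI: the "entrezid" metadata column.
brca1$entrezid
```

The result is a GRanges object, like the one returned by genes() on a TxDb, so downstream code should need little change apart from the chromosome naming.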
In case you need EnsDbs for other species or Ensembl versions, the
ensembldb package provides functionality to generate such packages, using
either the Ensembl Perl API or GTF files provided by Ensembl. If you
have problems building the packages, just drop me a line and I'll build
them for you.
cheers, jo
On 03 Jun 2015, at 15:56, Robert M. Flight <rflight79 at gmail.com> wrote:
Ludwig,
If you do this search on the UCSC genome browser (which this annotation
package is built from), you will see that the longest variant is what
is shown:
http://genome.ucsc.edu/cgi-bin/hgTracks?clade=mammal&org=Human&db=hg38&position=brca1&hgt.positionInput=brca1&hgt.suggestTrack=knownGene&Submit=submit&hgsid=429339723_8sd4QD2jSAnAsa6cVCevtoOy4GAz&pix=1885
If instead of "genes" you do "transcripts", you will see 20 different
transcripts for this gene, including the one listed by NCBI.
I haven't tried it yet (I haven't upgraded R or Bioconductor to the latest
version), but there is now an Ensembl-based annotation package as well,
which may work better:
http://bioconductor.org/packages/release/data/annotation/html/EnsDb.Hsapiens.v79.html
-Robert
On Wed, Jun 3, 2015 at 7:04 AM Ludwig Geistlinger <
Ludwig.Geistlinger at bio.ifi.lmu.de> wrote:
Dear Bioc annotation team,
Querying TxDb.Hsapiens.UCSC.hg38.knownGene for gene coordinates, e.g. for
BRCA1; ENSG00000012048; entrez:672
via
genes(TxDb.Hsapiens.UCSC.hg38.knownGene, vals=list(gene_id="672"))
gives me:
GRanges object with 1 range and 1 metadata column:
      seqnames               ranges strand |     gene_id
         <Rle>            <IRanges>  <Rle> | <character>
  672    chr17 [43044295, 43170403]      - |         672
  -------
  seqinfo: 455 sequences (1 circular) from hg38 genome
However, querying Ensembl and NCBI Gene
http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000012048
http://www.ncbi.nlm.nih.gov/gene/672
the gene is located at (note the difference in the end position)
Chromosome 17: 43,044,295-43,125,483 reverse strand
How is the inconsistency explained, and how can I extract an
Ensembl/NCBI-conformant annotation from the TxDb object?
(I am aware of biomaRt, but I want to explicitly use the Bioc
annotation functionality.)
Thanks!
Ludwig
--
Dipl.-Bioinf. Ludwig Geistlinger
Lehr- und Forschungseinheit für Bioinformatik
Institut für Informatik
Ludwig-Maximilians-Universität München
Amalienstrasse 17, 2. Stock, Büro A201
80333 München
Tel.: 089-2180-4067
eMail: Ludwig.Geistlinger at bio.ifi.lmu.de