[Bioc-devel] naming of TxDb packages
Hi Michael,
On 11-11-03 06:36 PM, Michael Lawrence wrote:
We're actually using a patched version of makeTranscriptDbFromBiomart to get models out of an internal biomart. Patch is on its way to Marc. So it would be like: TxDb.Hsapiens.BioMart.hg19.gneGenes?
This suggests that you have a Mart called "hg19" (see below for why).
Seems weird to mix the technical mode of data retrieval into the name.
The naming scheme when 'Data source' is "BioMart" seems to be a little
bit different. For example, if I use makeTranscriptDbFromBiomart() with
biomart="ensembl" and dataset="hsapiens_gene_ensembl", then I get:
> GenomicFeatures:::.makePackageName(txdb)
[1] "TxDb.Hsapiens.BioMart.ensembl.GRCh37.p5"
Token #4 ("ensembl") is the name of the Mart. I'm a little bit
surprised by token #5, though. I would have expected it to be
the Ensembl version (possibly followed by the reference genome),
because one can always infer the reference genome from the Ensembl
version but not the other way around. In other words, if Ensembl
makes 2 or more releases based on the same reference genome, our
current naming scheme won't distinguish between the corresponding
TxDb packages.
Wouldn't it be better if we had something like:
TxDb.Hsapiens.BioMart.ensembl.63
TxDb.Hsapiens.BioMart.ensembl.64
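To make the collision concrete, here is a small base-R sketch. It only assembles name strings and does not call GenomicFeatures; the release numbers and the shared assembly string "GRCh37.p5" are hypothetical values chosen for illustration.

```r
# Hypothetical values: two Ensembl releases assumed to share one assembly.
releases <- c("63", "64")
genome <- "GRCh37.p5"

# Current scheme: 5th token is the reference genome, so both releases
# end up with the same package name.
current <- paste("TxDb.Hsapiens.BioMart.ensembl",
                 rep(genome, length(releases)), sep = ".")

# Proposed scheme: 5th token is the Ensembl release, so names are distinct.
proposed <- paste("TxDb.Hsapiens.BioMart.ensembl", releases, sep = ".")

anyDuplicated(current) > 0   # TRUE: one name for two packages
anyDuplicated(proposed) > 0  # FALSE
```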
Anyway, back to your problem. Yes, in your case the technical mode
doesn't really matter, so it's really up to you. Maybe being explicit
about the reference genome (with *.UCSC.hg19.*) is more important
than the technical mode?
H.
Michael
2011/11/3 Hervé Pagès <hpages at fhcrc.org>
Hi Michael,
On 11-11-02 08:58 PM, Michael Lawrence wrote:
What are the precise meanings of the tokens in the TxDb package
names? In particular, is "UCSC" the genome provider or the annotation
provider? In the official packages, those are one and the same, but
what if someone wanted to make a package for custom annotations on a
UCSC genome?
The pkg name is generated automatically by the internal helper function
GenomicFeatures:::.makePackageName(). This function extracts all the
tokens from the txdb's metadata table. It looks like the 3rd token
in the pkg name is extracted from the 'Data source' field and can only
be "UCSC" or "BioMart", typically indicating whether the txdb was made
with makeTranscriptDbFromUCSC() or makeTranscriptDbFromBiomart().
The first function downloads annotations from the UCSC genome
browser using rtracklayer. The 2nd one downloads them with biomaRt
from whatever mart/dataset was specified.
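The token extraction described above can be sketched in base R as follows. This is a hypothetical illustration, not the code of GenomicFeatures:::.makePackageName(): the helper names (get_field, abbrev_organism, make_txdb_pkg_name) and the metadata field names ("Organism", "Genome", "UCSC Table", "BioMart database") are assumptions. It also stops with an explicit message when 'Data source' is neither "UCSC" nor "BioMart".

```r
# Hypothetical sketch only: NOT the GenomicFeatures implementation.
# Metadata is modelled as a named character vector; field names are assumed.

# Look up one metadata field, stopping with a clear message if absent.
get_field <- function(md, field) {
  value <- unname(md[field])
  if (is.na(value))
    stop("txdb metadata has no '", field, "' field")
  value
}

# "Homo sapiens" -> "Hsapiens"
abbrev_organism <- function(organism) {
  parts <- strsplit(organism, " ", fixed = TRUE)[[1]]
  paste0(substr(parts[1], 1, 1), paste(parts[-1], collapse = ""))
}

make_txdb_pkg_name <- function(md) {
  src <- get_field(md, "Data source")
  # Source-specific tokens; any other 'Data source' is rejected explicitly.
  src_tokens <- switch(src,
    UCSC    = c(get_field(md, "Genome"), get_field(md, "UCSC Table")),
    BioMart = c(get_field(md, "BioMart database"), get_field(md, "Genome")),
    stop("'Data source' must be \"UCSC\" or \"BioMart\", got \"", src, "\""))
  paste(c("TxDb", abbrev_organism(get_field(md, "Organism")), src, src_tokens),
        collapse = ".")
}

md <- c("Data source" = "UCSC", "Organism" = "Homo sapiens",
        "Genome" = "hg19", "UCSC Table" = "knownGene")
make_txdb_pkg_name(md)
# [1] "TxDb.Hsapiens.UCSC.hg19.knownGene"
```

With 'Data source' set to "BioMart", "BioMart database" set to "ensembl" and "Genome" set to "GRCh37.p5", the same sketch yields "TxDb.Hsapiens.BioMart.ensembl.GRCh37.p5".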
For your custom annotations, the final name of the pkg will depend on
what GenomicFeatures:::.makePackageName() finds in the metadata
table of your txdb, but if 'Data source' is not "UCSC" or "BioMart",
it seems that GenomicFeatures:::.makePackageName() will fail (and not
in a very informative way, I'm afraid). If I understand correctly, you
are making your custom txdb object with a call to makeTranscriptDb()?
If that's the case, make sure you provide enough information
through its 'metadata' argument. Maybe you could set 'Data source' to
"UCSC" and use some kind of custom name for the table (which in your
case is probably not a real UCSC "table"). This custom name will become
the last token in the package name. So you would end up with something
like:
TxDb.Hsapiens.UCSC.hg19.GenentechGenes
This solution would have the advantage of having
GenomicFeatures:::.makePackageName() work out of the box.
But maybe it's confusing because it suggests that
the txdb was made with makeTranscriptDbFromUCSC()? I hope
it's not.
H.
Thanks,
Michael
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319