________________________________________
From: Bioc-devel [bioc-devel-bounces at r-project.org] on behalf of Rainer Johannes [Johannes.Rainer at eurac.edu]
Sent: Saturday, January 09, 2016 11:01 AM
To: Hervé Pagès
Cc: Michael Lawrence; Martin Morgan
Subject: Re: [Bioc-devel] Problem with seqnames of TwoBitFile from AnnotationHub
Yes, using BSgenome would help in this case.
In the long run I think it might be important to have this fixed, not necessarily for human, but for other species/genome builds for which there might not be a BSgenome package available; through AnnotationHub all GTF files and FASTA files would be available. Note also that the FaFiles from Ensembl do have the "correct" chromosome names, although I assume they were built from the same Ensembl FASTA files as the TwoBitFiles.
jo
On 08 Jan 2016, at 22:49, Hervé Pagès <hpages at fredhutch.org> wrote:
On 01/08/2016 01:09 PM, Michael Lawrence wrote:
That is one solution. But everyone using that genome would need to
reset the seqlevels to the "standard" ones. In this specific case, is
there any reason not to just use the BSgenome for GRCh38?
I agree. Maybe we don't need seqlevels<-,TwoBitFile for that particular
use case. Just wanted to mention that the ability to rename the
sequences in a TwoBitFile, FastaFile, or other file-based object that
supports seqinfo() would be useful in general.
H.
On Fri, Jan 8, 2016 at 11:04 AM, Hervé Pagès <hpages at fredhutch.org> wrote:
Hi Jo, Michael,
What about implementing a seqlevels() setter for TwoBitFile objects? All
you need for this is an extra slot for storing the user-supplied
seqlevels. Note that in general the seqlevels() setter allows more than
renaming the seqlevels. It also allows dropping, adding, and shuffling
them. But you don't need to support all that. Supporting renaming would
already go a long way. See selectMethod("seqlevels<-", "TxDb") in
GenomicFeatures for an example of a restricted "seqlevels<-" method.
H.
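The design described above (an extra slot holding user-supplied seqlevels, with a setter that only supports renaming) can be sketched in plain S4. This is a minimal illustration under stated assumptions, not rtracklayer's implementation: the class ToyTwoBit, the generics toySeqlevels and toySeqlevels<-, and the helper toFileNames are all hypothetical names invented for this example.

```r
library(methods)

# Hypothetical toy class standing in for a file-backed object.
# It is NOT rtracklayer's TwoBitFile; it only stores the names found in the
# file plus an extra slot for names supplied by the user.
setClass("ToyTwoBit", slots = c(
  file_seqlevels = "character",  # names as stored in the file (never modified)
  user_seqlevels = "character"   # optional user-supplied replacements
))

# Hypothetical generics, named differently from seqlevels() to avoid confusion.
setGeneric("toySeqlevels", function(x) standardGeneric("toySeqlevels"))
setGeneric("toySeqlevels<-", function(x, value) standardGeneric("toySeqlevels<-"))

# Getter: report the user-supplied names if any, else the names from the file.
setMethod("toySeqlevels", "ToyTwoBit", function(x) {
  if (length(x@user_seqlevels) > 0) x@user_seqlevels else x@file_seqlevels
})

# Restricted setter: renaming only, so one unique new name per sequence.
# Dropping, adding, and reordering are deliberately rejected.
setReplaceMethod("toySeqlevels", "ToyTwoBit", function(x, value) {
  value <- as.character(value)
  if (length(value) != length(x@file_seqlevels))
    stop("only renaming is supported: supply one name per sequence")
  if (anyNA(value) || anyDuplicated(value) > 0)
    stop("new seqlevels must be unique and non-missing")
  x@user_seqlevels <- value
  x
})

# Translate user-facing names back to the names stored in the file, which is
# what a sequence lookup would need before reading from disk.
toFileNames <- function(x, names) {
  x@file_seqlevels[match(names, toySeqlevels(x))]
}

tb <- new("ToyTwoBit", file_seqlevels = c(
  "1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF",
  "10 dna:chromosome chromosome:GRCh38:10:1:133797422:1 REF"
))
toySeqlevels(tb) <- sub("[[:space:]].*$", "", toySeqlevels(tb))
toySeqlevels(tb)
```

The file content is never touched; only the names reported to the user change, and toFileNames() recovers the original name when a sequence has to be read.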
On 01/08/2016 09:50 AM, Rainer Johannes wrote:
I agree, I would not modify the file content. At present, however, it is not
possible to use e.g. getSeq on these TwoBitFiles, since the chromosome names
in the submitted GRanges (e.g. 1) do not match the seqnames/seqinfo of the
TwoBitFile. I don't know if a seqnames or seqinfo method stripping off all
but the first part of the name would help here...
jo
On 08 Jan 2016, at 15:18, Sean Davis <seandavi at gmail.com> wrote:
I will make the small editorial comment to guard against modifying file
content on transit into the hub object. On the client side (after getting
such an object) I think a "fix" would be to have a quick seqnames method to
strip off all but the first whitespace-delimited piece.
Sean
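The client-side fix suggested above, keeping only the first whitespace-delimited piece of each name, can be sketched in base R. This is an illustration only: first_token is a hypothetical helper, not a function from rtracklayer or AnnotationHub, and the two names are copied from the output quoted in this thread.

```r
# Hypothetical helper (not part of rtracklayer or AnnotationHub):
# drop everything from the first whitespace character onwards.
first_token <- function(x) {
  sub("[[:space:]].*$", "", x)
}

full_names <- c(
  "1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF",
  "10 dna:chromosome chromosome:GRCh38:10:1:133797422:1 REF"
)
short_names <- first_token(full_names)

# Shortening is only safe if the shortened names are still unique.
stopifnot(anyDuplicated(short_names) == 0)

# Lookup table from the short name to the name stored in the file,
# so a query for chromosome "10" can be translated back.
short_to_full <- setNames(full_names, short_names)
short_to_full[["10"]]
```

Names that contain no whitespace pass through unchanged, so the helper can be applied to files whose names are already short.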
On Jan 8, 2016, at 8:40 AM, Michael Lawrence <lawrence.michael at gene.com>
wrote:
This is perhaps something that could be handled when populating the
hub, but I'm not sure how rtracklayer could automatically derive the
chromosome names.
On Fri, Jan 8, 2016 at 2:37 AM, Rainer Johannes
<Johannes.Rainer at eurac.edu> wrote:
dear all,
I just ran into a problem with a TwoBitFile I fetched from
AnnotationHub. I was fetching a TwoBitFile with the genomic DNA sequence, as
provided by Ensembl:
library(AnnotationHub)
ah <- AnnotationHub()
tbf <- ah[["AH50068"]]
head(seqnames(seqinfo(tbf)))
[1] "1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF"
[2] "10 dna:chromosome chromosome:GRCh38:10:1:133797422:1 REF"
[3] "11 dna:chromosome chromosome:GRCh38:11:1:135086622:1 REF"
[4] "12 dna:chromosome chromosome:GRCh38:12:1:133275309:1 REF"
[5] "13 dna:chromosome chromosome:GRCh38:13:1:114364328:1 REF"
[6] "14 dna:chromosome chromosome:GRCh38:14:1:107043718:1 REF"
It would be nice if the seqnames were really just the chromosome names
and not the whole string from the FASTA file header. Is there a way I could fix
the file myself, or is this something that should be fixed in the rtracklayer
or AnnotationHub package when the TwoBitFile is created?
thanks, jo
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319