[Bioc-devel] export() to 2bit file
On 05/12/2014 12:23 PM, Michael Lawrence wrote:
On Mon, May 12, 2014 at 11:41 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
Hi Michael,
On 05/09/2014 04:39 PM, Michael Lawrence wrote:
What would be the fastest way to do this with a DNAString? Just an
alphabetFrequency?
That would do it.
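A minimal sketch of such a check, assuming the input is a DNAStringSet as in the examples in this thread. The helper name nonTwoBitLetters() is hypothetical and is not part of rtracklayer or Biostrings; only alphabetFrequency() is an existing Biostrings function.

```r
library(Biostrings)

# Hypothetical helper (not an rtracklayer function): returns the counts of
# all letters in 'x' other than A, C, G, T and N.
nonTwoBitLetters <- function(x) {
  # collapse=TRUE sums the per-sequence counts into one named vector
  freq <- alphabetFrequency(x, collapse = TRUE)
  other <- freq[!(names(freq) %in% c("A", "C", "G", "T", "N"))]
  other[other > 0]
}

x <- DNAStringSet("AAA-CCC-GGG-TTT-NNN-KKK")
bad <- nonTwoBitLetters(x)
if (length(bad) > 0)
  message("letters outside A, C, G, T, N: ",
          paste(names(bad), collapse = " "))
```

An export method could call something like this first and then raise an error or a warning when the result is non-empty.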
A couple of other issues I ran into with the 2bit code:
(1) It fails on empty sequences:
> export(DNAStringSet(c("AA", "", "CC")), "ww.2bit")
Warning message:
In (function (object, seqname) :
needLargeMem: trying to allocate 0 bytes (limit: 17179869184)
Error in sapply(object, function(x) typeof(x) == "externalptr"
&& is(x, :
error in evaluating the argument 'X' in selecting a method for
function 'sapply': Error in (function (object, seqname) : UCSC
library operation failed
Thanks for catching this one.
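Until empty sequences are handled by export(), one possible workaround is to detect them before exporting. This is only a sketch: dropping the empty entries changes what ends up in the file, so whether that is acceptable depends on the use case. The export() call is left commented out because it is the rtracklayer call that currently fails.

```r
library(Biostrings)

x <- DNAStringSet(c("AA", "", "CC"))

# Flag zero-width sequences, which trigger the needLargeMem failure above
empty <- width(x) == 0L
if (any(empty))
  message("dropping ", sum(empty), " empty sequence(s) before export")
x_nonempty <- x[!empty]

# export(x_nonempty, "ww.2bit")  # requires rtracklayer
```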
(2) Could be that internal helper rtracklayer:::.DNAString_to_twoBit()
is introducing a memory leak as it doesn't seem that the memory
the returned external pointer is pointing to (a struct twoBit) is
ever released. The memory leak is minor if the sequence passed via
'object' has no masks but can be important if there are masks and
if the masks are made of hundreds of thousands of ranges.
Right now it is the responsibility of the caller to free that memory.
Probably should have used a finalizer on the externalptr, but the way it
works now is that the write function frees the object. So it's not
leaking (as far as I know), but the design could be improved.
I see. So we're probably OK as long as the loop containing the calls to .DNAString_to_twoBit() is successful and nothing goes wrong after that (e.g. no user interrupt).
Thanks,
H.
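The failure mode described above (memory stays allocated if an error or interrupt occurs between the conversion loop and the write) can be illustrated at the R level with on.exit(). This is a base R sketch only: acquire() and release() are hypothetical stand-ins for a C-level allocate/free pair, and no rtracklayer internals are called. A finalizer registered on the external pointer, as suggested earlier, would be an alternative design.

```r
# Hypothetical stand-ins for a C-level allocate/free pair
freed <- character(0)
acquire <- function(id) list(id = id)
release <- function(res) freed <<- c(freed, res$id)

convert_all <- function(ids, fail_at = NA) {
  held <- list()
  # Runs on normal return, on error, and on user interrupt
  on.exit(for (res in held) release(res), add = TRUE)
  for (id in ids) {
    if (identical(id, fail_at)) stop("conversion failed for ", id)
    held[[length(held) + 1L]] <- acquire(id)
  }
  invisible(length(held))
}

# The third conversion fails; the two resources already held are released
try(convert_all(c("chr1", "chr2", "chr3"), fail_at = "chr3"), silent = TRUE)
freed
```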
On Fri, May 9, 2014 at 4:07 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
Hi Michael,
library(rtracklayer)
library(Biostrings)
x <- DNAStringSet("AAA-CCC-GGG-TTT-NNN-KKK")
Then:
> x
A DNAStringSet instance of length 1
width seq
[1] 23 AAA-CCC-GGG-TTT-NNN-KKK
> export(x, "x.2bit")
> import("x.2bit")
A DNAStringSet instance of length 1
width seq names
[1] 23 AAATCCCTGGGTTTTTNNNTTTT 1
What about having the "export" method for TwoBitFile raise an error
(or at least issue a warning) instead of silently turning everything
that is not A, C, G, T, or N into a T?
Thanks,
H.
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
___________________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel