[Bioc-devel] GIntervalTree objects are corrupted during save/load
On 07/01/2014 10:38 AM, Michael Lawrence wrote:
The difference of course being that you implemented those trees from scratch, while we're relying on the Kent library for the low-level management of the tree. We would probably need to break from the Kent library to pursue this approach.
I see. That makes things a little bit more complicated. I wonder if the whole effort is worth it, given that serialization of a GIntervalTree doesn't seem like a common use case and that rebuilding the GIntervalTree from the GRanges object probably doesn't take much time (I didn't do any timings to back this up though).
For PDict objects it was nice to be able to serialize them, even though it's probably not something the user should do: turning a DNAStringSet object into a PDict object is very fast, and the resulting object is so big that a save/load cycle would actually take much longer than re-creating the PDict object in each new session.
Also, my feeling is that the time and effort required to break from the Kent library would perhaps be better spent trying to implement something new, like the Nested Containment List algorithm. Since this would probably have to be implemented from scratch anyway, it would make sense to use SEXP-based memory or, even better, to put a thin abstraction layer between the algorithm itself and memory management so that they are decoupled.
Cheers,
H.
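To make the "thin abstraction layer" idea concrete, here is a minimal sketch in plain C. Everything in it (the `mem_layer` struct, the two backends, and the toy `prefix_max_ends` routine) is a hypothetical illustration, not code from IRanges or the Kent library. The caller-owned int buffer stands in for memory that an R package would presumably obtain from an INTSXP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical memory layer: the algorithm only ever asks it for blocks of ints. */
typedef struct {
    int *(*alloc_ints)(void *ctx, size_t n);
    void *ctx;
} mem_layer;

/* Backend 1: ordinary heap memory (the caller must free() the result). */
static int *heap_alloc_ints(void *ctx, size_t n)
{
    (void) ctx;
    return malloc(n * sizeof(int));
}

/* Backend 2: bump allocation out of a caller-owned int buffer.
 * Assumption: in an R package this buffer would be the data of an INTSXP. */
typedef struct {
    int *base;
    size_t used, cap;   /* both counted in ints */
} int_arena;

static int *arena_alloc_ints(void *ctx, size_t n)
{
    int_arena *a = ctx;
    if (n > a->cap - a->used)
        return NULL;    /* not enough room left in the buffer */
    int *p = a->base + a->used;
    a->used += n;
    return p;
}

/* Toy "algorithm": out[i] = max(ends[0..i]). It never calls malloc() itself,
 * so it does not care where its memory comes from. */
static int *prefix_max_ends(const mem_layer *mem, const int *ends, size_t n)
{
    assert(mem != NULL && mem->alloc_ints != NULL);
    int *out = mem->alloc_ints(mem->ctx, n);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        out[i] = (i > 0 && out[i - 1] > ends[i]) ? out[i - 1] : ends[i];
    return out;
}
```

Swapping `heap_alloc_ints` for `arena_alloc_ints` changes where the result lives without touching `prefix_max_ends`, which is the decoupling described above.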
On Tue, Jul 1, 2014 at 9:05 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
Hi Hector, Michael,
On 07/01/2014 05:57 AM, Michael Lawrence wrote:
It seems tough to make this work. There is no way for the R serialization
machinery to understand what needs to be serialized after the external
pointer. The easiest approach to fixing this would be to reimplement
everything on top of SEXPs, which is to say, it would not be easy.
This is what I did with PDict objects to store the Aho-Corasick tree.
It's actually easier than it sounds. You can use any atomic type, say
INTSXP or RAWSXP, it doesn't matter. That's just a way to get memory.
Then you do what you want with it (by casting the pointer to it).
It not only solves the serialization problem, it also automatically
manages the memory, which is now in the hands of the garbage collector.
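As a rough illustration of why this survives serialization, the sketch below keeps a small binary search tree inside a flat int buffer and links nodes by index rather than by address. It is a hypothetical toy, not the PDict or GIntervalTree code: the plain C array stands in for the data of an INTSXP, and `memcpy` stands in for a save/load cycle that moves the bytes to a new address.

```c
#include <assert.h>
#include <string.h>

/* Layout (all names here are hypothetical):
 *   buf[0]                  number of nodes in use
 *   buf[1 + i*NODE_INTS..]  node i as [key, left_index, right_index]
 * Children are node indices (NIL for none), never memory addresses. */
enum { NODE_INTS = 3, NIL = -1, MAX_NODES = 16 };

static int *node_at(int *buf, int i)
{
    assert(buf != NULL && i >= 0 && i < MAX_NODES);
    return buf + 1 + i * NODE_INTS;
}

/* Insert a key; returns the index of the new node, or NIL if the buffer is full. */
static int tree_insert(int *buf, int key)
{
    int n = buf[0];
    if (n >= MAX_NODES)
        return NIL;
    int *node = node_at(buf, n);
    node[0] = key;
    node[1] = NIL;
    node[2] = NIL;
    buf[0] = n + 1;
    if (n == 0)
        return 0;                       /* first node is the root */
    int cur = 0;
    for (;;) {
        int *c = node_at(buf, cur);
        int side = key < c[0] ? 1 : 2;  /* 1 = left slot, 2 = right slot */
        if (c[side] == NIL) {
            c[side] = n;
            return n;
        }
        cur = c[side];
    }
}

static int tree_contains(const int *buf, int key)
{
    int cur = buf[0] > 0 ? 0 : NIL;
    while (cur != NIL) {
        const int *c = buf + 1 + cur * NODE_INTS;
        if (key == c[0])
            return 1;
        cur = key < c[0] ? c[1] : c[2];
    }
    return 0;
}

/* Simulated save/load: copy the raw ints elsewhere, wipe the original,
 * then query the copy. Index links remain valid at the new address. */
static int roundtrip_still_finds(int key)
{
    int original[1 + MAX_NODES * NODE_INTS] = {0};
    int reloaded[1 + MAX_NODES * NODE_INTS];
    const int keys[] = {10, 4, 20, 15};
    for (size_t i = 0; i < sizeof keys / sizeof keys[0]; i++)
        tree_insert(original, keys[i]);
    memcpy(reloaded, original, sizeof original);
    memset(original, 0, sizeof original);
    return tree_contains(reloaded, key);
}
```

A tree built from malloc'd nodes linked by pointers would not pass the same round trip, because the stored addresses would refer to memory that no longer exists after the copy.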
Cheers,
H.
Alternatively, we could write our own serializer. It seems R needs a way to
register (de)serializers for external pointers.
On Tue, Jul 1, 2014 at 5:37 AM, Hector Corrada Bravo <hcorrada at gmail.com> wrote:
Confirmed. Will look into it now.
Thanks for writing!
Hector
On Tue, Jul 1, 2014 at 2:40 AM, Kristoffer Vitting-Seerup <kristoffer.vittingseerup at bio.ku.dk> wrote:
Hi bioc-devel
I've found an error in the usage of GIntervalTree:
test <- GRanges(seqnames='Chr1', ranges=IRanges(start=10, end=20))
test
GRanges with 1 range and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] Chr1 [10, 20] *
I can save and load this object without a problem:
save(test, file='test.Rdata')
rm(test)
load('test.Rdata')
test
GRanges with 1 range and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] Chr1 [10, 20] *
But if I convert it to a GIntervalTree (for faster overlap finding), I get
a fatal error when loading:
test2 <- GIntervalTree(test)
test2
GIntervalTree with 1 range and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] Chr1 [10, 20] *
save(test2, file='test2.Rdata')
rm(test2)
load('test2.Rdata')
test2
GIntervalTree with 1 range and 0 metadata columns:
*** caught segfault ***
address 0xc, cause 'memory not mapped'
Traceback:
1: .Call(.NAME, ..., PACKAGE = PACKAGE)
2: .Call2(fun, object at ptr, ..., PACKAGE = "IRanges")
3: .IntervalForestCall(from, "asIRanges")
4: asMethod(object)
5: as(x at ranges, "IRanges")
6: .GT_reorderValue(x, as(x at ranges, "IRanges"))
7: .local(x, ...)
8: ranges(x)
9: ranges(x)
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
My session info:
sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] C
attached base packages:
[1] grDevices datasets grid parallel stats graphics utils methods base
other attached packages:
[1] spliceR_1.5.0 plyr_1.8.1 RColorBrewer_1.0-5 VennDiagram_1.6.5 cummeRbund_2.7.1 Gviz_1.9.4 rtracklayer_1.25.8 GenomicRanges_1.17.14 GenomeInfoDb_1.1.5 IRanges_1.99.13
[11] S4Vectors_0.0.6 fastcluster_1.1.13 reshape2_1.4 ggplot2_0.9.3.1 RSQLite_0.11.4 DBI_0.2-7 BiocGenerics_0.11.2
loaded via a namespace (and not attached):
[1] AnnotationDbi_1.27.6 BBmisc_1.6 BSgenome_1.33.5 BatchJobs_1.2 Biobase_2.25.0 BiocParallel_0.7.0 Biostrings_2.33.8 Formula_1.1-1 GenomicAlignments_1.1.10
[10] GenomicFeatures_1.17.6 Hmisc_3.14-4 MASS_7.3-33 R.methodsS3_1.6.1 RCurl_1.95-4.1 Rcpp_0.11.1 Rsamtools_1.17.14 VariantAnnotation_1.11.5 XML_3.98-1.1
[19] XVector_0.5.6 biomaRt_2.21.0 biovizBase_1.13.7 bitops_1.0-6 brew_1.0-6 cluster_1.15.2 codetools_0.2-8 colorspace_1.2-4 dichromat_2.0-0
[28] digest_0.6.4 fail_1.2 foreach_1.4.2 gtable_0.1.2 iterators_1.0.7 lattice_0.20-29 latticeExtra_0.6-26 matrixStats_0.8.14 munsell_0.4.2
[37] proto_0.3-10 scales_0.2.4 sendmailR_1.1-2 splines_3.1.0 stats4_3.1.0 stringr_0.6.2 survival_2.37-7 tools_3.1.0 zlibbioc_1.11.1
--
Kindest regards
Kristoffer Vitting-Seerup, cand.scient. (M.Sc.),
Ph.D Fellow
Sandelin Group
Bioinformatics Centre | Biotech Research & Innovation Centre (BRIC), Dep. of Biology
University of Copenhagen
Building 1, 3rd floor, office 3 (1-3-03)
Ole Maaløes Vej 5
DK-2200 Copenhagen N
Denmark
http://binf.ku.dk | http://www.bric.ku.dk
_________________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319