Hi Michael, I finally got to fix this in Biostrings 2.36.1 (release) and 2.37.2 (devel). Both should become available via biocLite() on Friday around 11am (Seattle time). Thanks for your patience. Cheers, H.
On 04/30/2015 10:50 AM, Herv? Pag?s wrote:
Hi Michael, Thanks for the reminder. I must confess this slipped out of my radar. I'll look into it today and will let you know. H. On 04/30/2015 08:42 AM, Michael Stadler wrote:
Hi Herve, I stumbled again over the 'memory not mapped' issue in trimLRpatterns using updated versions of R and BioC-devel. I guess it does not hit people very often, but I would highly appreciate if it could be fixed. Many thanks, Michael PS: I can reproduce the issue using the code below under: R version 3.2.0 (2015-04-16) Platform: x86_64-unknown-linux-gnu (64-bit) Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago) locale: [1] C attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base other attached packages: [1] ShortRead_1.27.1 GenomicAlignments_1.5.1 Rsamtools_1.21.3 [4] GenomicRanges_1.21.5 GenomeInfoDb_1.5.1 BiocParallel_1.3.4 [7] Biostrings_2.37.0 XVector_0.9.0 IRanges_2.3.6 [10] S4Vectors_0.7.0 BiocGenerics_0.15.0 RColorBrewer_1.1-2 loaded via a namespace (and not attached): [1] lattice_0.20-31 bitops_1.0-6 grid_3.2.0 [4] futile.options_1.0.0 zlibbioc_1.15.0 hwriter_1.3.2 [7] latticeExtra_0.6-26 futile.logger_1.4.1 lambda.r_1.1.7 [10] Biobase_2.29.0 On 25.04.2014 13:11, Herv? Pag?s wrote:
Hi Michael, Thanks for the report. I'll look into this. H. On 04/22/2014 08:29 AM, Michael Stadler wrote:
Dear Herve,
We are hitting a 'memory not mapped' problem when using trimLRpatterns
as detailed below. I did not manage to reproduce it with few sequences,
so I have to refer to a publicly available sequence file with many
reads. Even then, it occasionally runs through without problems.
Also, our use-case may not be typical and be part of the problem -
maybe
the solution will be to change our use of trimLRpatterns.
Here is some code to illustrate/reproduce the problem:
library(Biostrings)
library(ShortRead)
Rpat <- "TGGAATTCTCGGGTGCCAAGG"
maxRmm <- rep(0:2, c(6,3,nchar(Rpat)-9))
fq1 <- DNAStringSet(c("AAAATGGAATTCTCGGGTGCCAAGG",
"AAAATGGAATTCTCGGGTGCCAAGGTTTT"))
# the second read is not trimmed because it runs through the adaptor
trimLRPatterns(subject=fq1, Rpattern=Rpat, max.Rmismatch=maxRmm)
# A DNAStringSet instance of length 2
# width seq
#[1] 4 AAAA
#[2] 28 AAAATGGAATTCTCGGGTGCCAAGGTTT
# as a workaround, we pad the adaptor with Ns and
# increase the mismatch tolerance
numNs <- 90
maxRmm <- c(maxRmm, 1:numNs+max(maxRmm))
Rpat <- paste(c(Rpat, rep("N", numNs)), collapse="")
# now, also the second read gets trimmed
trimLRPatterns(subject=fq1, Rpattern=Rpat, max.Rmismatch=maxRmm)
# A DNAStringSet instance of length 2
# width seq
#[1] 4 AAAA
#[2] 4 AAAA
# to trigger the segmentation fault, many reads are needed
download.file("ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR000/ERR000916/ERR000916_1.fastq.gz",
"ERR000916_1.fastq.gz")
fq2 <- readFastq("ERR000916_1.fastq.gz")
fq3 <- trimLRPatterns(subject=fq2, Rpattern=Rpat, max.Rmismatch=maxRmm)
# *** caught segfault ***
#address 0x7f5109be4fed, cause 'memory not mapped'
#
#Traceback:
# 1: .Call(.NAME, ..., PACKAGE = PACKAGE)
# 2: .Call2("XStringSet_vmatch_pattern_at", pattern, subject, at,
at.type, max.mismatch, min.mismatch, with.indels, fixed, ans.type,
auto.reduce.pattern, PACKAGE = "Biostrings")
# 3: .matchPatternAt(pattern, subject, ending.at, 1L, max.mismatch,
min.mismatch, with.indels, fixed, .to.ans.type(follow.index),
auto.reduce.pattern)
# 4: which.isMatchingEndingAt(pattern = Rpattern, subject = subject,
ending.at = subject_width, max.mismatch = max.Rmismatch,
with.indels = with.Rindels, fixed = Rfixed, auto.reduce.pattern = TRUE)
# 5: which.isMatchingEndingAt(pattern = Rpattern, subject = subject,
ending.at = subject_width, max.mismatch = max.Rmismatch,
with.indels = with.Rindels, fixed = Rfixed, auto.reduce.pattern = TRUE)
# 6: .computeTrimEnd(Rpattern, subject, max.Rmismatch, with.Rindels,
Rfixed)
# 7: .XStringSet.trimLRPatterns(Lpattern, Rpattern, subject,
max.Lmismatch, max.Rmismatch, with.Lindels, with.Rindels, Lfixed,
Rfixed, ranges)
# 8: trimLRPatterns(Lpattern, Rpattern, sread(subject), max.Lmismatch,
max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed,
ranges
= TRUE)
# 9: trimLRPatterns(Lpattern, Rpattern, sread(subject), max.Lmismatch,
max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed,
ranges
= TRUE)
#10: eval(expr, envir, enclos)
#11: eval(call, sys.frame(sys.parent()))
#12: callGeneric(Lpattern, Rpattern, sread(subject), max.Lmismatch,
max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed, ranges =
TRUE)
#13: trimLRPatterns(subject = fq2, Rpattern = Rpat, max.Rmismatch =
maxRmm, with.Rindels = FALSE)
#14: trimLRPatterns(subject = fq2, Rpattern = Rpat, max.Rmismatch =
maxRmm, with.Rindels = FALSE)
The problem did not occur in R 3.0.3 with BioC 2.13.
Do you have an idea what's wrong?
Thanks for your help,
Michael
sessionInfo()R version 3.1.0 (2014-04-10)
#Platform: x86_64-unknown-linux-gnu (64-bit)
#
#locale:
#[1] C
#
#attached base packages:
#[1] parallel stats graphics grDevices utils datasets
methods
#[8] base
#
#other attached packages:
# [1] ShortRead_1.22.0 GenomicAlignments_1.0.0 BSgenome_1.32.0
# [4] Rsamtools_1.16.0 GenomicRanges_1.16.0
GenomeInfoDb_1.0.1
# [7] BiocParallel_0.6.0 Biostrings_2.32.0 XVector_0.4.0
#[10] IRanges_1.22.2 BiocGenerics_0.10.0
RColorBrewer_1.0-5
#
#loaded via a namespace (and not attached):
# [1] BBmisc_1.5 BatchJobs_1.2 Biobase_2.24.0
# [4] DBI_0.2-7 RSQLite_0.11.4 Rcpp_0.11.1
# [7] bitops_1.0-6 brew_1.0-6 codetools_0.2-8
#[10] compiler_3.1.0 digest_0.6.4 fail_1.2
#[13] foreach_1.4.2 grid_3.1.0 hwriter_1.3
#[16] iterators_1.0.7 lattice_0.20-29 latticeExtra_0.6-26
#[19] plyr_1.8.1 sendmailR_1.1-2 stats4_3.1.0
#[22] stringr_0.6.2 tools_3.1.0 zlibbioc_1.10.0
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Herv? Pag?s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319