[Bioc-devel] FastqStreamer error in function context

2 messages · Thomas Girke, Martin Morgan

#
When FastqStreamer or FastqSampler is called inside another function
together with a writeFastq step, the call usually returns an error.
However, the same code runs just fine outside of a function. Below is
an example to reproduce this error.

A small feature request for FastqStreamer would be an option to return 
the total number of reads stored in a fastq file as well as an option
for accessing specific records by passing on an index vector. 

Best,

Thomas


Here is an example:

library(ShortRead)
sp <- SolexaPath(system.file('extdata', package='ShortRead'))
fl <- file.path(analysisPath(sp), "s_1_sequence.txt")

## Some function using FastqStreamer
test <- function(x=fl) {
        f <- FastqStreamer(x, 5)
        while (length(fq <- yield(f))) {
                fqsub <- fq[1:2]
                writeFastq(fqsub, "test.fastq", mode="a")
        }
        close(f)
}
test(x=fl)

Error in .IRanges.checkAndTranslateSingleBracketSubscript(x, i) : 
  subscript contains NAs or out of bounds indices


sessionInfo()
R version 2.15.0 alpha (2012-03-05 r58604)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ShortRead_1.14.3    latticeExtra_0.6-19 RColorBrewer_1.0-5 
[4] Rsamtools_1.8.4     lattice_0.20-6      Biostrings_2.24.1  
[7] GenomicRanges_1.8.4 IRanges_1.14.2      BiocGenerics_0.2.0 

loaded via a namespace (and not attached):
[1] Biobase_2.16.0 bitops_1.0-4.1 grid_2.15.0    hwriter_1.3    stats4_2.15.0 
[6] tools_2.15.0   zlibbioc_1.2.0
#
On 05/09/2012 09:53 PM, Thomas Girke wrote:
Hi Thomas --

Your example fails because there are 256 records in the file and the
streamer yields 5 at a time, so for me the 52nd yield() returns
length(fq) == 1 and the subscript 2 in fq[1:2] is out of bounds. But
maybe there is another example?
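
A minimal sketch of one way to guard against the short final chunk,
assuming the same example file as in the original post. The arguments
'out', 'n' and 'keep' are introduced here for illustration only; they
are not part of the original example.

```r
library(ShortRead)

sp <- SolexaPath(system.file("extdata", package = "ShortRead"))
fl <- file.path(analysisPath(sp), "s_1_sequence.txt")

## Same loop as the original test(), but the subset never asks for
## more records than the current chunk actually holds.
test <- function(x = fl, out = tempfile(fileext = ".fastq"),
                 n = 5, keep = 2) {
    f <- FastqStreamer(x, n)
    on.exit(close(f))                  # close the streamer even on error
    while (length(fq <- yield(f))) {
        idx <- seq_len(min(keep, length(fq)))
        writeFastq(fq[idx], out, mode = "a")
    }
    out                                # path of the file that was written
}

out <- test(fl)
```

With seq_len(min(keep, length(fq))) the last chunk, which holds a
single record, contributes that one record instead of raising the
out-of-bounds error.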

For the first part of the feature request (the total number of reads),
after the fact we have

 > f
class: FastqStreamer
file: s_1_sequence.txt
status: n=5 current=1 added=256 total=256

with 'total=256' indicating that the streamer iterated over (i.e., the 
file had) 256 records. This is actually accessible in the reference 
class using the not-really-public (see the last lines of 
example(FastqStreamer)) accessor

 > f$status()
       n current   added   total
       5       1     256     256

which is a named integer vector. Is this what you were looking for?
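
As a rough sketch of how the total could be obtained in one call, the
hypothetical helper countReads() below streams through the whole file,
sums the chunk lengths, and also reads the 'total' element of
f$status(). Since status() is not really public, its element names may
differ between ShortRead versions; the summed count does not depend on
it.

```r
library(ShortRead)

sp <- SolexaPath(system.file("extdata", package = "ShortRead"))
fl <- file.path(analysisPath(sp), "s_1_sequence.txt")

## Hypothetical helper (not part of ShortRead): count the records in a
## fastq file by iterating over it 'n' records at a time.
countReads <- function(x, n = 1000) {
    f <- FastqStreamer(x, n)
    on.exit(close(f))
    summed <- 0L
    while (length(fq <- yield(f)))
        summed <- summed + length(fq)
    ## 'total' is taken from the status vector shown in this thread
    c(summed = summed, status = f$status()[["total"]])
}

counts <- countReads(fl)
```

Both elements of 'counts' should agree once the streamer has reached
the end of the file.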

I'll give the idea about selecting specific records some thought; I see 
how it could be useful.
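
In the meantime, a possible workaround is to translate file-level
positions into chunk-level positions while streaming. The sketch below
is only an illustration: writeSelected() and its arguments are
hypothetical names, not ShortRead functions, and positions past the end
of the file are silently skipped.

```r
library(ShortRead)

sp <- SolexaPath(system.file("extdata", package = "ShortRead"))
fl <- file.path(analysisPath(sp), "s_1_sequence.txt")

## Hypothetical helper: append the records at positions 'idx' (1-based,
## counted over the whole file) to 'out', reading 'n' records at a time.
## Returns the number of records written.
writeSelected <- function(x, idx, out = tempfile(fileext = ".fastq"),
                          n = 5) {
    idx <- sort(unique(as.integer(idx)))
    f <- FastqStreamer(x, n)
    on.exit(close(f))
    offset <- 0L                       # records seen before this chunk
    written <- 0L
    while (length(fq <- yield(f))) {
        ## positions that fall inside this chunk, relative to its start
        inChunk <- idx[idx > offset & idx <= offset + length(fq)] - offset
        if (length(inChunk)) {
            writeFastq(fq[inChunk], out, mode = "a")
            written <- written + length(inChunk)
        }
        offset <- offset + length(fq)
    }
    written
}

nWritten <- writeSelected(fl, c(1, 7, 256))
```

This still reads the whole file once, so it does not give random
access; it only avoids holding all records in memory at the same time.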

Martin