[Bioc-devel] GenomicAlignments: using asMates=TRUE and yieldSize with paired-end BAM files

Hi Herve,

I must retract my previous statement about 'yieldSize' and 'which'. As 
of Rsamtools 1.15.0, scanBam() (and functions that build on it) does 
handle the case where both are supplied. This is true for the non-mate 
and mate-pairing code.

library(Rsamtools)
fl <- system.file("extdata", "ex1.bam", package="Rsamtools")
bf <- BamFile(fl, yieldSize=1000)
which <- tileGenome(seqlengths(bf),
      tilewidth=500, cut.last.tile.in.chrom=TRUE)
param <- ScanBamParam(which=which, what="qname")
FUN <- function(elt) length(elt[[1]])

Here we have both 'yieldSize' on the BamFile and a 'which' in the param. 
We ask for a yield of 1000 records. The first range has only 394 
records, the second has 570, and the third also has 570. As explained in 
the man page snippet above, records are yielded in complete ranges until 
the running total first exceeds 'yieldSize'. The total passes 1000 in 
range 3 (394 + 570 + 570 = 1534), so we get all of range 3 and then stop.

sapply(scanBam(bf, param=param), FUN)
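
To make the stopping rule concrete, here is a minimal base-R sketch that only mimics the arithmetic described above; it does not call Rsamtools. The per-range counts are the three quoted in this message, and the helper name ranges_in_first_yield is made up for illustration.

```r
# Hypothetical helper (not part of Rsamtools): given per-range record counts,
# return the ranges handed back by one yield under the rule described above.
ranges_in_first_yield <- function(counts, yield_size) {
    running <- cumsum(counts)
    # First range at which the running total exceeds yield_size.
    stop_at <- which(running > yield_size)[1]
    # If the total never exceeds yield_size, every range is returned.
    if (is.na(stop_at)) stop_at <- length(counts)
    counts[seq_len(stop_at)]
}

# Counts quoted in the text for the first three tiles.
counts <- c(range1 = 394, range2 = 570, range3 = 570)
yielded <- ranges_in_first_yield(counts, yield_size = 1000)
yielded        # all three ranges are kept
sum(yielded)   # 1534, i.e. more than the requested 1000
```

Because whole ranges are kept, a yield can return more records than 'yieldSize'; the overshoot depends on how many records the last range holds.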

We can open the file and yield through all records:

bf <- open(BamFile(fl, yieldSize=1000))
sapply(scanBam(bf, param=param), FUN)
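
A single scanBam() call on an open file returns only the next chunk; to walk the whole file, the call is repeated until nothing comes back. The sketch below uses the plain 'yieldSize' loop without a 'which', so it relies only on the usual chunking behaviour of an open BamFile. The names bf2, qname_param and chunk_sizes are illustrative.

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools")
bf2 <- open(BamFile(fl, yieldSize=1000))
qname_param <- ScanBamParam(what="qname")

chunk_sizes <- integer()
# Each call picks up where the previous one stopped; an empty chunk
# means the end of the file has been reached.
while (n <- length(scanBam(bf2, param=qname_param)[[1]]$qname))
    chunk_sizes <- c(chunk_sizes, n)
close(bf2)

chunk_sizes        # records per chunk, each at most 'yieldSize'
sum(chunk_sizes)   # total records read
```

Closing the file when done releases the connection and resets the read position for any later open().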

I've removed the misinformation from the man pages I altered. I also 
added a unit test to Rsamtools for the mates code with both 'yieldSize' 
and 'which'.

Val

On 03/27/2014 11:36 AM, Hervé Pagès wrote: