Skip to content
Prev 4675 / 21312 Next

[Bioc-devel] [BioC] readGappedAlignments and param argument

On 09/10/2013 12:22 AM, Elena Grassi wrote:
You can use summarizeOverlaps with a 'BamFileList' created by something like

   myFiles = dir("/some/dir", pattern="bam$")
   bfl = BamFileList(myFiles, yieldSize=1000000)
   olaps = summarizeOverlaps(features, bfl)

see the example on the help page

   method?"summarizeOverlaps,GRanges,BamFileList"

(there's helpful tab completion after 'method?"sum<tab>'). If you've also said

   library(parallel)
   options(mc.cores=detectCores())  ## how many cores to use?

then the bam files are processed in parallel (choose yieldSize so that the 
different threads are not competing for memory).
more generally a strategy is to use yieldSize and iterate through the file, like

   bf = BamFile(fl, yieldsSize=1000000)
   open(bf)
   repeat {
       curr = readBamGappedAlignments(bf) ## read up to yieldSize records
       if (length(curr) == 0)
           break ## done
       ## process chunk 'curr'
   }
   close(bf)

this is actually what summarizeOverlaps does when invoked on a BamFile (or 
BamFileList().
one small thing that came out of that thread was that

   as(seqinfo(BamFile("/some/file")), "GRanges")

gives a GRanges of all the sequence lengths.