Skip to content
Prev 6989 / 21312 Next

[Bioc-devel] BamTallyParam argument 'which'

Related to my post on a separate thread
(https://stat.ethz.ch/pipermail/bioc-devel/2015-February/006978.html),
I think that if 'which' is not being reduced by default, a simple
example showing the effects of this could be included in the functions
that have such an argument. Also note that 'reducing' could lead to
unintended results.

For example, in the help page for GenomicAlignments::readGAlignments,
after the 'gal4' example it would be nice to add something like this:


## Note that if overlapping ranges are provided in 'which'
## reads could be selected more than once. This would artificually
## increase the coverage or affect other downstream results.
## If you 'reduce' the ranges, reads that originally overlapped
## two disjoint segments will be included.

which_dups <- RangesList(seq1=rep(IRanges(1000, 2000), 2),
    seq2=IRanges(c(100, 1000), c(1000, 2000)))
param_dups <- ScanBamParam(which=which_dups)
param_reduced <- ScanBamParam(which=reduce(which_dups))
gal4_dups <- readGAlignments(bamfile, param=param_dups)
gal4_reduced <- readGAlignments(bamfile, param=param_reduced)


length(gal4)

## Duplicates some reads. In this case, all the ones between
## bases 1000 and 2000 on seq1.
length(gal4_dups)

## Includes some reads that mapped around base 1000 in seq2
## that were excluded in gal4.
length(gal4_reduced)








Here's the output:
+                        mustWork=TRUE)
+                     seq2=IRanges(c(100, 1000), c(1000, 2000)))
GAlignments object with 2404 alignments and 0 metadata columns:
         seqnames strand       cigar    qwidth     start       end
width     njunc
            <Rle>  <Rle> <character> <integer> <integer> <integer>
<integer> <integer>
     [1]     seq1      +         35M        35       970      1004
   35         0
     [2]     seq1      +         35M        35       971      1005
   35         0
     [3]     seq1      +         35M        35       972      1006
   35         0
     [4]     seq1      +         35M        35       973      1007
   35         0
     [5]     seq1      +         35M        35       974      1008
   35         0
     ...      ...    ...         ...       ...       ...       ...
  ...       ...
  [2400]     seq2      +         35M        35      1524      1558
   35         0
  [2401]     seq2      +         35M        35      1524      1558
   35         0
  [2402]     seq2      -         35M        35      1528      1562
   35         0
  [2403]     seq2      -         35M        35      1532      1566
   35         0
  [2404]     seq2      -         35M        35      1533      1567
   35         0
  -------
  seqinfo: 2 sequences from an unspecified genome
+     seq2=IRanges(c(100, 1000), c(1000, 2000)))
[1] 2404
[1] 3014
[1] 2343
Session info-----------------------------------------------------------------------------------------------------------
 setting  value
 version  R Under development (unstable) (2014-11-01 r66923)
 system   x86_64, darwin10.8.0
 ui       AQUA
 language (EN)
 collate  en_US.UTF-8
 tz       America/New_York

Packages---------------------------------------------------------------------------------------------------------------
 package           * version date       source
 base64enc           0.1.2   2014-06-26 CRAN (R 3.2.0)
 BatchJobs           1.4     2014-09-24 CRAN (R 3.2.0)
 BBmisc              1.7     2014-06-21 CRAN (R 3.2.0)
 BiocGenerics      * 0.13.4  2014-12-31 Bioconductor
 BiocParallel        1.1.13  2015-01-27 Bioconductor
 Biostrings        * 2.35.8  2015-02-14 Bioconductor
 bitops              1.0.6   2013-08-17 CRAN (R 3.2.0)
 brew                1.0.6   2011-04-13 CRAN (R 3.2.0)
 checkmate           1.5.0   2014-10-19 CRAN (R 3.2.0)
 codetools           0.2.9   2014-08-21 CRAN (R 3.2.0)
 DBI                 0.3.1   2014-09-24 CRAN (R 3.2.0)
 devtools            1.6.1   2014-10-07 CRAN (R 3.2.0)
 digest              0.6.4   2013-12-03 CRAN (R 3.2.0)
 fail                1.2     2013-09-19 CRAN (R 3.2.0)
 foreach             1.4.2   2014-04-11 CRAN (R 3.2.0)
 GenomeInfoDb      * 1.3.13  2015-02-13 Bioconductor
 GenomicAlignments * 1.3.27  2015-01-26 Bioconductor
 GenomicRanges     * 1.19.37 2015-02-13 Bioconductor
 IRanges           * 2.1.38  2015-02-08 Bioconductor
 iterators           1.0.7   2014-04-11 CRAN (R 3.2.0)
 Rsamtools         * 1.19.27 2015-02-07 Bioconductor
 RSQLite             1.0.0   2014-10-25 CRAN (R 3.2.0)
 rstudioapi          0.1     2014-03-27 CRAN (R 3.2.0)
 S4Vectors         * 0.5.20  2015-02-19 Bioconductor
 sendmailR           1.2.1   2014-09-21 CRAN (R 3.2.0)
 stringr             0.6.2   2012-12-06 CRAN (R 3.2.0)
 XVector           * 0.7.4   2015-02-08 Bioconductor
 zlibbioc            1.13.1  2015-02-11 Bioconductor
On Mon, Feb 23, 2015 at 2:38 PM, Leonard Goldstein
<goldstein.leonard at gene.com> wrote: