Skip to content
Prev 10363 / 21312 Next

[Bioc-devel] GAlignments Sorting Causes C Stack Error

Hi,

No ordering was defined until now for GAlignments. Starting with
GAlignments 1.11.7, the elements of a GAlignments object 'x' are
ordered based on the order of the elements in 'granges(x)', that
is, by chromosome, then by strand, then by start, then by end.

Note that I chose a simple ordering definition for GAlignments objects
but it has the following important caveat: == does NOT look at the
cigar. That means that 2 elements in a GAlignments object are considered
equal even if their cigars are different:

 > gal <- GAlignments(Rle(factor("chr1"), 2), c(5L, 5L), c("10M", 
"5M2I5M"), Rle(strand("+"), 2))

 > gal
GAlignments object with 2 alignments and 0 metadata columns:
       seqnames strand       cigar    qwidth     start       end     width
          <Rle>  <Rle> <character> <integer> <integer> <integer> <integer>
   [1]     chr1      +         10M        10         5        14        10
   [2]     chr1      +      5M2I5M        12         5        14        10
           njunc
       <integer>
   [1]         0
   [2]         0
   -------
   seqinfo: 1 sequence from an unspecified genome; no seqlengths

 > gal[1] == gal
[1] TRUE TRUE

Another ordering definition would be an ordering that looks at
chromosome/strand/start/end/cigar instead of just 
chromosome/strand/start/end (i.e. the cigar is used to break ties when 
looking at
chromosome/strand/start/end only leads to a tie). That would address
the above caveat but it feels weird to use lexicographic ordering of
the cigars in order to break ties. So I'm kind of hesitant to do this.

Anyway now <=, <, ==, !=, pcompare(), order(), sort(), rank(), etc...
work.

Still no match(), selfmatch(), duplicated(), or unique() for GAlignments
objects. Note that these things should adhere to the same notion of
equality as == so they would also ignore the cigar right now if we
wanted to have them. Maybe that's another argument in favor of an
ordering based on chromosome/strand/start/end/cigar.

H.
On 01/08/2017 05:00 PM, Dario Strbenac wrote: