Skip to content

[Bioc-devel] Build error in released version

5 messages · Zhu, Lihua (Julie), Hervé Pagès, Kasper Daniel Hansen

#
Hi Julie,

The GAlignmentPairs container didn't support discordant strand until
BioC 3.4 (current devel). In the current release (and in previous
versions of BioC) strand discordance was not supported. I recently
fixed a bug in the released version of GenomicAlignments where the
strand() getter was returning an incorrect strand for pairs with
discordant strand, instead of raising an error. This fix is what
breaks the GUIDEseq vignette which creates and manipulates a
GAlignmentPairs object with discordant strand.

I just changed this again (in GenomicAlignments 1.8.4) so the
strand() getter now returns * instead of raising an error in case
of discordant strand (this is actually what strand() does in devel
where strand discordance is now fully supported). That seems to
fix GUIDEseq's vignette.

I guess there was no solid reason for not supporting strand discordance.
It was just a matter of deciding how the various GAlignmentPairs getters
and extractors should handle this. Having strand() or granges() return
* is probably the natural thing to do.

A more complicated situation, which is also much less common, is
chromosome discordance i.e. when the 2 alignments in a pair are on
different chromosomes. Not clear what seqnames() or granges() should
do in that case. Maybe return/set the seqname to * but that means
introducing * as a special seqlevel. Still need to think about the
implications of this.

Cheers,
H.
On 07/01/2016 02:23 PM, Zhu, Lihua (Julie) wrote:

  
    
1 day later
#
Dear Herve,

Thank you so much for the detailed explanation and quick fix!

When fusion occurs, the 2 alignments in a pair could be on different
chromosomes. Not sure what is the best way to handle this situation. If
set seqnames to *, then the mapping location is lost. Maybe set seqnames
to one of the chromosome, and have a mcol seqname2? Indicating chromosome
fusion?

Best,

Julie
On 7/4/16 12:28 AM, "Herv? Pag?s" <hpages at fredhutch.org> wrote:

            
#
Hi Julie,
On 07/05/2016 01:03 PM, Zhu, Lihua (Julie) wrote:
The exact mapping location is stored in the GAligmentPairs object and
will be preserved when the user coerces to GRangesList, either with
grglist(.) or as(. , "GRangesList"). I was thinking in using * only
when coercing to GRanges, either with granges(.) or as(. , "GRanges").
There is really no good way to represent features that span multiple
chromosomes with a GRanges object. These features are better represented
with a GRangesList object. That's really what GRangesList objects are
for.

So after giving it a 2nd thought, I'm tempted to either (a) just let
coercion of GAligmentPairs to GRanges fail (with an informative error
message) when some pairs span more than 1 chromosome, or (b) drop
these pairs with a warning, or (c) add an extra argument to granges()
to let the user decide between (a) and (b).

Hopefully that makes sense.

H.

  
    
#
Thanks for the clarification , Herv?!

That makes sense to me! I prefer option b and c.

Best,

Julie
#
This is my opinion.  I think the class should have a very general
structure.  Depending on the assay, users may be particularly interested in
the original case described by Herve (alignments on opposite strands, same
chromosome), but I dont think it should be enforced.

We have first(), last(), second() for getting the information in each pair.

I suggest that the functionality inherent in granges() instead by moved to
a function like
  getFragment()
What this function does, is really to get the actual DNA fragment that was
sequenced.  I think this function should have an argument like
standRequirement which defaults to "opposite" and which has other options
"same" and "ignored" and which returns a GRanges with the "outer"
coordinates exactly as now.  That accessor function would silently drop
alignment pairs which does not satisfy the criteria.

An alternative is that this could be set as options inside the
GAlignmentPairs object: what is considered a "valid" alignment.  Then this
could dictate the behaviour of say seqnames().

I strongly support this class having the possibility of alignments to
different chromosomes.  But I can also see the advantage in having default
methods drop alignments which does this, as this will be the majority
usecase.  Then people with special requirements can set a few arguments.

Kasper

On Tue, Jul 5, 2016 at 7:15 PM, Zhu, Lihua (Julie) <Julie.Zhu at umassmed.edu>
wrote: