[Bioc-devel] InteractionSet for structural variants

9 messages · Bernat Gel, Aaron Lun, Sean Davis +3 more

#
Hi all,

Is there any standard recommended container for genomic structural 
variants? I think InteractionSet would work fine for translocation and 
GRanges for inversions and copy number changes, but I don't know what 
would be the recommended way to store them all together using standard 
Bioconductor objects.

And actually, is there any package that would load a SV VCF by lumpy or 
delly and build that object?

Thanks!

Bernat
#
I would say that it depends on what operations you intend to perform on 
them. You can _store_ things any way you like, but the trick is to 
ensure that operations and manipulations on those things are consistent 
and meaningful. It is not obvious that there are meaningful common 
operations that one might want to apply to all structural variants.

For example, translocations involve two genomic regions (i.e., the two 
bits that get stuck together) and so are inherently two-dimensional. A 
lot of useful operations will be truly translocation-specific, e.g., 
calculation of distances between anchor regions, identification of 
bounding boxes in two-dimensional space. These operations are 
meaningless for one-dimensional variants on the linear genome, e.g., 
CNVs and inversions. The converse also applies: operations on the linear 
genome have no single equivalent in the two-dimensional case.
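
To make the two-dimensional operations concrete, here is a minimal sketch in plain Python. It is not the InteractionSet API: the `Region` and `Translocation` types and both function names are hypothetical, intervals are assumed to be 1-based and closed, and the distance between anchors is treated as undefined when they lie on different chromosomes.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Region:
    """A 1-based, closed interval on one chromosome (hypothetical type)."""
    chrom: str
    start: int
    end: int


@dataclass(frozen=True)
class Translocation:
    """Two anchor regions joined by a rearrangement (hypothetical type)."""
    anchor1: Region
    anchor2: Region


def anchor_distance(t: Translocation) -> Optional[int]:
    """Bases between the two anchors, or None if they are on different chromosomes."""
    a, b = t.anchor1, t.anchor2
    if a.chrom != b.chrom:
        return None
    # Count the bases strictly between the intervals; 0 if they touch or overlap.
    return max(0, max(a.start, b.start) - min(a.end, b.end) - 1)


def bounding_box(ts: Sequence[Translocation]) -> Tuple[Region, Region]:
    """Smallest (anchor1, anchor2) rectangle that covers every pair.

    Assumes all first anchors share one chromosome and all second anchors
    share one chromosome; otherwise no single rectangle exists.
    """
    if not ts:
        raise ValueError("no translocations supplied")
    c1 = {t.anchor1.chrom for t in ts}
    c2 = {t.anchor2.chrom for t in ts}
    if len(c1) != 1 or len(c2) != 1:
        raise ValueError("anchors span several chromosomes")
    return (
        Region(c1.pop(), min(t.anchor1.start for t in ts), max(t.anchor1.end for t in ts)),
        Region(c2.pop(), min(t.anchor2.start for t in ts), max(t.anchor2.end for t in ts)),
    )
```

Neither function has an obvious counterpart for a single interval such as a CNV, which is the asymmetry described above.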

So, I would be inclined to store them separately. If you must keep them 
in one object, just lump them into a List with "translocation" 
(GInteractions), "cnv" (GRanges) and "inversion" (another GRanges) 
elements, and people/programs can pull out bits and pieces as needed.
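
As a language-neutral illustration of the "lump them into a List" idea, the sketch below groups calls into three named lists in plain Python. The `Call` record layout and the `GROUP_OF` mapping are assumptions made for this example; in particular, treating every `BND` call as a translocation is a simplification, and this is not a substitute for GInteractions or GRanges.

```python
from typing import Dict, List, NamedTuple, Optional


class Call(NamedTuple):
    """One structural variant call (hypothetical record layout)."""
    svtype: str                       # e.g. "DEL", "DUP", "INV", "BND"
    chrom: str
    start: int
    end: Optional[int] = None         # None for two-locus events
    mate_chrom: Optional[str] = None  # only set for two-locus events
    mate_pos: Optional[int] = None


# Assumed mapping from SVTYPE labels to the three groups named in the text.
GROUP_OF = {"DEL": "cnv", "DUP": "cnv", "INV": "inversion", "BND": "translocation"}


def group_calls(calls: List[Call]) -> Dict[str, List[Call]]:
    """Split calls into per-type lists; unknown types raise instead of being dropped."""
    groups: Dict[str, List[Call]] = {"translocation": [], "cnv": [], "inversion": []}
    for call in calls:
        try:
            groups[GROUP_OF[call.svtype]].append(call)
        except KeyError:
            raise ValueError(f"unhandled SVTYPE: {call.svtype}") from None
    return groups
```

Downstream code can then pull out `groups["translocation"]` for pairwise operations and the other two lists for interval operations, without pretending the three kinds share one interface.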

-A
On 5/17/19 4:38 AM, Bernat Gel Moreno wrote:
2 days later
#
Hi Aaron,

Thanks for your response. So far my intention is to plot them, and I 
do not intend to perform any other operation. The first step would be 
to read in the VCF file and transform it into a meaningful object. I was 
hoping there was a core package already taking care of that, but I gather 
from your answer that no such functionality is implemented.

Thanks again

Bernat





On 5/18/19 at 4:47 AM, Aaron Lun wrote:
#
Not to my knowledge... but if you're planning on writing some relevant 
functions, I'm sure we could find a home for it somewhere.

-A
#
On Tue, May 21, 2019 at 2:54 AM Aaron Lun <
infinite.monkeys.with.keyboards at gmail.com> wrote:

            
I do have a couple of simple functions in VCFWrenchR (not in Bioc), but
like much VCF code, they probably miss a bunch of edge cases. The functions
target VRanges, not InteractionSet objects.

https://github.com/seandavi/VCFWrenchR

Sean

  
  
#
The new package StructuralVariantAnnotation is worth mentioning. It
operates on the general "breakend" notation, so it should be able to
represent any type of structural variant.
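
For readers unfamiliar with the notation, "breakend" here presumably refers to the BND records of the VCF specification, where the ALT field encodes the mate position and the join orientation, e.g. `G]17:198982]`. The sketch below parses that field in plain Python. It does not reproduce StructuralVariantAnnotation; the `Breakend` type, its field names and `parse_breakend_alt` are hypothetical, and single breakends or records with unusual contig names are not handled.

```python
import re
from dataclasses import dataclass

# The four paired forms are t[p[, t]p], ]p]t and [p[t, where t is the local
# sequence and p is the mate position written as chrom:pos.
_BND = re.compile(
    r"^(?P<lead>[ACGTNacgtn.]*)"
    r"(?P<open>[\[\]])(?P<chrom>.+):(?P<pos>\d+)(?P<close>[\[\]])"
    r"(?P<trail>[ACGTNacgtn.]*)$"
)


@dataclass(frozen=True)
class Breakend:
    """Parsed ALT field of one BND record (hypothetical type)."""
    mate_chrom: str
    mate_pos: int
    bases: str                # local sequence t
    mate_joined_after: bool   # True for t[p[ and t]p], False for ]p]t and [p[t
    mate_extends_right: bool  # True when the brackets are "["


def parse_breakend_alt(alt: str) -> Breakend:
    """Parse a paired-breakend ALT string; raise ValueError on anything else."""
    m = _BND.match(alt)
    if m is None or m["open"] != m["close"]:
        raise ValueError(f"not a paired breakend ALT: {alt!r}")
    lead, trail = m["lead"], m["trail"]
    # Exactly one side of the brackets carries the local sequence.
    if bool(lead) == bool(trail):
        raise ValueError(f"ambiguous breakend ALT: {alt!r}")
    return Breakend(
        mate_chrom=m["chrom"],
        mate_pos=int(m["pos"]),
        bases=lead or trail,
        mate_joined_after=bool(lead),
        mate_extends_right=m["open"] == "[",
    )
```

Because every rearrangement can be written as one or more such adjacencies, a parser along these lines covers translocations, inversions and deletions with the same record type.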
On Tue, May 21, 2019 at 3:22 AM Sean Davis <seandavi at gmail.com> wrote:

            

  
    
#
I know little about SV and the associated software, but it is clear to me
that we will see a lot of "personal" genomes in the future. The ability to
move between different reference genomes (coordinate systems) is something
I think we should aim to support well.

On Tue, May 21, 2019 at 9:37 AM Michael Lawrence via Bioc-devel <
bioc-devel at r-project.org> wrote:

            

  
  
#
We're never going to have great support for moving SVs between genome
coordinate systems, due to the intrinsic complexity involved. Even
something as simple as a deletion is problematic if the coordinate system
change results in additional sequence in the deleted region (is that also
deleted or not?), or if some of the spanned/end-point sequence gets moved
to a different chromosome. Sure, you can make approximations, but it's
never going to give good/great results. To get a good result you'd need to
rerun your SV calling pipeline in your new coordinate system. That is a lot
more effort, but I'd have at least some confidence in the results.
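
A toy example of why even a deletion is awkward to lift is sketched below in plain Python. The `Block` layout is a hypothetical, much simplified stand-in for a real chain or alignment format: blocks are ungapped, forward-strand only, and use 1-based inclusive coordinates. The function names are likewise invented for this illustration.

```python
from typing import List, NamedTuple, Optional, Tuple


class Block(NamedTuple):
    """One ungapped old-to-new alignment block (hypothetical layout)."""
    old_chrom: str
    old_start: int  # 1-based, inclusive
    new_chrom: str
    new_start: int
    length: int


def lift_position(blocks: List[Block], chrom: str, pos: int) -> Optional[Tuple[str, int]]:
    """Map one position to the new assembly, or None if no block covers it."""
    for b in blocks:
        if b.old_chrom == chrom and b.old_start <= pos < b.old_start + b.length:
            return b.new_chrom, b.new_start + (pos - b.old_start)
    return None


def lift_deletion(blocks: List[Block], chrom: str, start: int, end: int) -> str:
    """Lift both endpoints of a deletion and describe what happened."""
    left = lift_position(blocks, chrom, start)
    right = lift_position(blocks, chrom, end)
    if left is None or right is None:
        return "unmapped endpoint"
    if left[0] != right[0]:
        return "endpoints on different chromosomes"
    old_len = end - start + 1
    new_len = right[1] - left[1] + 1
    if new_len != old_len:
        # Sequence was gained or lost between the endpoints; whether it
        # belongs to the deletion cannot be decided from coordinates alone.
        return f"span changed from {old_len} to {new_len} bp"
    return f"{left[0]}:{left[1]}-{right[1]}"
```

With blocks in which the new assembly inserts 500 bp after old position 1000 and moves old positions 2001 onwards to another chromosome, a deletion at 900-1100 comes back as "span changed from 201 to 701 bp" and one at 1900-2100 as "endpoints on different chromosomes". Only the first outcome below is a clean lift.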

On Wed, May 22, 2019 at 12:07 AM Kasper Daniel Hansen <
kasperdanielhansen at gmail.com> wrote:

            

  
  
#
This conflates SV calling with the issue of moving between coordinate
systems, which exists even with great/perfect genomes. Good point on
deletions and the new system, but it is nevertheless important.

People do this, for example, for some of the new mouse genomes using modmap.

On Tue, May 21, 2019 at 10:25 AM Daniel Cameron <daniel at danielcameron.com>
wrote: