Hi all,

Is there any standard recommended container for genomic structural variants? I think InteractionSet would work fine for translocations and GRanges for inversions and copy number changes, but I don't know what the recommended way would be to store them all together using standard Bioconductor objects.

Also, is there any package that would load an SV VCF produced by lumpy or delly and build that object?

Thanks!

Bernat
[Bioc-devel] InteractionSet for structural variants
9 messages · Bernat Gel, Aaron Lun, Sean Davis +3 more
I would say that it depends on what operations you intend to perform on them. You can _store_ things any way you like, but the trick is to ensure that operations and manipulations on those things are consistent and meaningful. It is not obvious that there are meaningful common operations that one might want to apply to all structural variants.

For example, translocations involve two genomic regions (i.e., the two bits that get stuck together) and so are inherently two-dimensional. A lot of useful operations will be truly translocation-specific, e.g., calculation of distances between anchor regions, identification of bounding boxes in two-dimensional space. These operations will be meaningless to 1-dimensional variants on the linear genome, e.g., CNVs, inversions. The converse also applies where operations on the linear genome have no single equivalent in the two-dimensional case.

So, I would be inclined to store them separately. If you must keep them in one object, just lump them into a List with "translocation" (GInteractions), "cnv" (GRanges) and "inversion" (another GRanges) elements, and people/programs can pull out bits and pieces as needed.

-A
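As a rough illustration of the separation described above, here is a minimal sketch in plain Python rather than with Bioconductor classes. The names `Region`, `Interaction`, `width` and `anchor_distance` are hypothetical stand-ins for GRanges, GInteractions and their operations; they are not part of any package mentioned in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    """A 1-based, closed interval on the linear genome (one-dimensional)."""
    chrom: str
    start: int
    end: int

    def width(self) -> int:
        # Meaningful for CNVs and inversions: number of bases covered.
        return self.end - self.start + 1


@dataclass(frozen=True)
class Interaction:
    """Two anchor regions joined together (two-dimensional)."""
    anchor1: Region
    anchor2: Region

    def anchor_distance(self) -> Optional[int]:
        # Distance between anchor midpoints; undefined across chromosomes.
        if self.anchor1.chrom != self.anchor2.chrom:
            return None
        mid1 = (self.anchor1.start + self.anchor1.end) // 2
        mid2 = (self.anchor2.start + self.anchor2.end) // 2
        return abs(mid2 - mid1)


# One named element per variant type, so each keeps its own operations.
variants = {
    "translocation": [
        Interaction(Region("chr1", 1000, 1100), Region("chr5", 5000, 5100)),
    ],
    "cnv": [Region("chr2", 20000, 29999)],
    "inversion": [Region("chr3", 300, 799)],
}

cnv_widths = [r.width() for r in variants["cnv"]]
translocation_distances = [
    i.anchor_distance() for i in variants["translocation"]
]
```

The container itself is just a named collection; consumers pull out the element they need and apply only the operations that make sense for that type.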
On 5/17/19 4:38 AM, Bernat Gel Moreno wrote:
Hi all, Is there any standard recommended container for genomic structural variants? I think InteractionSet would work fine for translocation and GRanges for inversions and copy number changes, but I don't know what would be the recommended way to store them all together using standard Bioconductor objects. And actually, is there any package that would load a SV VCF by lumpy or delly and build that object? Thanks! Bernat
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
2 days later
Hi Aaron,

Thanks for your response. So far my intention is to plot them, and I do not intend to perform any other operation. The first step would be to read in the VCF file and transform it into a meaningful object. I was hoping there was a core package already taking care of that, but I gather from your answer that no such functionality is implemented.

Thanks again,

Bernat
Not to my knowledge... but if you're planning on writing some relevant functions, I'm sure we could find a home for it somewhere. -A
I do have a couple of simple functions in VCFWrenchR (not in Bioc), but like much VCF code, it probably misses a bunch of edge cases. The functions target VRanges, not InteractionSet objects.

https://github.com/seandavi/VCFWrenchR

Sean
The new package StructuralVariantAnnotation is worth mentioning. It operates on the general "breakend" notation so should be able to represent any type of structural variant.
Michael Lawrence
Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
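For readers unfamiliar with the "breakend" notation mentioned for StructuralVariantAnnotation: in VCF, a breakend record's ALT field encodes the mate position and the orientation of the join using square brackets. The following is a minimal Python sketch of reading the four paired forms (`t[p[`, `t]p]`, `]p]t`, `[p[t`). It is not the package's implementation; `Breakend` and `parse_breakend_alt` are hypothetical names, and single breakends (for example `G.`) and contig names in angle brackets are deliberately not handled.

```python
import re
from dataclasses import dataclass

# Matches e.g. "G]17:198982]" or "]13:123456]T"; both brackets must be equal.
_BND = re.compile(
    r"^(?P<lead>[^\[\]]*)"
    r"(?P<bracket>[\[\]])"
    r"(?P<chrom>[^:\[\]]+):(?P<pos>\d+)"
    r"(?P=bracket)"
    r"(?P<trail>[^\[\]]*)$"
)


@dataclass(frozen=True)
class Breakend:
    mate_chrom: str
    mate_pos: int
    # True when the mate sequence is joined after the local base(s) "t".
    joined_after_local: bool
    # True for "[" (piece extends right of p), False for "]" (extends left).
    mate_extends_right: bool


def parse_breakend_alt(alt: str) -> Breakend:
    """Parse a paired-breakend ALT string into its components."""
    m = _BND.match(alt)
    if m is None:
        raise ValueError(f"not a paired breakend ALT: {alt!r}")
    lead, trail = m["lead"], m["trail"]
    # Exactly one side of the brackets carries the local sequence.
    if bool(lead) == bool(trail):
        raise ValueError(f"ambiguous local sequence in ALT: {alt!r}")
    return Breakend(
        mate_chrom=m["chrom"],
        mate_pos=int(m["pos"]),
        joined_after_local=bool(lead),
        mate_extends_right=(m["bracket"] == "["),
    )


example = parse_breakend_alt("G]17:198982]")
```

Because every junction is described as a pair of such records, translocations, inversions, deletions and duplications can all be written in the same form, which is why a breakend-based container can cover any SV type.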
I know little about SVs and the associated software, but it is clear to me that we will see a lot of "personal" genomes in the future, and the ability to move between different reference genomes (coordinate systems) is something for which I think we should aim to have good/great support.
We're never going to have great support for changing genome coordinate systems for SVs, due to the intrinsic complexity involved. Even something as simple as a deletion is problematic if the coordinate system change results in additional sequence in the deleted region (is that also deleted or not?), or if some of the spanned/end-point sequence gets moved to a different chromosome. Sure, you can make approximations, but it's never going to be a good/great result. To get a good result you'd need to rerun your SV calling pipeline in your new coordinate system. That is a lot more effort, but I'd have at least some confidence in the results.
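The deletion problem described above can be made concrete with a toy example. This is a sketch under simplifying assumptions, not how any liftover tool works: the coordinate map is a list of ungapped blocks, each endpoint is lifted independently, and `Block`, `lift_position` and `lift_deletion` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Block:
    """Ungapped block: old_chrom[old_start, old_end] maps to new_chrom
    starting at new_start (1-based, closed coordinates)."""
    old_chrom: str
    old_start: int
    old_end: int
    new_chrom: str
    new_start: int


def lift_position(
    blocks: List[Block], chrom: str, pos: int
) -> Optional[Tuple[str, int]]:
    """Map one position to the new coordinate system, or None if unmapped."""
    for b in blocks:
        if b.old_chrom == chrom and b.old_start <= pos <= b.old_end:
            return b.new_chrom, b.new_start + (pos - b.old_start)
    return None


def lift_deletion(blocks: List[Block], chrom: str, start: int, end: int):
    """Lift both endpoints and report why the result may be untrustworthy."""
    a = lift_position(blocks, chrom, start)
    b = lift_position(blocks, chrom, end)
    if a is None or b is None:
        return "unmapped", None
    if a[0] != b[0]:
        return "split_across_chromosomes", None
    if a[1] > b[1]:
        return "endpoints_reordered", None
    old_width = end - start + 1
    new_width = b[1] - a[1] + 1
    status = "ok" if new_width == old_width else "width_changed"
    return status, (a[0], a[1], b[1])


# Case 1: the new assembly has 500 extra bases between the two blocks.
extra_sequence = [
    Block("chr1", 1, 1000, "chr1", 1),
    Block("chr1", 1001, 2000, "chr1", 1501),
]
# Case 2: the second block now lives on another chromosome.
moved_block = [
    Block("chr1", 1, 1000, "chr1", 1),
    Block("chr1", 1001, 2000, "chr7", 1),
]

result_extra = lift_deletion(extra_sequence, "chr1", 900, 1100)
result_moved = lift_deletion(moved_block, "chr1", 900, 1100)
```

In the first case the endpoints lift cleanly but the interval grows from 201 to 701 bases, and the map alone cannot say whether the extra 500 bases are deleted too. In the second case there is no single interval to report at all.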
This is confusing SV calling with the issue of moving between coordinate systems, which exists even with great/perfect genomes. Good point on deletions and the new system, but it is nevertheless important. People do this, for example, for some of the new mouse genomes using modmap.