I am writing a package that will extend the GenomicInteractions class. I am a statistician, so I may not know best practices when it comes to extending existing classes (e.g., should I make a new slot or simply add a column to the `elementMetadata`? Are there existing functions that already do what I am attempting?).
I am not familiar with Bioc-devel decorum, so if asking this here is inappropriate, kindly let me know.
About my project:
In the first step, I am hoping to implement a HiC binning function on HiC data contained in a GenomicInteractions object. I aim to:
- Reorder the anchor pairs (I will explain in more detail to anyone that wants to help)
- Collapse the regions to the desired bin width
- Sum the counts within each bin
- Update the anchors to agree with the new/updated regions
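The four steps above can be sketched independently of R. The following minimal Python sketch is a language-agnostic illustration, not the GenomicInteractions API. It assumes each interaction is a plain tuple `(chrom1, start1, end1, chrom2, start2, end2, count)` with 0-based half-open coordinates, assumes that "reordering" means putting the lower-sorting bin first so that (A, B) and (B, A) are merged, and assigns each anchor to the bin containing its midpoint. All of these are assumptions, as is the hypothetical function name `bin_interactions`; bins are also not truncated at chromosome ends.

```python
from collections import defaultdict


def bin_interactions(pairs, bin_width):
    """Collapse region pairs into bin pairs and sum their counts (sketch).

    pairs: iterable of (chrom1, start1, end1, chrom2, start2, end2, count),
    0-based half-open coordinates (an assumption).
    Returns a sorted list of tuples in the same layout, one per bin pair.
    """
    totals = defaultdict(int)
    for c1, s1, e1, c2, s2, e2, count in pairs:
        # Collapse each anchor to the bin containing its midpoint.
        b1 = (c1, ((s1 + e1) // 2) // bin_width)
        b2 = (c2, ((s2 + e2) // 2) // bin_width)
        # Reorder the anchor pair so (A, B) and (B, A) share one key.
        if b2 < b1:
            b1, b2 = b2, b1
        # Sum the counts within each bin pair.
        totals[(b1, b2)] += count
    # Update the anchors: rebuild coordinates from the bin indices.
    binned = []
    for ((c1, i1), (c2, i2)), count in sorted(totals.items()):
        binned.append((c1, i1 * bin_width, (i1 + 1) * bin_width,
                       c2, i2 * bin_width, (i2 + 1) * bin_width, count))
    return binned
```

For example, with a 1 kb bin width the pairs (chr1:100-200, chr1:5100-5200, count 3) and (chr1:5050-5150, chr1:150-250, count 2) collapse into the single bin pair chr1:0-1000 x chr1:5000-6000 with count 5, because the second pair is reordered before summing.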
This will set the stage for the new class that I hope to create for HiC domain calling, but I need to achieve the above tasks first.
All the best to everyone!
Luke Klein
PhD Student
Department of Statistics
University of California, Riverside
lklei001 at ucr.edu
[Bioc-devel] Call for collaborators/advice
5 messages · Luke Klein, Kasper Daniel Hansen, Tim Triche, Jr. +1 more
Why is this not "just" a function which transforms one GI into another GI? That's what it seems to me.
On Fri, Mar 22, 2019 at 12:31 PM Luke Klein <lklei001 at ucr.edu> wrote:
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Maybe the question arises whether to store a reduced representation like, say, SingleCellExperiment does? In any event, I agree that endomorphism will be expected by users and, like tximport collapsing many transcripts into each gene, the sensible thing to do is just to return a binned (fewer rows, same columns) version of the input GenomicInteractions object. (Also, if this were posted on support.bioconductor.org, Aaron Lun could answer, which is good because he probably already wrote functions to do this at some point.) --t
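The "fewer rows, same columns" idea can be illustrated with a small sketch: given a counts matrix with one row per interaction and one column per sample, plus a label per row saying which bin pair it falls into, summing rows that share a label yields the binned matrix while keeping every sample column. This is a hypothetical Python helper (`collapse_rows`), not tximport's or GenomicInteractions' implementation, and it assumes all rows have the same number of columns.

```python
def collapse_rows(counts, groups):
    """Sum rows of a counts matrix that share a group label (sketch).

    counts: list of rows, one per interaction, each a list of per-sample counts.
    groups: one hashable label per row (e.g. the bin pair the row falls in).
    Returns (labels, collapsed): fewer rows, same number of columns.
    """
    if len(counts) != len(groups):
        raise ValueError("need one group label per row")
    order = []  # labels in first-seen order
    sums = {}   # label -> running per-column sums
    for row, label in zip(counts, groups):
        if label not in sums:
            sums[label] = [0] * len(row)
            order.append(label)
        acc = sums[label]
        for j, value in enumerate(row):
            acc[j] += value
    return order, [sums[label] for label in order]
```

The output keeps one column per sample, so the result has the same "shape" in the column direction as the input, which is what users would expect from an endomorphic binning function.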
On Mar 22, 2019, at 2:44 PM, Kasper Daniel Hansen <kasperdanielhansen at gmail.com> wrote:
Hi Luke,
Do you mean bins or bin pairs?
If you want to just bin the coverage in terms of the linear genome, there should be ways to do that outside of InteractionSet or GenomicInteractions. This is just dealing with standard genomic interval data; extract the anchor coordinates and plug them in elsewhere.
If you want to collate region pairs into bin pairs, I don't know of a dedicated function to do this from a GInteractions object (diffHic only does this from raw read data). You'll need to figure out what to do with regions that cross bin boundaries. The simplest way to mimic this behaviour right now is to generate another GInteractions object containing ALL POSSIBLE bin pairs (use combn with a constant set of bin regions) and plug that into countOverlaps. This will generate loads of zeroes, though, so it is not the most efficient way to do this. You could get a sparser form with linkOverlaps, but this requires more work to get the counts.
I have some more thoughts about the Bioconductor Hi-C infrastructure, but my laptop battery's running out and I left my charger in my new apartment. So that'll have to wait until tomorrow.
-A
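A toy Python version of the dense approach described above, restricted to a single chromosome: tile the chromosome into a constant set of bins, enumerate every bin pair, and count the interactions overlapping each. `itertools.combinations_with_replacement` is used instead of plain pairwise combinations so that pairs of a bin with itself are included (R's `combn(x, 2)` on its own would omit them). The boundary policy is an assumption: an interaction that crosses a bin boundary is counted once in every bin pair it overlaps. The sparse function only records bin pairs that are actually observed; it is a hand-rolled illustration, not a reimplementation of `countOverlaps` or `linkOverlaps`. All function names are hypothetical.

```python
from itertools import combinations_with_replacement


def make_bins(chrom_length, bin_width):
    """Constant set of half-open bins [start, end) tiling one chromosome."""
    return [(s, min(s + bin_width, chrom_length))
            for s in range(0, chrom_length, bin_width)]


def overlaps(a, b):
    """True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def count_all_bin_pairs(bins, interactions):
    """Dense approach: enumerate every bin pair, count overlapping interactions.

    interactions: list of ((start1, end1), (start2, end2)) on one chromosome.
    """
    counts = {}
    for i, j in combinations_with_replacement(range(len(bins)), 2):
        n = 0
        for r1, r2 in interactions:
            # The bin pair (i, j) is unordered, so check both orientations.
            if ((overlaps(r1, bins[i]) and overlaps(r2, bins[j])) or
                    (overlaps(r2, bins[i]) and overlaps(r1, bins[j]))):
                n += 1
        counts[(i, j)] = n
    return counts


def count_observed_bin_pairs(bin_width, interactions):
    """Sparse approach: visit only the bins each anchor actually touches.

    Assumes bins start at 0 with a fixed width, matching make_bins().
    """
    counts = {}
    for (s1, e1), (s2, e2) in interactions:
        bins1 = range(s1 // bin_width, (e1 - 1) // bin_width + 1)
        bins2 = range(s2 // bin_width, (e2 - 1) // bin_width + 1)
        seen = set()  # count each interaction at most once per bin pair
        for a in bins1:
            for b in bins2:
                key = (min(a, b), max(a, b))
                if key not in seen:
                    seen.add(key)
                    counts[key] = counts.get(key, 0) + 1
    return counts
```

With three 1 kb bins and the two interactions ((100, 200), (2100, 2200)) and ((950, 1050), (2500, 2600)), the dense version returns all six bin pairs, four of them zero, while the sparse version returns only the two non-zero pairs. The dense cost grows with the square of the number of bins regardless of how many interactions are observed, which is the inefficiency noted above.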
On 22/03/2019 09:31, Luke Klein wrote:
2 days later
Power's back, so continuing on:
The Bioconductor Hi-C infrastructure should probably be consolidated into packages with more clearly defined boundaries:
1) A package to define a base (virtual) "Interactions" class. This would basically have a constant "Vector" store with a "Hits" object specifying the pairwise interactions between elements in the constant store. One could also distinguish between "SelfInteractions" (a single constant store) and the more general "Interactions" (two stores, possibly of different types, e.g., genomic interval -> protein interactions). A variety of methods would be available here to do manipulations and such.
2) A package to define an "Interactions" subclass where the store is a genomic interval, with basic methods to operate on such classes. Methods such as findOverlaps(), linkOverlaps() and boundingBox() would probably go here. @Luke, a binning method could also conceivably go here.
3) A package to define the "InteractionSet" and "ContactMatrix" classes. Basically just the "InteractionSet" package with the "GInteractions" class stripped out and moved into (2).
4) Additional packages for higher-level analysis, e.g., diffHic. These won't need much change beyond fiddling with the Imports.
So, (2) depends on (1), (3) depends on (2), and (4) depends on (3). (1) could either be S4Vectors itself, or we could take out the "Pairs" class from S4Vectors and put it into a separate package that provides data structures for interaction-esque thingies.
@Liz, "GenomicInteractions" (the package) would be a natural home for the class/methods in (2). It would also resolve the confusion between the "GInteractions" class and "GenomicInteractions" (the class) by making these one thing. There are two obvious hurdles:
- I'm not familiar with the requirements for the class specialization in "GenomicInteractions", but anything really custom would not belong in (2).
- Any methods for specialized data analysis would need to go into another package for (4).
I don't have a good definition of what is specialized, but if there's statistical inference, it shouldn't be in (2).
All of this is open for discussion, if people are interested and willing to volunteer. These changes will not make the next release anyway.
-A
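Proposal (1) can be sketched as a data structure: a store of elements that is never modified, plus a pair of parallel index vectors playing the role of the "Hits" object. In this Python sketch (not S4Vectors code; the class and attribute names are hypothetical stand-ins for the proposed S4 classes), `SelfInteractions` shares one store between both ends, while `Interactions` allows two stores of different types. Subsetting only touches the index vectors, which is the main appeal of the constant-store design.

```python
from typing import List, Sequence, Tuple


class Interactions:
    """General case: two stores, possibly of different element types
    (e.g. genomic intervals on one side, proteins on the other)."""

    def __init__(self, first: Sequence, second: Sequence,
                 from_idx: List[int], to_idx: List[int]):
        if len(from_idx) != len(to_idx):
            raise ValueError("from_idx and to_idx must have equal length")
        if not all(0 <= i < len(first) for i in from_idx):
            raise IndexError("from_idx out of range for first store")
        if not all(0 <= j < len(second) for j in to_idx):
            raise IndexError("to_idx out of range for second store")
        self.first, self.second = first, second
        # Hits-like part: parallel index vectors into the stores.
        self.from_idx, self.to_idx = list(from_idx), list(to_idx)

    def __len__(self) -> int:
        return len(self.from_idx)

    def pair(self, k: int) -> Tuple:
        """Materialize the k-th interaction as a pair of store elements."""
        return self.first[self.from_idx[k]], self.second[self.to_idx[k]]

    def _with_hits(self, from_idx, to_idx):
        return Interactions(self.first, self.second, from_idx, to_idx)

    def subset(self, keep: Sequence[int]):
        """Subset by position: only the index vectors change; stores are shared."""
        return self._with_hits([self.from_idx[k] for k in keep],
                               [self.to_idx[k] for k in keep])


class SelfInteractions(Interactions):
    """Special case: both ends index into one shared store."""

    def __init__(self, store: Sequence, from_idx: List[int], to_idx: List[int]):
        super().__init__(store, store, from_idx, to_idx)

    @property
    def store(self) -> Sequence:
        return self.first

    def _with_hits(self, from_idx, to_idx):
        return SelfInteractions(self.first, from_idx, to_idx)
```

For example, `SelfInteractions(regions, [0, 0, 1], [1, 2, 2]).subset([2, 0])` returns a `SelfInteractions` whose store is still the same `regions` object, so no interval data is copied when interactions are filtered.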
On 22/03/2019 19:54, Aaron Lun wrote: