[Bioc-devel] Best object structure for representing a pairwise genome alignment ?
Starting from
PairwiseAlignments-class package:Biostrings R Documentation
PairwiseAlignments, PairwiseAlignmentsSingleSubject, and
PairwiseAlignmentsSingleSubjectSummary objects
Description:
The 'PairwiseAlignments' class is a container for storing a set of
pairwise alignments.
The 'PairwiseAlignmentsSingleSubject' class is a container for
storing a set of pairwise alignments with a single subject.
The 'PairwiseAlignmentsSingleSubjectSummary' class is a container
for storing the summary of a set of pairwise alignments.
Usage:
## Constructors:
## When subject is missing, pattern must be of length 2
## S4 method for signature 'XString,XString'
PairwiseAlignments(pattern, subject,
type = "global", substitutionMatrix = NULL, gapOpening = 0,
gapExtension = 1)
## S4 method for signature 'XStringSet,missing'
PairwiseAlignments(pattern, subject,
type = "global", substitutionMatrix = NULL, gapOpening = 0,
gapExtension = 1)
## S4 method for signature 'character,character'
PairwiseAlignments(pattern, subject,
type = "global", substitutionMatrix = NULL, gapOpening = 0,
gapExtension = 1,
baseClass = "BString")
...
my question would be whether this is a relevant starting place? Clearly
the focus is not on coordinates, but perhaps a structure that maintains
genomic content and coordinates together would be of use?
On Fri, Sep 18, 2020 at 2:49 AM Charles Plessy <charles.plessy at oist.jp>
wrote:
Dear Bioc developers,
I am currently analysing pairwise genome alignments with Bioconductor,
and I represent them with a GRanges object on the first genome,
containing one element per alignment block, and storing the coordinates
in the other genome in a metadata column that holds another GRanges object.
Something like this:
GRanges object with 36582 ranges and 2 metadata columns:
          seqnames      ranges strand |     score                query
             <Rle>   <IRanges>  <Rle> | <numeric>            <GRanges>
      [1]       S1     162-550      + |       861    XSR:909374-909853
      [2]       S1    833-3738      + |      7238    XSR:910181-913291
      [3]       S1   3769-4212      + |      1165    XSR:913510-913953
      [4]       S1   4246-4381      + |       359    XSR:914134-914275
      [5]       S1   4532-5990      + |      2977 chr2:6694031-6695569
      ...      ...         ...    ... .       ...                  ...
  [36578]      S99 17228-17759      - |       793 chr1:2375870-2376379
  [36579]      S99 16417-16935      - |       632 chr1:2376612-2377077
  [36580]      S99 12370-12759      - |       773 chr1:2379949-2380343
  [36581]      S99   5270-5384      - |       295   chr1:843397-843511
  [36582]      S99   1949-3053      - |      2105   chr1:845358-846326
-------
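For readers who want to reproduce the layout, here is a minimal sketch of how such an object could be built. It assumes only the GenomicRanges package; the variable names (target, query, long) are hypothetical, and the coordinates and scores are copied from the first three rows of the output above.

```r
library(GenomicRanges)

# Alignment blocks on the first genome (first three rows of the output above)
target <- GRanges(
  seqnames = "S1",
  ranges   = IRanges(start = c(162, 833, 3769), end = c(550, 3738, 4212)),
  strand   = "+"
)

# Matching blocks on the other genome, one per target block, in the same order
query <- GRanges(
  seqnames = "XSR",
  ranges   = IRanges(start = c(909374, 910181, 913510),
                     end   = c(909853, 913291, 913953))
)

# Attach the score and the query GRanges as metadata columns
target$score <- c(861, 7238, 1165)
target$query <- query

# Subsetting the outer object subsets the nested GRanges along with it
long <- target[width(target) > 400]
long$query
```

Because the nested GRanges is an ordinary metadata column, both sides of each block stay paired when the outer object is subset or reordered. Operations that work on ranges, such as overlap queries, only see the outer coordinates.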
Using "Pairwise genome alignment" as a keyword in a search engine, I
found that the CNEr package does something similar, although it
uses a dedicated "GRangePairs" object for the purpose.
Before I start to invest time in either direction, I wanted to check on
this mailing list whether other solutions already exist, in
particular ones closer to the core packages?
Have a nice day,
Charles
--
Charles Plessy - - ~ ~ ~ ~ ~ ~~~~ ~ ~ ~ ~ ~ - - charles.plessy at oist.jp
Okinawa Institute of Science and Technology Graduate University
Staff scientist in the Luscombe Unit - ~ - https://groups.oist.jp/grsu
Toots from work - ~ ~~ ~ - https://mastodon.technology/@charles_plessy
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel