Skip to content
Prev 7137 / 21307 Next

[Bioc-devel] zero-width ranges representing insertions

Using a GAlignments object to show 3 different representations of the
same insertion (three nucleotides (TTT) inserted between positions 10
and 11):

   library(GenomicAlignments)
   gal <- GAlignments(Rle("chr1", 3), pos=c(10L, 10L, 11L),
                      cigar=c("1M3I1M", "1M3I", "3I"),
                      strand=Rle(strand("+"), 3),
                      ref=c("-", "A", ""),
                      alt=c("TTT", "ATTT", "TTT"),
                      who=c("dbSNP/HGVS", "VariantAnnotation/VCF",
                            "XtraSNPlocs/UCSC"))

   > gal
   GAlignments object with 3 alignments and 3 metadata columns:
         seqnames strand       cigar    qwidth     start       end     width
            <Rle>  <Rle> <character> <integer> <integer> <integer> <integer>
     [1]     chr1      +      1M3I1M         5        10        11         2
     [2]     chr1      +        1M3I         4        10        10         1
     [3]     chr1      +          3I         3        11        10         0
             njunc |         ref         alt                   who
         <integer> | <character> <character>           <character>
     [1]         0 |           -         TTT            dbSNP/HGVS
     [2]         0 |           A        ATTT VariantAnnotation/VCF
     [3]         0 |                     TTT      XtraSNPlocs/UCSC
     -------
     seqinfo: 1 sequence from an unspecified genome; no seqlengths

In the context of a GAlignments object the alt allele can be seen
as the sequence of the query (SEQ field in BAM). So we expect the
number of nucleotides in the alt allele to match the qwidth:

   > nchar(mcols(gal)$alt) == qwidth(gal)
   [1] FALSE  TRUE  TRUE

Not true for the dbSNP/HGVS representation.

The range of the insertion on the reference is:

   > as(gal, "GRanges")
   GRanges object with 3 ranges and 3 metadata columns:
         seqnames    ranges strand |         ref         alt 
        who
            <Rle> <IRanges>  <Rle> | <character> <character> 
<character>
     [1]     chr1  [10, 11]      + |           -         TTT 
dbSNP/HGVS
     [2]     chr1  [10, 10]      + |           A        ATTT 
VariantAnnotation/VCF
     [3]     chr1  [11, 10]      + |                     TTT 
XtraSNPlocs/UCSC
     -------
     seqinfo: 1 sequence from an unspecified genome; no seqlengths

We expect its width to be the same as the nb of nucleotides in the
ref alleles:

   > width(gal) == nchar(mcols(gal)$ref)
   [1] FALSE  TRUE  TRUE

Again, not true for the dbSNP/HGVS representation.

H.
On 03/16/2015 03:12 PM, Robert Castelo wrote: