Message-ID: <54DA4FEE.9000502@fredhutch.org>
Date: 2015-02-10T18:37:34Z
From: Valerie Obenchain
Subject: [Bioc-devel] VariantAnnotation::isDelins() ??
In-Reply-To: <7642_1423578678_54DA1636_7642_2465_1_54DA15DA.6040604@upf.edu>

Hi Robert,

This sounds like a good addition. I'll put it on the TODO. If you need 
this immediately I'd be happy to accept a patch (with unit tests).

Valerie



On 02/10/2015 06:29 AM, Robert Castelo wrote:
> hi,
>
> in the VariantAnnotation package, the help of the functions for
> identifying variant types such as SNVs, insertions,
> deletions, transitions, and structural rearrangements gives the
> following definitions:
>
>
>          * isSNV: Reference and alternate alleles are both a single
>            nucleotide long.
>
>          * isInsertion: Reference allele is a single nucleotide and the
>            alternate allele is greater (longer) than a single nucleotide
>            and the first nucleotide of the alternate allele matches the
>            reference.
>
>          * isDeletion: Alternate allele is a single nucleotide and the
>            reference allele is greater (longer) than a single nucleotide
>            and the first nucleotide of the reference allele matches the
>            alternate.
>
>          * isIndel: The variant is either a deletion or insertion as
>            determined by 'isDeletion' and 'isInsertion'.
>
>          * isSubstitution: Reference and alternate alleles are the same
>            length (1 or more nucleotides long).
>
>          * isTransition: Reference and alternate alleles are both a
>            single nucleotide long.  The reference-alternate pair
>            interchange is of either two-ring purines (A <-> G) or
>            one-ring pyrimidines (C <-> T).
>
>
> however, unless I'm missing something here, these definitions do not
> cover indels where the insertion involves more than one reference
> nucleotide, or the deletion more than one alternate nucleotide. this
> could be an example of what i'm trying to say:
>
> library(VariantAnnotation)
>
> vr <- VRanges(seqnames = rep("chr1", times=5),
>                ranges = IRanges(seq(1, 100, by=20),
>                                 seq(1, 100, by=20)+c(1, 1, 2, 2, 3)),
>                ref = c("T", "A",  "A", "AC",  "AC"),
>                alt = c("C", "T", "AC", "AT", "ACC"),
>                refDepth = c(5, 10, 5, 10, 5),
>                altDepth = c(7, 6, 7, 6, 7),
>                totalDepth = c(12, 17, 12, 17, 12),
>                sampleNames = letters[1:5])
>
> isSNV(vr)
> ## [1]  TRUE  TRUE FALSE FALSE FALSE
> isIndel(vr)
> ## [1] FALSE FALSE  TRUE FALSE FALSE
> isSubstitution(vr)
> ## [1]  TRUE  TRUE FALSE  TRUE FALSE
>
> note that the last variant does not evaluate as true for any of the
> three possibilities. after looking for variant definitions, i have found
> that the Human Genome Variation Society (HGVS) describes this as a
> deletion followed by an insertion and calls it "indel" or "delins" (it's
> unclear to me whether they use the terms interchangeably), see the link here:
>
> http://www.hgvs.org/mutnomen/recs-DNA.html#indel
>
> the only other site I could quickly find with Google that gives a
> specific definition is that of the software SnpEff, which
> calls it "MIXED", a "Multiple-nucleotide and an InDel":
>
> http://snpeff.sourceforge.net/SnpEff_manual.html
>
> I would suggest that VariantAnnotation should try to identify this type
> of variant. following the HGVS recommendations, could we maybe have a
> function for it called isDelins() ??
>
>
>
> cheers,
>
> robert.
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
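
For illustration, here is a minimal base-R sketch of the kind of check the proposed isDelins() might perform. It is not VariantAnnotation code: the helper name isDelinsSketch is hypothetical, it works on plain character vectors rather than VRanges objects, and it assumes a "delins" is any variant whose alleles differ in length but which is neither an insertion nor a deletion under the help-page definitions quoted above.

```r
# Hypothetical helper, not part of VariantAnnotation. Takes plain
# character vectors of reference and alternate alleles.
isDelinsSketch <- function(ref, alt) {
  stopifnot(length(ref) == length(alt))
  refLen <- nchar(ref)
  altLen <- nchar(alt)
  # Simple insertion and deletion, following the quoted help-page wording.
  isIns <- refLen == 1L & altLen > 1L & substr(alt, 1L, 1L) == ref
  isDel <- altLen == 1L & refLen > 1L & substr(ref, 1L, 1L) == alt
  # Assumption: a delins changes the allele length but is neither of the above.
  refLen != altLen & !isIns & !isDel
}

# Alleles from the example in the quoted message.
ref <- c("T", "A",  "A", "AC",  "AC")
alt <- c("C", "T", "AC", "AT", "ACC")
isDelinsSketch(ref, alt)
## [1] FALSE FALSE FALSE FALSE  TRUE
```

Under these assumptions only the last variant (AC to ACC) is flagged, which is the one that none of isSNV(), isIndel() or isSubstitution() picks up in the example.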