[Bioc-devel] zero-width ranges representing insertions

On 03/16/2015 05:31 PM, Hervé Pagès wrote:
I've had a chance to take a closer look at how VariantAnnotation (VA)
handles zero-width ranges.

Previously, both predictCoding() and locateVariants() treated zero-width 
ranges as width 1 (start decremented to equal end). In VA 1.13.42 this 
has been changed for predictCoding(), so zero-width ranges are now 
dropped. The function internals expect REF and ALT to conform to the 
VCF spec, where zero-width ranges aren't used. So it seemed wise to 
drop the zero-width ranges for now.
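
A minimal sketch of the two behaviours described above, in base R only.
The helper names and coordinates are made up for illustration and are
not part of VariantAnnotation or IRanges; it assumes the usual
convention that a zero-width range has end = start - 1.

```r
# Width of a closed integer range; 0 when end == start - 1.
range_width <- function(start, end) end - start + 1L

# Older behaviour as described: widen a zero-width range to width 1
# by decrementing start so that it equals end.
widen_zero_width <- function(start, end) {
  zero <- range_width(start, end) == 0L
  start[zero] <- start[zero] - 1L
  data.frame(start = start, end = end, width = range_width(start, end))
}

# Newer predictCoding() behaviour as described: drop zero-width ranges.
drop_zero_width <- function(start, end) {
  keep <- range_width(start, end) != 0L
  data.frame(start = start[keep], end = end[keep],
             width = range_width(start[keep], end[keep]))
}

start <- c(100L, 201L, 300L)   # hypothetical coordinates
end   <- c(100L, 200L, 302L)   # second range has width 0

widened <- widen_zero_width(start, end)
dropped <- drop_zero_width(start, end)
```

Here `widened` keeps all three ranges (the second becomes 200-200),
while `dropped` keeps only the first and third.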

locateVariants() remains the same because it is more general. I think 
it's still useful to identify where a zero-width range falls with 
respect to gene features.

This output is actually fine. The VARCODON values may be slightly 
misleading but the data are correct. predictCoding() only computes amino 
acid sequences for SNPs or indels that conform to the 'groups of 3' 
idea. The substitution or deletion must leave the sequence length 
divisible by 3; otherwise there is a partial codon at the end that must 
be inferred (by considering all possible combinations), and then one 
must be chosen (a consensus). The code does not currently do this and 
I'm not sure there is common agreement on how to do it.
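
To make the 'groups of 3' point concrete, here is a small base R sketch
that splits a sequence into complete codons and reports any leftover
bases. split_codons() is a hypothetical helper written for this note,
not how predictCoding() is implemented, and the sequences are made up.

```r
# Split a DNA string into complete codons plus any trailing partial codon.
split_codons <- function(dna) {
  n <- nchar(dna)
  n_full <- n %/% 3L                        # number of complete codons
  starts <- seq_len(n_full) * 3L - 2L       # 1, 4, 7, ...
  codons <- vapply(starts, function(s) substr(dna, s, s + 2L), character(1))
  leftover <- substr(dna, n_full * 3L + 1L, n)   # "" when length is a multiple of 3
  list(codons = codons, leftover = leftover)
}

in_frame  <- split_codons("ATGGCC")    # length 6, no partial codon
off_frame <- split_codons("ATGGCCA")   # length 7, one base left over
```

A non-empty `leftover` is the partial codon that would have to be
inferred before an amino acid could be assigned.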

This GRanges has an SNV followed by 1, 2, and 3 base pair insertions:

Coding changes are computed for the SNV and the 3 bp insertion but the 
others are marked as 'frameshift'. Previously, when an indel couldn't be 
translated, the VARCODON was the same as the REFCODON, which may have 
been confusing (it was intended to mean nothing has changed). I've 
changed this so VARCODON is now missing (like VARAA) when it can't be 
translated.
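
As a rough illustration of why only the SNV and the 3 bp insertion get a
coding change, the base R sketch below flags a frameshift whenever the
net length change is not a multiple of 3. The REF/ALT alleles and the
column names are invented for the example; this is not the
predictCoding() code.

```r
# Made-up VCF-style alleles: an SNV, then 1, 2 and 3 bp insertions.
variants <- data.frame(
  ref = c("A", "A", "A", "A"),
  alt = c("G", "AT", "ATT", "ATTC"),
  stringsAsFactors = FALSE
)

# Net change in length introduced by each variant: 0, 1, 2, 3.
len_change <- nchar(variants$alt) - nchar(variants$ref)

# A change that is not a multiple of 3 shifts the reading frame.
variants$frameshift <- len_change %% 3L != 0L

# Hypothetical flag: FALSE where VARCODON and VARAA would be left missing.
variants$translated <- !variants$frameshift
```
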
PROTEINLOC is the codon number in the coding sequence. These are the CDS 
regions:

There are 157 codons; position 77055 falls in the second-to-last codon, 
so PROTEINLOC is 156.
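
The codon arithmetic can be sketched in base R as follows. The CDS
offset of position 77055 is not shown here, so the offsets are derived
from the codon count alone; codon_number() is a hypothetical helper,
not a VariantAnnotation function.

```r
# Codon number (1-based) for a 1-based offset within the coding sequence.
codon_number <- function(cds_offset) (cds_offset - 1L) %/% 3L + 1L

n_codons   <- 157L
cds_length <- n_codons * 3L    # 471 bases

# The three CDS offsets that make up the second-to-last codon.
offsets <- (cds_length - 5L):(cds_length - 3L)   # 466, 467, 468

second_to_last <- codon_number(offsets)
```

Any position whose CDS offset is 466, 467 or 468 maps to codon 156.
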
Val