
[Bioc-devel] VRanges-class positive strandness and locateVariants() strandawareness

Of course, including the strand would imply interpreting the variant and
its strand (e.g., "-") with respect to an annotated feature. I can see a
practical problem with the integrity of the information in a VRanges
object: a mandatory column, such as strand, would depend on a
non-mandatory column, such as a feature annotation stored as a metadata
column.

One solution would be to add the transcript identifier (TXID) as a
mandatory column of the VRanges object, but I suspect this would be a big
change. Instead, adding a LOCSTRAND column (next to the LOCSTART and
LOCEND columns generated by locateVariants()) to the metadata columns of
the VRanges object would allow me to use a VRanges object as a container
of variant x allele x sample x annotation.
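To make the proposed layout concrete, here is a minimal Python sketch of
one variant record. It is a hypothetical stand-in, not the VRanges or
VariantAnnotation API: the names VariantRecord and annotate_location are
made up for illustration. The point it shows is that the mandatory fields
(including strand, left on the reference strand) never depend on the
annotation, while LOCSTRAND lives only among the metadata columns next to
LOCSTART and LOCEND.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class VariantRecord:
    # Hypothetical stand-in for one VRanges element (not the Bioconductor API).
    seqname: str
    start: int
    end: int
    ref: str           # always on the reference-chromosome strand
    alt: str           # always on the reference-chromosome strand
    sample: str
    strand: str = "+"  # mandatory strand stays on the reference strand
    mcols: Dict[str, object] = field(default_factory=dict)  # metadata columns


def annotate_location(rec: VariantRecord, txid: str, location: str,
                      locstart: int, locend: int, locstrand: str) -> VariantRecord:
    """Attach locateVariants()-style annotation plus the proposed LOCSTRAND.

    Only the metadata columns change; the mandatory fields are untouched,
    so they never depend on a (non-mandatory) feature annotation.
    """
    rec.mcols.update({"TXID": txid, "LOCATION": location,
                      "LOCSTART": locstart, "LOCEND": locend,
                      "LOCSTRAND": locstrand})
    return rec


rec = VariantRecord("chr1", 1000, 1000, "T", "A", "sample1")
annotate_location(rec, "tx1", "coding", 250, 250, "-")
print(rec.strand, rec.mcols["LOCSTRAND"])
```

With this split, a variant that overlaps no feature simply has no
LOCSTRAND entry, and the record is still valid.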

Just to clear up the issue of merging strand and variant: a noisy
variant (one that is not silent) with, e.g., a loss-of-function effect
such as the gain of a stop codon is usually interpreted on the strand of
the transcript and coding sequence in which the stop codon is gained,
saying something like "an A changed to a T, producing the stop codon
TAA".

Ref and alt alleles are called on the strand of the reference
chromosome, so if the transcript is annotated on the negative strand, we
know that we need to reverse-complement ref and alt to interpret the
variant. However, I see no need to do anything to ref and alt in the
VRanges object, because we know they are always on the strand of the
reference chromosome. Only if you want to detect this stop-gain event
(with predictCoding()) would you have to reverse-complement the ref and
alt alleles. Conversely, if the variant falls in an intergenic region,
the strand plays no role in its interpretation, and nothing needs to be
done to the ref and alt alleles.
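The reasoning above can be sketched in a few lines of Python. This is an
illustration under stated assumptions, not how predictCoding() is
implemented: the helper names (reverse_complement,
alleles_on_feature_strand, is_stop_gain) are hypothetical, the alleles
are single bases, and the codon is assumed to be given already on the
transcript strand. Ref and alt stay on the reference strand and are only
reverse-complemented when the feature is on "-"; an intergenic variant
(no feature strand) is left untouched.

```python
# Hypothetical helpers for illustration; not the VariantAnnotation API.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def alleles_on_feature_strand(ref: str, alt: str, feature_strand=None):
    """Return (ref, alt) as read on the annotated feature's strand.

    REF/ALT are always reported on the reference-chromosome strand, so they
    are reverse-complemented only for features on the "-" strand. Variants
    with no feature (feature_strand None, e.g. intergenic) are unchanged.
    """
    if feature_strand == "-":
        return reverse_complement(ref), reverse_complement(alt)
    return ref, alt


def is_stop_gain(ref_codon: str, pos_in_codon: int, alt_base: str) -> bool:
    """True if substituting alt_base (transcript strand, 0-based position)
    turns a non-stop codon into a stop codon."""
    alt_codon = ref_codon[:pos_in_codon] + alt_base + ref_codon[pos_in_codon + 1:]
    return ref_codon not in STOP_CODONS and alt_codon in STOP_CODONS


# Reference strand reads T>A; the transcript is on "-", so it reads A>T.
ref, alt = alleles_on_feature_strand("T", "A", "-")
print(ref, alt)                       # A T
print(is_stop_gain("AAA", 0, alt))    # True: AAA (Lys) -> TAA (stop)
print(alleles_on_feature_strand("T", "A"))  # ('T', 'A'), intergenic case
```

The stored alleles are never modified; the strand-aware view is computed
only at interpretation time, which matches the point that nothing needs
to change in the VRanges object itself.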
On 6/11/15 5:47 PM, Michael Lawrence wrote: