I don't see how this can be fixed. The two data structures are
semantically incompatible; they encode different types of information,
so information is lost in both directions. Even if we collapsed the
alts, there is no way (as far as I know) to say that data for one
individual + alt combination is absent. We could put NA (".") in every
value concerning that alt, but it seems too strong an assumption to say
that all(is.na(...)) implies omission of the VRanges element. In other
words, VCF is rectangular and VRanges is ragged, and there is no
established way to encode the raggedness in VCF.
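A small base-R sketch of why information is lost. Plain data frames stand in for VRanges elements, and a single made-up "depth" value stands in for all of the per-sample fields of an alt. The helper to_rect() is hypothetical, not a VariantAnnotation function. Two different ragged inputs, one where S1/G is absent and one where S1/G is present with an unknown value, map to the same rectangle:

```r
## Hypothetical helper: lay ragged sample/alt rows out as a rectangle,
## one row per alt and one column per sample (every cell must exist).
to_rect <- function(ragged, alts, samples) {
  rect <- matrix(NA_integer_, nrow = length(alts), ncol = length(samples),
                 dimnames = list(alts, samples))
  # Fill only the observed sample/alt cells; the rest stay NA.
  rect[cbind(ragged$alt, ragged$sample)] <- ragged$depth
  rect
}

alts    <- c("A", "G")
samples <- c("S1", "S2")

## Case 1: S1 carries only A, S2 carries only G (S1/G is absent).
absent <- data.frame(sample = c("S1", "S2"), alt = c("A", "G"),
                     depth = c(12L, 7L), stringsAsFactors = FALSE)

## Case 2: S1/G is present, but its value is unknown (NA).
present_na <- data.frame(sample = c("S1", "S2", "S1"),
                         alt = c("A", "G", "G"),
                         depth = c(12L, 7L, NA), stringsAsFactors = FALSE)

rect1 <- to_rect(absent, alts, samples)
rect2 <- to_rect(present_na, alts, samples)

## Identical rectangles: the reverse conversion cannot tell the two
## inputs apart unless it assumes all-NA cells mean "absent".
same <- identical(rect1, rect2)
```

Here `same` is TRUE even though the inputs have 2 and 3 rows, which is the all(is.na(...)) ambiguity in miniature.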
On Mon, Dec 8, 2014 at 11:27 AM, Valerie Obenchain
<vobencha at fredhutch.org> wrote:
This could be fixed either in the VRanges -> VCF coercion or in
VCF -> VRanges. Currently, VRanges -> VCF creates a VCF with more than
one row per position (i.e., it does not collapse ALT values). I'm not
sure this is technically valid per the VCF specification; however, it
may have been done by design to meet another need. If we are OK with
more than one row per position, the change can be made in VCF -> VRanges.
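To make "collapsing ALT values" concrete, here is a base-R sketch (plain data frames, not VariantAnnotation objects) contrasting one row per alt, as the current coercion is described as producing, with one row per position carrying comma-separated ALTs, the multi-allelic VCF style:

```r
## One input row per alt at the same locus (chr 1, pos 10).
rows <- data.frame(chrom = c("1", "1"), pos = c(10L, 10L),
                   ref = c("C", "C"), alt = c("A", "G"),
                   stringsAsFactors = FALSE)

## Group rows by position and REF, keeping first-seen order.
key <- paste(rows$chrom, rows$pos, rows$ref, sep = ":")
groups <- split(rows$alt, factor(key, levels = unique(key)))

## Collapse: one row per position, ALTs joined with commas.
collapsed <- data.frame(
  key = unique(key),
  alt = unname(vapply(groups, function(a) paste(unique(a), collapse = ","),
                      character(1))),
  stringsAsFactors = FALSE
)
```

Collapsing changes only the row layout; the per-sample values would still sit in a rectangle, so it does not by itself record which sample carries which alt.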
Opinions?
Valerie
On 12/05/2014 01:18 AM, Julian Gehring wrote:
Hi,
Assume that we have two variants from two samples at the same locus,
stored in a 'VRanges' or 'VCF' object:
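library(VariantAnnotation)
## Two samples with different alts at the same position (chr 1, pos 10)
vr = VRanges("1", IRanges(c(10, 10), width = 1),
             ref = c("C", "C"), alt = c("A", "G"),
             sampleNames = c("S1", "S2"))
## Coerce to VCF: rows are variants, columns are samples
vcf = as(vr, "VCF")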
If we convert the VCF to a VRanges, we now get each variant in each
patient:
vr2 = as(vcf, "VRanges")
length(vr) ## 2
length(vr2) ## 4
It seems that the VCF object does not retain the 'sampleNames'
information (which sample carries which variant) from the first
conversion.
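A base-R sketch of the shape arithmetic, under the assumption (consistent with the lengths 2 and 4 reported above) that the VCF holds one row per variant and one column per sample, and that VCF -> VRanges emits one element per row/sample cell. The names carrier and cells are illustrative only:

```r
## The two variants from the example: same locus, different alts.
variants <- data.frame(pos = c(10L, 10L), ref = c("C", "C"),
                       alt = c("A", "G"), stringsAsFactors = FALSE)
sample_names <- c("S1", "S2")

## Which sample carried which variant in the original VRanges.
carrier <- c("S1", "S2")

## Expand every row x sample cell, as the round trip appears to do:
## 2 rows x 2 samples = 4 elements.
cells <- expand.grid(row = seq_len(nrow(variants)), sample = sample_names,
                     stringsAsFactors = FALSE)
cells$alt <- variants$alt[cells$row]

## Recovering the original 2 elements needs the carrier information,
## which the rectangular layout has no established way to record.
original <- cells[cells$sample == carrier[cells$row], ]
```

`cells` has 4 rows (S1/A, S1/G, S2/A, S2/G), while `original` keeps only S1/A and S2/G.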
Best wishes
Julian