Hi Richard, Thanks for identifying the bugs and for providing solutions. Parsing empty INFO or GENO : For this bug I put a check in the unpackVcf x="list", hdr="character" method, which is a couple of levels up from what you suggest below. The reason is that we want 'info' and 'geno' arguments to serve as an on/off switch for parsing the fields. By checking lengths at the entry point, we set can info=FALSE and/or geno=FALSE and the fields will not be parsed. Fixed in Rsamtools 1.7.9. Dropping ranges at 'end' : This problem appears to be in the indexTabix() function. I'm working on it now and will come back here when it's fixed. Valerie
On 12/13/2011 04:22 AM, Richard Pearson wrote:
Hi
I've spotted a problem with the current devel versions of
VariantAnnotation/Rsamtools when trying to read in variants from a
region which has no variants. Section 2 of the current
VariantAnnotation devel vignette gives an almost reproducible example
of this. I say almost as I think the full set of commands (note
inclusion of indexTabix command) should be:
library(VariantAnnotation)
vcffile <- system.file("extdata", "chr22.vcf.gz",
package="VariantAnnotation")
idx <- indexTabix(vcffile, "vcf")
rngs <- GRanges(seqnames="22", ranges=IRanges(start=1000, end=10000))
subst <- readVcf(vcffile, "hg19", param=rngs)
When I run the above I get the following error:
Error in strsplit(unlist(recs), "=", fixed = TRUE) :
non-character argument
The first variant in chr22.vcf.gz is at position 51151293, and hence
there are no variants in the range 1000-10000.
The following patch to Rsamtools gets the above to work for me, in
what I think is a sensible way:
Index: Rsamtools/R/scanVcf.R
===================================================================
--- Rsamtools/R/scanVcf.R (revision 61294)
+++ Rsamtools/R/scanVcf.R (working copy)
@@ -248,7 +248,9 @@
{
if (!is.logical(info))
x <- lapply(x, function(elt, id, n, type) {
- elt[["INFO"]] <- .unpackVcfInfo(elt[["INFO"]], id, n, type)
+ if(length(elt[["INFO"]]) > 0) {
+ elt[["INFO"]] <- .unpackVcfInfo(elt[["INFO"]], id, n,
type)
+ }
elt
}, rownames(info), info$Number, info$Type)
if (!is.logical(geno))
I've also realised that the above commands will only return up until
the position a base before the end of the IRanges. Is this as
expected? E.g.
rngs <- GRanges(seqnames="22", ranges=IRanges(start=1000,
end=51151350))
subst <- readVcf(vcffile, "hg19", param=rngs) ranges(rowData(subst))
IRanges of length 1
start end width names
[1] 51151293 51151293 1 rs189846332
rngs <- GRanges(seqnames="22", ranges=IRanges(start=1000,
end=51151351))
subst <- readVcf(vcffile, "hg19", param=rngs) ranges(rowData(subst))
IRanges of length 2
start end width names
[1] 51151293 51151293 1 rs189846332
[2] 51151350 51151350 1 rs6009951
Best wishes
Richard