[Bioc-devel] Fwd: [BioC] Problem reading VCF file using readVcf from package VariantAnnotation
Related to this: I have added getters for seqinfo (and friends) for the OrganismDb objects. I have not added the setters yet though since that requires some refactoring of what an OrganismDb object actually is internally. But I intend to do this also. Marc
On 04/25/2013 09:32 AM, Valerie Obenchain wrote:
Hi Vince, Kasper,
cc'ing Herve and Marc.
I think we have a couple of things going on so I wanted to clarify.
The 'genome' argument to readVcf() is assigned to the GRanges in
rowData with the "genome<-" setter. This is where .normargGenome()
gets called.
setReplaceMethod("genome", "Seqinfo",
function(x, value)
{
x at genome <- .normargGenome(value, seqnames(x))
x
}
)
If the 'genome' replacement value is named, the name(s) must match the
seqnames, not the build. So we aren't talking about matching
compatible builds,
fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
vcf <- readVcf(fl, c("b37"="hg19")) ## this is wrong
vcf <- readVcf(fl, c("hg19"="hg19")) ## also wrong
Instead the name must be the seqname, the value is the build,
vcf <- readVcf(fl, c("20"="hg19")) ## correct
vcf <- readVcf(fl, "hg19") ## also correct
This requirement for 'genome' is not well documented on ?readVcf or
?Seqinfo. We can fix that.
The second thing is the issue of a flexible mapping between seqinfo
metadata for different institutions. Herve and Marc have worked on
this in AnnotationDbi. They can explain more about the 'SeqnameStyle'
and how it might be used more widely.
Val
On 04/25/2013 06:54 AM, Kasper Daniel Hansen wrote:
An "official" comment on this http://genome.ucsc.edu/cgi-bin/hgGateway?db=hg19 with some more info in this discussion https://groups.google.com/a/soe.ucsc.edu/forum/?fromgroups=#!topic/genome/hFp-dGG9gBs Essentially it seems the b37 has been "patched" and this patched release is not reflected in hg19 but may be (I don't know) reflected in the b37 download from NCBI Kasper On Thu, Apr 25, 2013 at 9:49 AM, Kasper Daniel Hansen < kasperdanielhansen at gmail.com> wrote:
I agree with Vincent. I have seen code from Herve in a package with some standardization of chromosome names, and this code could perhaps be used more widely so we don't have all the problems with chr1 vs chr01 vs 1. However, in this particular case, if Ulrich is actually interested in the mitochondrial genome, he has a problem. hg19, which is the genome version from UCSC is consider equal to NCBIs b37. However, as far as I understand, UCSC screwed up with the mitochondrial genome and used an old version for their hg19. So the error message is in many ways right here: the two genomes are slightly different because they have different mitochondrial genomes. Kasper
[[alternative HTML version deleted]]
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel