Skip to content
Prev 17038 / 21318 Next

[Bioc-devel] VariantAnnotation::readVcf() sets the wrong seqlevelsStyle in devel

Hi Robert,

The VCF file uses "22" for the chromosome name which is the name used by 
NCBI. So explicitly specifying "hg19" in the readVcf() call is like 
saying that this chromosome name is a UCSC name which is why 
seqlevelsStyle() gets confused later.

If you specify the name of the NCBI assembly, things work as expected:

   fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
   vcf <- readVcf(fl, "GRCh37")
   seqlevels(vcf)
   # [1] "22"
   seqlevelsStyle(vcf)
   # [1] "NCBI"
   seqlevelsStyle(vcf) <- "UCSC"
   seqlevels(vcf)
   # [1] "chr22"

Or, if you don't know what reference genome the file is based on, don't 
specify it:

   fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
   vcf <- readVcf(fl)
   seqlevels(vcf)
   # [1] "22"
   seqlevelsStyle(vcf)
   # [1] "NCBI"    "Ensembl"
   seqlevelsStyle(vcf) <- "UCSC"
   seqlevels(vcf)
   # [1] "chr22"

or specify it later:

   genome(vcf) <- "hg19"
   seqinfo(vcf)
   # Seqinfo object with 1 sequence from hg19 genome; no seqlengths:
   #   seqnames seqlengths isCircular genome
   #   chr22            NA         NA   hg19

Hope this helps,
H.
On 7/29/20 08:30, Robert Castelo wrote: