Message-ID: <ed14735e-3b3b-3ef1-5ec0-bcb51a0e39b9@fredhutch.org>
Date: 2020-08-13T23:08:13Z
From: Hervé Pagès
Subject: [Bioc-devel] BSgenome changes
In-Reply-To: <CAE5ko2RO7ooSRjMbzPd8zOsQhYKcrG5V+95OoEKtTsjsFcnSyw@mail.gmail.com>

Hi Leonard,

On 8/12/20 15:22, Leonard Goldstein via Bioc-devel wrote:
> Dear Bioc team,
> 
> I'm following up on this recent GitHub issue
> <https://github.com/ldg21/SGSeq/issues/5>. Please see the issue for more
> details and code examples.
> 
> It looks like changes in Bioc devel result in two copies of the
> mitochondrial chromosome for BSgenome.Hsapiens.UCSC.hg19 -- one named chrM
> like in previous package versions (length 16571) and one named chrMT
> (length 16569).
> 
> When using seqlevelsStyle() to change chromosome names from UCSC to NCBI
> format, this results in new behavior -- in the past chrM was simply renamed
> MT, now the different sequence chrMT is used. Is this intended?

Absolutely intended.

There is a long story behind the unfortunate fate of the mitochondrial 
chromosome in hg19. I'll try to keep it short.

When the UCSC folks released the hg19 browser more than 10 years ago, 
they based it on assembly GRCh37:

   https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.13

See sequence report for GRCh37:

   https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.13_GRCh37/GCF_000001405.13_GRCh37_assembly_report.txt

For some mysterious reason GRCh37 didn't include the mitochondrial 
chromosome so the UCSC folks decided to use mitochondrial sequence 
NC_001807 and called it chrM.

However, UCSC has recently decided to base hg19 on GRCh37.p13 instead of 
GRCh37. A rather surprising move after many years of hg19 being based on 
the latter.

   https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.25/

See sequence report for GRCh37.p13:

   https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_assembly_report.txt

Note that GRCh37.p13 does include the mitochondrial chromosome. It's 
called MT in the official sequence report above and chrMT in hg19.

At the same time the UCSC folks decided to keep chrM, so hg19 now 
contains 2 mitochondrial sequences: chrM and chrMT. Previously it had 
only one: chrM.

So what you see in BioC devel in BSgenome.Hsapiens.UCSC.hg19 and with
seqlevelsStyle(genome) simply reflects this change. In particular, 
seqlevelsStyle(genome) <- "NCBI" now does the following:

   - Rename chrMT -> MT.

   - chrM does NOT get renamed. There is no point in renaming this 
sequence because it has no equivalent in GRCh37.p13.
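
To make the rule above concrete, here is a minimal base-R sketch of the renaming logic. It is only an illustration, not the GenomeInfoDb implementation: the helper ucsc_to_ncbi() and the small mapping vector are hypothetical, and the mapping covers just a few sequences for brevity. The point is that a name is renamed only if it has an NCBI counterpart in the mapping, so chrM is left as-is.

```r
# Hypothetical helper (NOT part of GenomeInfoDb): rename only those
# sequence names that have an entry in the UCSC -> NCBI mapping.
ucsc_to_ncbi <- function(seqlevels, mapping) {
  hit <- seqlevels %in% names(mapping)
  seqlevels[hit] <- unname(mapping[seqlevels[hit]])
  seqlevels
}

# Illustrative subset of a UCSC -> NCBI mapping for GRCh37.p13.
# chrM is deliberately absent: it has no equivalent in GRCh37.p13.
mapping <- c(chr1 = "1", chrX = "X", chrMT = "MT")

renamed <- ucsc_to_ncbi(c("chr1", "chrX", "chrM", "chrMT"), mapping)
print(renamed)
# "1" "X" "chrM" "MT"
```

Code that relied on chrM being renamed to MT will therefore pick up the chrMT sequence (length 16569) under the name MT, not the old chrM sequence (length 16571).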

Hope this helps,

H.

> 
> Leonard

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319