Dear Bioc team, I'm following up on this recent GitHub issue <https://github.com/ldg21/SGSeq/issues/5>. Please see the issue for more details and code examples. It looks like changes in Bioc devel result in two copies of the mitochondrial chromosome for BSgenome.Hsapiens.UCSC.hg19 -- one named chrM like in previous package versions (length 16571) and one named chrMT (length 16569). When using seqlevelsStyle() to change chromosome names from UCSC to NCBI format, this results in new behavior -- in the past chrM was simply renamed MT, now the different sequence chrMT is used. Is this intended? Leonard
[Bioc-devel] BSgenome changes
9 messages · Leonard Goldstein, Felix Ernst, Kasper Daniel Hansen +1 more
1 day later
Hi Leonard,
Absolutely intended.
There is a long story behind the unfortunate fate of the mitochondrial chromosome in hg19. I'll try to keep it short.
When the UCSC folks released the hg19 browser more than 10 years ago, they based it on assembly GRCh37:
https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.13
See the sequence report for GRCh37:
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.13_GRCh37/GCF_000001405.13_GRCh37_assembly_report.txt
For some mysterious reason GRCh37 didn't include the mitochondrial chromosome, so the UCSC folks decided to use mitochondrial sequence NC_001807 and called it chrM.
However, UCSC has recently decided to base hg19 on GRCh37.p13 instead of GRCh37. A rather surprising move after many years of hg19 being based on the latter.
https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.25/
See the sequence report for GRCh37.p13:
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_assembly_report.txt
Note that GRCh37.p13 does include the mitochondrial chromosome. It's called MT in the official sequence report above and chrMT in hg19.
At the same time the UCSC folks decided to keep chrM, so hg19 now contains 2 mitochondrial sequences: chrM and chrMT. Previously it had only one: chrM.
So what you see in BioC devel in BSgenome.Hsapiens.UCSC.hg19 and with seqlevelsStyle(genome) is only reflecting this. In particular, seqlevelsStyle(genome) <- "NCBI" now does the following:
- Rename chrMT -> MT.
- chrM does NOT get renamed. There is no point in renaming this sequence because it has no equivalent in GRCh37.p13.
Hope this helps,
H.
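For readers who want to check the renaming described above on their own installation, here is a minimal sketch. It assumes a BSgenome.Hsapiens.UCSC.hg19 version that already ships both chrM and chrMT (the Bioc devel version discussed in this thread); with an older package version chrMT will simply be absent. The variable names `genome` and `mito` are only illustrative.

```r
# Sketch only: assumes the hg19 BSgenome package version that contains
# both mitochondrial sequences (chrM and chrMT), as discussed above.
library(BSgenome.Hsapiens.UCSC.hg19)

genome <- BSgenome.Hsapiens.UCSC.hg19

# Under UCSC naming, list whichever mitochondrial sequences are present
# together with their lengths.
mito <- intersect(c("chrM", "chrMT"), seqnames(genome))
seqlengths(genome)[mito]

# Switch to NCBI naming. This changes only the local object `genome`,
# not the installed package.
seqlevelsStyle(genome) <- "NCBI"

# According to the explanation above, chrMT is now called MT while
# chrM keeps its UCSC name because it has no GRCh37.p13 equivalent.
intersect(c("chrM", "chrMT", "MT"), seqnames(genome))
```

Code that previously relied on chrM being renamed to MT would therefore pick up a different sequence after the style switch, which is the behavior change reported in the original question.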
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319
Hi Leonard, Hi Herve,
I followed your conversation, since I have noticed the same problem. Thanks, Herve, for the explanation of the recent changes on hg19.
The GRCh37.p13 report states in its last line:
MT assembled-molecule MT Mitochondrion J01415.2 = NC_012920.1 non-nuclear 16569 chrM
Since the last column is called "UCSC-style-name", wouldn't that mean that chrM has to be renamed to MT and not chrMT?
Thanks again for the explanation.
Cheers,
Felix
Hi Felix,
This is a mistake in the sequence report for GRCh37.p13. GRCh37.p13:MT is the same as hg19:chrMT, not hg19:chrM.
hg19:chrM and hg19:chrMT are **not** the same sequences. The former is NC_001807 and has length 16571 and the latter is NC_012920.1 and has length 16569.
Yes, seqlevelsStyle() is sorting out all this mess for you ;-)
Cheers,
H.
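The claim that the two sequences differ can be checked directly. The following is a rough sketch, again assuming a BSgenome.Hsapiens.UCSC.hg19 version that contains both chrM and chrMT; the object names `chrM` and `chrMT` are only illustrative.

```r
# Sketch only: requires the hg19 BSgenome package version that ships
# both chrM and chrMT.
library(BSgenome.Hsapiens.UCSC.hg19)

genome <- BSgenome.Hsapiens.UCSC.hg19

# Load the two mitochondrial sequences as DNAString objects.
chrM  <- genome[["chrM"]]    # NC_001807 according to the message above
chrMT <- genome[["chrMT"]]   # NC_012920.1 according to the message above

# The thread reports 16571 for chrM and 16569 for chrMT.
length(chrM)
length(chrMT)

# Different lengths already imply that the sequences cannot be identical.
length(chrM) == length(chrMT)
```

Because the lengths differ, coordinates on chrM cannot be assumed to carry over to chrMT (or to MT after switching to NCBI style).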
3 days later
In light of this, could we get a version of GRCh37 with only a single mitochondrial genome?
Best,
Kasper
On 8/18/20 01:40, Kasper Daniel Hansen wrote:
In light of this, could we get a version of GRCh37 with only a single mitochondrial genome?
You mean a BSgenome.Hsapiens.NCBI.GRCh37.p13 package? So it would contain the same sequences as BSgenome.Hsapiens.UCSC.hg19 but without the hg19:chrM sequence? Certainly doable but note that by using BSgenome.Hsapiens.UCSC.hg38 you stay away from this mess. I'm not sure that adding yet another BSgenome package would make the situation less confusing.
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319
Thanks for the explanation, Hervé.
Best wishes,
Leonard
>
> E-mail: hpages at fredhutch.org <mailto:hpages at fredhutch.org>
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
> _______________________________________________
> Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org>
mailing list
>
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwIGaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=49jni5SmG_DH80nnPZXXqvFNceB5jkZtlb7eKEA8558&s=g4eW0swjrNpysDJ67do3xLWcLyskjH51X5-x4kMJYDw&e=
>
--
Herv? Pag?s
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org <mailto:hpages at fredhutch.org>
Phone: (206) 667-5791
Fax: (206) 667-1319
_______________________________________________
Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org> mailing
list
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwMFaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=5BrpbmuLSg2cS13gst2oJ-M8PG3kijaxWs3dZkYY8yw&s=NvAaJQhMJpXLBRTOJp4WG11FR4tuCXJ8cfgCdMlv5OY&e=
-- Best, Kasper
-- Herv? Pag?s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319
1 day later
Well, the presence of two mitochondrial genomes is there to fix a mistake by UCSC. I can appreciate the importance of representing this mistake when you build off UCSC. But it strikes me as not actually representing the GRCh37 version of the genome, and it seems to me that we want such a representation in the project; not everything comes through UCSC. But perhaps I have not given this sufficient thought. This is just my immediate reaction.

On Tue, Aug 18, 2020 at 8:18 PM Leonard Goldstein <goldstein.leonard at gene.com> wrote:
Thanks for the explanation, Hervé.

Best wishes,
Leonard

On Tue, Aug 18, 2020 at 10:06 AM Hervé Pagès <hpages at fredhutch.org> wrote:

On 8/18/20 01:40, Kasper Daniel Hansen wrote:
> In light of this, could we get a version of GRCh37 with only a single mitochondrial genome?

You mean a BSgenome.Hsapiens.NCBI.GRCh37.p13 package? So it would contain the same sequences as BSgenome.Hsapiens.UCSC.hg19 but without the hg19:chrM sequence?

Certainly doable, but note that by using BSgenome.Hsapiens.UCSC.hg38 you stay away from this mess. I'm not sure that adding yet another BSgenome package would make the situation less confusing.
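
To make the proposal concrete, here is a minimal base R sketch of "the same sequences as hg19 but without hg19:chrM". It is only a toy model: `hg19_lengths` is a hand-written named vector standing in for what `seqlengths()` would report, not the real BSgenome object, and `grch37_p13_like` is a hypothetical name used for illustration. The chrM and chrMT lengths are the ones quoted in this thread.

```r
# Toy stand-in for the sequence lengths of hg19 (assumption: only three
# sequences are listed here; the real genome has many more).
hg19_lengths <- c(chr1 = 249250621L, chrM = 16571L, chrMT = 16569L)

# Drop the legacy UCSC-only mitochondrial sequence (chrM, NC_001807)
# and keep everything else, including chrMT (NC_012920.1).
grch37_p13_like <- hg19_lengths[names(hg19_lengths) != "chrM"]

print(grch37_p13_like)
```

With only one mitochondrial sequence left, a UCSC-to-NCBI rename has a single candidate for MT, which is the situation the hg38 package already offers.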
On Fri, Aug 14, 2020 at 6:17 PM Hervé Pagès <hpages at fredhutch.org> wrote:
Hi Felix,
On 8/13/20 21:43, Felix Ernst wrote:
> Hi Leonard, Hi Herve,
>
> I followed your conversation, since I have noticed the same problem. Thanks, Herve, for the explanation of the recent changes on hg19.
>
> The GRCh37.p13 report states in its last line:
>
> MT assembled-molecule MT Mitochondrion J01415.2 = NC_012920.1 non-nuclear 16569 chrM
>
> Since the last column is called "UCSC-style-name", wouldn't that mean that chrM, and not chrMT, has to be renamed to MT?
This is a mistake in the sequence report for GRCh37.p13. GRCh37.p13:MT is the same as hg19:chrMT, not hg19:chrM.

hg19:chrM and hg19:chrMT are **not** the same sequences. The former is NC_001807 and has length 16571 and the latter is NC_012920.1 and has length 16569.

Yes, seqlevelsStyle() is sorting out all this mess for you ;-)

Cheers,
H.
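
The point above can be checked with a small base R sketch: match the report line on accession and length instead of trusting its UCSC-style-name column. This is a toy model, not what seqlevelsStyle() does internally. The column labels, `hg19_mito`, `matched` and `ncbi_names` are shorthand chosen here for illustration, and the report line is split on whitespace although the real file is tab-delimited.

```r
# Last line of the GRCh37.p13 assembly report, as quoted by Felix.
report_line <- "MT assembled-molecule MT Mitochondrion J01415.2 = NC_012920.1 non-nuclear 16569 chrM"
fields <- strsplit(report_line, "[[:space:]]+")[[1]]
names(fields) <- c("name", "role", "molecule", "type", "genbank",
                   "relationship", "refseq", "unit", "length", "ucsc_name")

# The two hg19 mitochondrial sequences as described in this thread.
hg19_mito <- data.frame(
  ucsc_name = c("chrM", "chrMT"),
  refseq    = c("NC_001807", "NC_012920.1"),
  length    = c(16571L, 16569L),
  stringsAsFactors = FALSE
)

# Match on RefSeq accession and length rather than on the reported UCSC name.
hit <- hg19_mito$refseq == fields[["refseq"]] &
  hg19_mito$length == as.integer(fields[["length"]])
matched <- hg19_mito$ucsc_name[hit]

print(matched)                # hg19 sequence that corresponds to GRCh37.p13:MT
print(fields[["ucsc_name"]])  # UCSC name claimed by the report

# Model of the UCSC -> NCBI rename: only the matched sequence becomes MT.
ncbi_names <- ifelse(hg19_mito$ucsc_name == matched,
                     fields[["name"]], hg19_mito$ucsc_name)
print(ncbi_names)
```

The accession and the length both point at chrMT, while the report's last column says chrM, which is the inconsistency described above.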
>
> Thanks again for the explanation.
>
> Cheers,
> Felix
>
Best,
Kasper
Kasper,

The tradition so far has been to package all UCSC human genomes since hg17. We could also start producing BSgenome packages for other non-UCSC human assemblies. We just need to draw a line somewhere. If there is a need for it we can make BSgenome.Hsapiens.NCBI.GRCh37.p13 available, as I said earlier. Is this what you are asking for?

H.
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319