[Message text scrubbed by the list archive; original attachment: <https://stat.ethz.ch/pipermail/bioc-devel/attachments/20140618/e3ec243f/attachment.pl>]
[Bioc-devel] seqnames of SNPlocs.*
5 messages · Vincent Carey, Peter Hickey, Hervé Pagès
[Message text scrubbed by the list archive; original attachment: <https://stat.ethz.ch/pipermail/bioc-devel/attachments/20140617/769af162/attachment.pl>]
[Message text scrubbed by the list archive; original attachment: <https://stat.ethz.ch/pipermail/bioc-devel/attachments/20140618/c853f0ac/attachment.pl>]
Hi Peter,
Yes, as Vince said, the chromosome names are those used by dbSNP. For whatever reason, dbSNP, which is part of NCBI, felt the need to use a different naming convention than the rest of NCBI :-/
On 06/17/2014 07:57 PM, Peter Hickey wrote:
Thanks for the explanation, Vincent. GenomeInfoDb has NCBI and UCSC support, but doesn't seem to support the dbSNP format. Perhaps this should be added?
The seqlevelsStyle() setter first requires that the seqlevels() setter
works on a SNPlocs object, which itself requires that the seqinfo()
setter works. Unfortunately, it doesn't at the moment:
> library(SNPlocs.Hsapiens.dbSNP.20120608)
> snps <- SNPlocs.Hsapiens.dbSNP.20120608
> seqlevels(snps) <- sub("^ch", "chr", seqlevels(snps))
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘seqinfo<-’ for
signature ‘"SNPlocs"’
Something I'm adding to my list.
In the meantime, you can do the renaming on the GRanges objects
you extract with 'getSNPlocs(..., as.GRanges=TRUE)' or with
'rsidsToGRanges(...)'. Maybe it's not very convenient to have to do
this every time you extract SNPs into a GRanges object, but on the
other hand it's really easy these days now that we have seqlevelsStyle().
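Here is a minimal base-R sketch of that renaming. It assumes the input follows the dbSNP "ch" convention discussed in this thread ("ch1", "chX", "chMT"). It also assumes the UCSC target uses "chrM" for the mitochondrial chromosome, as in the seqlevelsStyle() output later in the thread. The helper name dbsnp_to_ucsc() is hypothetical and is not part of GenomeInfoDb or the SNPlocs packages.

```r
## Hypothetical helper (not part of GenomeInfoDb or SNPlocs): convert
## dbSNP-style chromosome names ("ch1", "chX", "chMT") to UCSC style
## ("chr1", "chrX", "chrM").
dbsnp_to_ucsc <- function(x) {
    ## Only touch names with the dbSNP "ch" prefix, so names that are
    ## already UCSC-style ("chr...") are not turned into "chrr...".
    is_dbsnp <- startsWith(x, "ch") & !startsWith(x, "chr")
    x[is_dbsnp] <- sub("^ch", "chr", x[is_dbsnp])  # "ch1" -> "chr1", "chMT" -> "chrMT"
    x[x == "chrMT"] <- "chrM"                      # UCSC uses "chrM" for mitochondria
    x
}

dbsnp_to_ucsc(c("ch1", "ch22", "chX", "chY", "chMT"))
## returns c("chr1", "chr22", "chrX", "chrY", "chrM")
```

For a GRanges object gr extracted with rsidsToGRanges() or getSNPlocs(..., as.GRanges=TRUE), you could apply the same assignment pattern that was tried on the SNPlocs object above. Running seqlevels(gr) <- dbsnp_to_ucsc(seqlevels(gr)) should then rename the seqlevels in place.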
Hope this helps.
Cheers,
H.
seqlevelsStyle(seqnames(SNPlocs.Hsapiens.dbSNP.20120608))
Error in .guessSpeciesStyle(seqnames) : The style does not have a compatible entry for the species supported by Seqname. Please see genomeStyles() for supported species/style
On 18/06/2014, at 12:40 PM, Vincent Carey <stvjc at channing.harvard.edu> wrote:
It is the convention used in dbSNP, just propagated directly. Indeed, one typically has to relabel, but there is seqlevelsStyle infrastructure in GenomeInfoDb that may help.
On Tue, Jun 17, 2014 at 8:17 PM, Peter Hickey <hickey at wehi.edu.au> wrote:
Is there a reason why the seqnames of SNPlocs.Hsapiens.dbSNP.20120608 (and possibly the other SNPlocs.*) use the prefix "ch" instead of "chr"? E.g. "ch1" instead of "chr1". It doesn't seem to fit with any standard way of naming chromosomes, and it means that these need to be renamed before they can be used with most other Bioconductor data sources.
Thanks,
Pete
--------------------------------
Peter Hickey,
PhD Student/Research Assistant,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
Ph: +613 9345 2324
hickey at wehi.edu.au
http://www.wehi.edu.au
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
Hi Peter,
I just added support for the "dbSNP" seqlevels style for human (in
GenomeInfoDb 1.1.9, which will become available tomorrow):
library(SNPlocs.Hsapiens.dbSNP.20120608)
myrsids <- c("rs2639606", "rs75264089", "rs73396229", "rs55871206",
"rs10932221", "rs56219727", "rs73709730", "rs55838886",
"rs3734153", "rs79381275", "rs1516535")
gr <- rsidsToGRanges(myrsids)
Then:
> seqnames(gr)
factor-Rle of length 11 with 11 runs
Lengths: 1 1 1 1 1 1 1 1 1 1 1
Values : ch9 ch6 ch11 ch13 ch2 ch4 ch7 ch2 ch5 ch11 ch4
Levels(25): ch1 ch2 ch3 ch4 ch5 ch6 ch7 ... ch19 ch20 ch21 ch22 chX
chY chMT
> seqlevelsStyle(gr)
[1] "dbSNP"
> seqlevelsStyle(gr) <- "NCBI"
> seqnames(gr)
factor-Rle of length 11 with 11 runs
Lengths: 1 1 1 1 1 1 1 1 1 1 1
Values : 9 6 11 13 2 4 7 2 5 11 4
Levels(25): 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
X Y MT
> seqlevelsStyle(gr) <- "UCSC"
> seqnames(gr)
factor-Rle of length 11 with 11 runs
Lengths: 1 1 1 1 1 1 1 1 1
1 1
Values : chr9 chr6 chr11 chr13 chr2 chr4 chr7 chr2 chr5
chr11 chr4
Levels(25): chr1 chr2 chr3 chr4 chr5 chr6 ... chr20 chr21 chr22 chrX
chrY chrM
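The sketch below makes the three naming conventions above concrete. It stores them side by side and translates between them by row lookup. It is only an illustration under stated assumptions:
- The table is written by hand from the 25 human seqlevels shown above (1-22, X, Y and the mitochondrial chromosome).
- It is not the table GenomeInfoDb uses (see genomeStyles(), mentioned in the error message earlier in the thread).
- The names human_styles, guess_style() and convert_style() are hypothetical.

```r
## Hypothetical hand-built lookup table for the 25 human seqlevels in
## the three styles shown above. Not GenomeInfoDb's internal table.
chroms <- c(as.character(1:22), "X", "Y")
human_styles <- data.frame(
    dbSNP = c(paste0("ch", chroms), "chMT"),
    NCBI  = c(chroms, "MT"),
    UCSC  = c(paste0("chr", chroms), "chrM"),
    stringsAsFactors = FALSE
)

## Guess the style of a vector of seqlevels: pick the column (style)
## that contains the largest number of the given names.
guess_style <- function(x, table = human_styles) {
    hits <- vapply(table, function(col) sum(x %in% col), integer(1))
    names(which.max(hits))
}

## Translate seqlevels from one style to another by matching each name
## against the 'from' column and taking the same row of the 'to' column.
## Names not found in the table are returned unchanged.
convert_style <- function(x, to, from = guess_style(x, table),
                          table = human_styles) {
    idx <- match(x, table[[from]])
    out <- x
    found <- !is.na(idx)
    out[found] <- table[[to]][idx[found]]
    out
}

lv <- c("ch9", "ch6", "ch11", "chMT")
guess_style(lv)                 # "dbSNP"
convert_style(lv, to = "NCBI")  # c("9", "6", "11", "MT")
convert_style(lv, to = "UCSC")  # c("chr9", "chr6", "chr11", "chrM")
```

Counting matches is one simple way to guess a style. Names missing from the table, such as unplaced contigs, are passed through rather than dropped.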
Making the seqlevelsStyle() setter work directly on the
SNPlocs.Hsapiens.dbSNP.20120608 object itself will take more time,
though. It will actually be part of some more important SNPlocs
refactoring I've had on my list for a while now, so it won't happen
for at least a couple of months.
Cheers,
H.
In reply to Hervé Pagès's message of 06/17/2014 10:37 PM.