
[Bioc-devel] About Hg38 BSgenome

6 messages · rcaloger, Julian Gehring, James W. MacDonald +1 more

#
Dear Bioc Team,
I am the maintainer of the chimera package.
Recently some users asked whether chimera could be used with
fusions detected on the hg38 human genome.
I looked for hg38 as a BSgenome package but did not find it in the
Bioc repository, although TxDb.Hsapiens.UCSC.hg38.knownGene is available.
I would like to know whether a release of hg38 as a BSgenome is planned,
perhaps for the next Bioc release.
If it is not planned, could you please suggest what I should read to build it?
Cheers
Raffaele
#
Hi Raffaele,
You can find it under the name 
  BSgenome.Hsapiens.NCBI.GRCh38
  http://bioconductor.org/packages/release/data/annotation/html/BSgenome.Hsapiens.NCBI.GRCh38.html
The naming of the chromosomes has been harmonized between UCSC and GRCh with the new release, so there should be no need for two versions at the genome level.
Best
Julian
On Tue, Dec 2, 2014 at 15:12, Raffaele Adolfo Calogero wrote:
#
Hi Raffaele,

See here:

http://bioconductor.org/packages/release/data/annotation/html/BSgenome.Hsapiens.NCBI.GRCh38.html

Best,

Jim



On Tue, Dec 2, 2014 at 9:10 AM, Raffaele Adolfo Calogero <raffaele.calogero at unito.it> wrote:
#
Ignore my comment about the naming convention.
On Tue, Dec 2, 2014 at 15:45, Julian Gehring wrote:
The naming of the chromosomes has been harmonized between UCSC and GRCh with the new release, so there should be no need for two versions at the genome level.
#
Hi Raffaele,

Ignore my last post completely: it was overly optimistic.

The 'BSgenome.Hsapiens.NCBI.GRCh38' package contains the genomic
sequence, which is identical between GRCh38 and hg38. The chromosome
naming, however, is different. For the top-level chromosomes, the
names can easily be converted:

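  library(BSgenome.Hsapiens.NCBI.GRCh38)
  library(TxDb.Hsapiens.UCSC.hg38.knownGene)

  bs <- BSgenome.Hsapiens.NCBI.GRCh38
  seqlevelsStyle(bs) <- "UCSC"  ## rename NCBI-style seqlevels (e.g. "1") to UCSC style ("chr1")

  seqlevels(BSgenome.Hsapiens.NCBI.GRCh38)      ## original NCBI-style names

  seqlevels(bs)                                  ## converted, UCSC-style names
  seqlevels(TxDb.Hsapiens.UCSC.hg38.knownGene)   ## UCSC names used by the TxDb, for comparison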

However, this does not work for the non-top-level sequences. For
example, 'HSCHR19KIR_RP5_B_HAP_CTG3_1' is not converted and has no
corresponding sequence in 'TxDb.Hsapiens.UCSC.hg38.knownGene'.

Best
Julian


Julian Gehring (12/02/14 15:44):
#
Hi Raffaele,

You are in luck today: while we normally do *not* have mechanisms
to harmonize non-standard chromosome names, Herve wrote some code
to handle this specific case. So you want to look at this:

library(GenomeInfoDb)
?fetchExtendedChromInfoFromUCSC


  Marc
On 12/02/2014 07:15 AM, Julian Gehring wrote: