[Bioc-devel] Phase 3 1000 genomes reference genome

2 messages · Murphy, Alan E, Hervé Pagès

Hi all,

I am looking for the 1000 Genomes Phase 3 reference genome sequence. Something equivalent to the Phase 2 version would be useful: BSgenome.Hsapiens.1000genomes.hs37d5 (https://bioconductor.org/packages/release/data/annotation/html/BSgenome.Hsapiens.1000genomes.hs37d5.html). The dataset I'm looking for is also available for download here: https://ctg.cncr.nl/software/MAGMA/ref_data/g1000_eur.zip

Is this available in Bioconductor? I want to use it in a package I'm developing. I know I could download it through the package when needed, or store the dataset as package data, but I understand that neither of these solutions is good practice for a Bioconductor submission.

Kind regards,
Alan.

Alan Murphy
Bioinformatician
Neurogenomics lab
UK Dementia Research Institute
Imperial College London

4 days later

Hi Alan,

On 4/16/21 4:28 AM, Murphy, Alan E wrote:

I don't think we have that:

   library(BSgenome)
   grep("1000", available.genomes(), value=TRUE)
   # [1] "BSgenome.Hsapiens.1000genomes.hs37d5"

Note that BSgenome.Hsapiens.1000genomes.hs37d5 is a contributed package 
(by Julian Gehring). You're welcome to contribute a BSgenome data 
package for the 1000genomes Phase3 Reference Genome if you'd like.
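
To judge whether the existing contributed package already covers what you need, a minimal sketch along these lines may help. It assumes BSgenome and the hs37d5 data package are installed locally; the variable names (pkg, have_bsgenome) are only illustrative, and nothing here implies that a Phase 3 package exists.

```r
# Sketch only: inspect the existing Phase 2 (hs37d5) BSgenome data package,
# if it happens to be installed on this machine.
pkg <- "BSgenome.Hsapiens.1000genomes.hs37d5"

# Check for BSgenome without failing when it is absent.
have_bsgenome <- requireNamespace("BSgenome", quietly = TRUE)

if (have_bsgenome && pkg %in% BSgenome::installed.genomes()) {
  library(BSgenome)
  genome <- getBSgenome(pkg)   # load the BSgenome object from the data package
  print(genome)                # summary: organism, provider, sequence names
  print(seqinfo(genome))       # sequence names and lengths
} else {
  message("BSgenome and/or ", pkg, " is not installed; nothing to inspect.")
}
```

Comparing the sequence names and lengths reported here against the files you downloaded is one way to check how closely the two match before deciding whether a new data package is needed.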

Best,
H.