[Bioc-devel] request to create BSgenome Bos_taurus_UMD3.1.1

Hi Byungkuk,
On 03/25/2015 10:17 PM, Byungkuk Min wrote:
Do you really need *that* particular assembly (GCF_000003055.6)?
If not, there are already some bovine BSgenome packages available:

   > library(BSgenome)
   > grep("Btaurus", available.genomes(), value=TRUE)
   [1] "BSgenome.Btaurus.UCSC.bosTau3"
   [2] "BSgenome.Btaurus.UCSC.bosTau3.masked"
   [3] "BSgenome.Btaurus.UCSC.bosTau4"
   [4] "BSgenome.Btaurus.UCSC.bosTau4.masked"
   [5] "BSgenome.Btaurus.UCSC.bosTau6"
   [6] "BSgenome.Btaurus.UCSC.bosTau6.masked"

We don't have a package for bosTau8 yet. It is the latest bovine assembly
available at UCSC (they added it in June 2014), and I could add it. Note
that despite its name (also Bos_taurus_UMD_3.1.1), bosTau8 is not the same
assembly as the one you picked on NCBI. Yours is:

   http://www.ncbi.nlm.nih.gov/assembly/GCF_000003055.6

(the latest, from 2014/11/25), but bosTau8 is:

   http://www.ncbi.nlm.nih.gov/assembly/GCF_000003055.5

(much older, from 2009/12/01)

Anyway, if we ignore chrM and the thousands of scaffolds that are
included in these assemblies, the sequences of the "main" chromosomes
(i.e. chr1 to chr29 + chrX) are exactly the same in the two assemblies.
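
If you want to check this kind of claim yourself, something along these
lines could work. This is only a sketch: same_main_chromosomes() is a
hypothetical helper (it is not part of BSgenome), and it assumes both
inputs support name-based `[[` and as.character(), which is the case for
BSgenome objects and for plain named lists. The sequence names on the
NCBI side are not given here, so you would have to supply them yourself
via names_b.

```r
# Hypothetical helper: compare the sequences of selected chromosomes in
# two assemblies. `a` and `b` can be BSgenome objects or named lists.
# names_a[i] in `a` is compared with names_b[i] in `b`.
same_main_chromosomes <- function(a, b, names_a, names_b = names_a) {
  stopifnot(length(names_a) == length(names_b))
  res <- vapply(seq_along(names_a), function(i) {
    identical(as.character(a[[names_a[i]]]),
              as.character(b[[names_b[i]]]))
  }, logical(1))
  setNames(res, names_a)
}

# The "main" bovine chromosomes, using UCSC-style names.
main <- paste0("chr", c(1:29, "X"))

# Toy data standing in for two assemblies (not real sequences).
toy_a <- list(chr1 = "ACGT", chr2 = "GGCC", chrM = "AAAA")
toy_b <- list(chr1 = "ACGT", chr2 = "GGCA", chrM = "TTTT")
toy_res <- same_main_chromosomes(toy_a, toy_b, c("chr1", "chr2"))
print(toy_res)
```

With real data you would pass `main` as names_a and the matching names
from the other assembly as names_b. Converting whole chromosomes with
as.character() is memory hungry but keeps the sketch simple.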

So maybe the BSgenome package for bosTau8 will do?

Note that the main advantage of making a BSgenome package for a
UCSC assembly (instead of using an NCBI assembly) is that the BSgenome
object then interoperates nicely with the TxDb object that one can
easily make from one of the UCSC tracks available for that assembly
(using the makeTxDbFromUCSC() function from the GenomicFeatures
package).
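
To illustrate the interoperability, here is a rough sketch. It assumes
a BSgenome.Btaurus.UCSC.bosTau8 package is installed (it does not exist
at the time of writing) and that a "refGene" table is available for
bosTau8 at UCSC; both are assumptions, and the wrapper function name is
made up for the example. makeTxDbFromUCSC() needs an internet
connection.

```r
# Hypothetical wrapper: build a TxDb from a UCSC track and extract the
# transcript sequences from the matching BSgenome object.
transcript_seqs_from_ucsc <- function(
    genome_pkg  = "BSgenome.Btaurus.UCSC.bosTau8",  # assumed to be installed
    ucsc_genome = "bosTau8",
    tablename   = "refGene") {                      # assumed to exist at UCSC
  genome <- BSgenome::getBSgenome(genome_pkg)
  txdb <- GenomicFeatures::makeTxDbFromUCSC(genome = ucsc_genome,
                                            tablename = tablename)
  # Both objects use UCSC sequence names (chr1, chr2, ...), so the
  # transcript coordinates can be applied to the genome directly.
  stopifnot(all(GenomeInfoDb::seqlevels(txdb) %in%
                GenomeInfoDb::seqlevels(genome)))
  GenomicFeatures::extractTranscriptSeqs(genome, txdb)
}

# Usage (not run here, requires the packages and network access):
# tx_seqs <- transcript_seqs_from_ucsc()
```

With a BSgenome object built from an NCBI assembly, the sequence names
would generally not match the ones used in the UCSC tracks, so you would
need to rename them before doing this.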

H.