[Bioc-devel] making txdb, and propagating metadata from AnnotationHub to GenomicFeatures

3 messages · Hervé Pagès, Arora, Sonali, Michael Love

#
Hi Michael,
On 06/17/2015 12:35 AM, Michael Love wrote:
A first (and hopefully easy) improvement would be that the GRanges and
other objects on AnnotationHub had more information in their metadata().

In the meantime, here is a workaround (in 2 steps):

1) Add some useful metadata to 'gtf' (grabbed from the AnnotationHub
    object):

   addMetadataFromAnnotationHub <- function(x, ah)
   {
     stopifnot(length(ah) == 1L)
     metadata0 <- list(
         `Data source`=ah$sourceurl,
         `Provider`=ah$dataprovider,
         `Organism`=ah$species,
         `Taxonomy ID`=ah$taxonomyid
     )
     metadata(x) <- c(metadata0, metadata(x))
     x
   }

   gtf <- addMetadataFromAnnotationHub(gtf, z)

   metadata(gtf)
   # $`Data source`
   # [1] "ftp://ftp.ensembl.org/pub/release-80/gtf/caenorhabditis_elegans/Caenorhabditis_elegans.WBcel235.80.gtf.gz"
   #
   # $Provider
   # [1] "Ensembl"
   #
   # $Organism
   # [1] "Caenorhabditis elegans"
   #
   # $`Taxonomy ID`
   # [1] 6239
   #
   # $AnnotationHubName
   # [1] "AH47045"

2) Pass the metadata to makeTxDbFromGRanges():

   txdb <- makeTxDbFromGRanges(gtf,
               metadata=data.frame(
                   name=names(metadata(gtf)),
                   value=as.character(metadata(gtf))))
   txdb
   # TxDb object:
   # Db type: TxDb
   # Supporting package: GenomicFeatures
   # Data source: ftp://ftp.ensembl.org/pub/release-80/gtf/caenorhabditis_elegans/Caenorhabditis_elegans.WBcel235.80.gtf.gz
   # Provider: Ensembl
   # Organism: Caenorhabditis elegans
   # Taxonomy ID: 6239
   # AnnotationHubName: AH47045
   # Genome: WBcel235
   # transcript_nrow: 57834
   # exon_nrow: 173506
   # cds_nrow: 131562
   # Db created by: GenomicFeatures package from Bioconductor
   # Creation time: 2015-06-17 12:24:51 -0700 (Wed, 17 Jun 2015)
   # GenomicFeatures version at creation time: 1.21.13
   # RSQLite version at creation time: 1.0.0
   # DBSCHEMAVERSION: 1.1

   organism(txdb)
   # [1] "Caenorhabditis elegans"
   taxonomyId(txdb)
   # [1] 6239

Step 2) should be made easier because the metadata is already in 'gtf',
so there is no reason why the user would need to pass it again through the
'metadata' argument. I'll make that change to makeTxDbFromGRanges().
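
Until that change lands, the conversion in step 2) can be wrapped in a small helper. The following is only a sketch in base R: metadataToDataFrame() is a hypothetical name, not a GenomicFeatures function, and it assumes every metadata entry is a single value. The list 'md' stands in for metadata(gtf), using the values shown above.

```r
# Hypothetical helper (not part of GenomicFeatures): flatten a named list of
# single-valued metadata entries into a two-column name/value data.frame.
metadataToDataFrame <- function(md)
{
    stopifnot(is.list(md), !is.null(names(md)), all(nzchar(names(md))))
    # Each entry must be a single value so that it fits in one table cell.
    stopifnot(all(lengths(md) == 1L))
    data.frame(
        name = names(md),
        value = vapply(md, as.character, character(1), USE.NAMES = FALSE),
        stringsAsFactors = FALSE
    )
}

# Stand-in for metadata(gtf), copied from the output shown in step 1).
md <- list(
    `Data source` = "ftp://ftp.ensembl.org/pub/release-80/gtf/caenorhabditis_elegans/Caenorhabditis_elegans.WBcel235.80.gtf.gz",
    Provider = "Ensembl",
    Organism = "Caenorhabditis elegans",
    `Taxonomy ID` = 6239,
    AnnotationHubName = "AH47045"
)

md_df <- metadataToDataFrame(md)
md_df
```

With such a helper, step 2) would presumably reduce to makeTxDbFromGRanges(gtf, metadata=metadataToDataFrame(metadata(gtf))).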

H.

  
    
#
Hi Michael, Herve,
On 6/17/2015 9:43 PM, Hervé Pagès wrote:
The GRanges from AnnotationHub now include more information in their
metadata() slot. This is added in devel (2.1.27).

library(AnnotationHub)
ah = AnnotationHub()
packageVersion('AnnotationHub')
gtf <- query(ah, c('gtf', 'homo sapiens', '77', 'ensembl'))
gr <- gtf[[1]]
gr
metadata(gr)

 > metadata(gr)
$AnnotationHubName
[1] "AH28812"

$`File Name`
[1] "Homo_sapiens.GRCh38.77.gtf.gz"

$`Data Source`
[1] "ftp://ftp.ensembl.org/pub/release-77/gtf/homo_sapiens/Homo_sapiens.GRCh38.77.gtf.gz"

$Provider
[1] "Ensembl"

$Organism
[1] "Homo sapiens"

$`Taxonomy ID`
[1] 9606


Only some of the fields have been added here, but note that all others 
can be accessed with -

mcols(ah["AH47045"])

or

ah[metadata(gr)$AnnotationHubName]
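
Since metadata(gr) now arrives pre-populated, prepending the same fields again (as in the earlier workaround in this thread) might leave repeated entries, and the field names differ slightly in case ('Data Source' here versus 'Data source' earlier). One possible way to combine two metadata lists without repeats is sketched below in base R. mergeMetadata() is a hypothetical helper, and matching names case-insensitively is an assumption made for illustration, not documented behaviour of either package.

```r
# Hypothetical helper: combine two named metadata lists, keeping only the
# first occurrence of each name (names are compared case-insensitively).
mergeMetadata <- function(preferred, existing)
{
    stopifnot(is.list(preferred), is.list(existing))
    combined <- c(preferred, existing)
    combined[!duplicated(tolower(names(combined)))]
}

# Stand-ins for the two sources, using values shown in this thread.
from_hub <- list(
    `Data source` = "ftp://ftp.ensembl.org/pub/release-77/gtf/homo_sapiens/Homo_sapiens.GRCh38.77.gtf.gz",
    Provider = "Ensembl"
)
already_there <- list(
    AnnotationHubName = "AH28812",
    `Data Source` = "ftp://ftp.ensembl.org/pub/release-77/gtf/homo_sapiens/Homo_sapiens.GRCh38.77.gtf.gz",
    Provider = "Ensembl"
)

merged <- mergeMetadata(from_hub, already_there)
names(merged)
```

Entries in the first argument win, so the spelling 'Data source' is the one kept in this example.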


Thanks,
Sonali.

  
    
#
that's great Hervé and Sonali!

thanks for the quick response.

best, m
On Thu, Jun 18, 2015 at 10:07 AM, Sonali Arora <sarora at fredhutch.org> wrote: