
[Bioc-devel] [GenomicFeatures] no pkgName found. makeTxDbPackage() called after txdb created from GFF3 file

8 messages · Tengfei Yin, Marc Carlson, Cook, Malcolm

#
Hi Tengfei,

Yes, that looks like an oversight. Thanks for reporting it! I will 
extend makeTxDbPackage so that it is more accommodating of these newer 
transcriptDbs. If you want to help me out, you could call saveDb() on 
your gmax189 object and send me the .sqlite file that you save it to.

Also, if you have any alternative way to import your data (other 
than GFF or GTF), I think you should consider it. The specifications 
for these file types leave out key details, so you can very easily get 
a "legal" GFF or GTF file that is actually missing important 
information. For example, such files commonly lack the order of the 
exons within a given transcript, which can make them difficult (or 
impossible) to use for transcript-level work. Under these 
specifications, that information is "optional".


   Marc
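Marc's warning that exon order is "optional" in GFF/GTF can be checked before building anything. The base-R sketch below reports what fraction of exon rows carry a rank attribute. It assumes the rank is stored under `exon_number` by default (the key Ensembl GTFs use); other sources may use a different key or none at all. `exon_rank_coverage()` is a hypothetical helper, not part of GenomicFeatures.

```r
# Hypothetical helper (not part of GenomicFeatures): what fraction of exon rows
# carry an exon-rank attribute?
# Assumptions: 9 tab-separated columns; the attribute key is followed by a space
# (GTF style: key "value") or by '=' (GFF3 style: key=value).
exon_rank_coverage <- function(lines, rank_attr = "exon_number") {
  lines <- lines[!startsWith(lines, "#")]            # drop header/comment lines
  fields <- strsplit(lines, "\t", fixed = TRUE)
  # keep well-formed rows whose feature type (column 3) is "exon"
  fields <- Filter(function(f) length(f) >= 9 && f[3] == "exon", fields)
  if (length(fields) == 0) return(NA_real_)
  attrs <- vapply(fields, function(f) f[9], character(1))
  # match the key at the start of column 9 or right after a ';'
  pattern <- paste0("(^|;)[[:space:]]*", rank_attr, "[[:space:]=]")
  mean(grepl(pattern, attrs))
}

# Usage (the file name is a placeholder):
# exon_rank_coverage(readLines("annotation.gtf"))
```

A value well below 1 is a sign that the file does not record exon order for every exon. Note that `rank_attr` is pasted into a regular expression, so a key containing regex metacharacters would need escaping.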
On 02/06/2013 09:46 PM, Tengfei Yin wrote:
#
Marco, do you have any comment on Ensembl GTF (which has exon order) in this regard?

Thanks,

Malcolm

.On 02/06/2013 09:46 PM, Tengfei Yin wrote:
.> Dear all,
.>
.> I am trying to build a txdb object from a GFF3 file for soybean data and to
.> turn it into a package. Here is the code I used:
.>
.> gmax189 <- makeTranscriptDbFromGFF("~/Gmax_189_gene_exons.gff3",
.>                                    format = "gff3", species = "Glycine max",
.>                                    dataSource = "http://www.phytozome.org/")
.> makeTxDbPackage(txdb = gmax189,
.>                 version = "0.9.1",
.>                 maintainer = "Tengfei Yin",
.>                 author = "Tengfei Yin",
.>                 destDir = ".",
.>                 license = "Artistic-2.0")
.>
.> Error message:
.> Error in gsub("_", "", pkgName) :
.>    error in evaluating the argument 'x' in selecting a method for function
.> 'gsub': Error: object 'pkgName' not found
.>
.>
.> It looks like dataSource has to be either BioMart or UCSC; otherwise no
.> pkgName is produced in the function .makePackageName. Is that right?
.>
.> Or should I build the annotation package some other way?
.>
.> Thanks a lot
.>
.> Tengfei
.>
.> My sessionInfo:
.>
.>> sessionInfo()
.> R Under development (unstable) (2013-01-21 r61728)
.> Platform: x86_64-unknown-linux-gnu (64-bit)
.>
.> locale:
.>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
.>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
.>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
.>   [7] LC_PAPER=C                 LC_NAME=C
.>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
.> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
.>
.> attached base packages:
.> [1] parallel  stats     graphics  grDevices utils     datasets  methods
.> [8] base
.>
.> other attached packages:
.> [1] GenomicFeatures_1.11.8 AnnotationDbi_1.21.10  Biobase_2.19.2
.> [4] GenomicRanges_1.11.28  IRanges_1.17.31        BiocGenerics_0.5.6
.>
.> loaded via a namespace (and not attached):
.>  [1] biomaRt_2.15.0     Biostrings_2.27.10 bitops_1.0-5       BSgenome_1.27.1
.>  [5] DBI_0.2-5          RCurl_1.95-3       Rsamtools_1.11.15  RSQLite_0.11.2
.>  [9] rtracklayer_1.19.9 stats4_3.0.0       tools_3.0.0        XML_3.95-0.1
.> [13] zlibbioc_1.5.0
.>
.>
.
._______________________________________________
.Bioc-devel at r-project.org mailing list
.https://stat.ethz.ch/mailman/listinfo/bioc-devel
#
Hi Malcolm,

In general I have found Ensembl to be really great, and I expect that 
their GTF files are fine. Exon rank is usually the first thing to be 
left out when a GTF file cuts corners, and you are correct that they 
include it. I ran the Homo sapiens file through 
makeTranscriptDbFromGFF() and everything appears to be in working order.

I wanted to warn Tengfei because I suspect most people will be 
surprised to learn that the GTF format guarantees less about its 
contents than they might expect. I also mentioned it because his call 
to makeTranscriptDbFromGFF() did not specify an exonRankAttributeName, 
which suggested to me that his file might not contain that 
information: if he had it, I assumed he would have supplied that 
argument in order to use it. But it is also possible that Tengfei 
simply didn't need that information, in which case this is just 
another (possibly unwarranted) public service message. If so, I 
apologise for the noise.


   Marc
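Marc's reasoning about exonRankAttributeName suggests a second check: even when a rank attribute is present, it is only useful if each transcript's ranks run from 1 to n without gaps or repeats. The sketch below assumes GTF-style attributes and the Ensembl key `exon_number`. `parse_gtf_attrs()` and `exon_ranks_complete()` are hypothetical helpers, and the naive split on ';' would mishandle quoted values that themselves contain semicolons.

```r
# Hypothetical helper: turn one GTF attribute string into a named character vector.
# Naive: splits on ';', so quoted values containing ';' are not handled.
parse_gtf_attrs <- function(attr_string) {
  parts <- trimws(strsplit(attr_string, ";", fixed = TRUE)[[1]])
  parts <- parts[nzchar(parts)]
  keys <- sub("[[:space:]].*$", "", parts)               # text before the first space
  vals <- sub("^[^[:space:]]+[[:space:]]+", "", parts)   # text after the first space
  setNames(gsub("\"", "", vals, fixed = TRUE), keys)     # strip the double quotes
}

# Hypothetical check: for each transcript_id, are the exon ranks exactly 1..n?
exon_ranks_complete <- function(lines, rank_attr = "exon_number") {
  lines <- lines[!startsWith(lines, "#")]
  fields <- strsplit(lines, "\t", fixed = TRUE)
  fields <- Filter(function(f) length(f) >= 9 && f[3] == "exon", fields)
  attrs <- lapply(fields, function(f) parse_gtf_attrs(f[9]))
  tx <- vapply(attrs, function(a) unname(a["transcript_id"]), character(1))
  rank <- suppressWarnings(as.integer(
    vapply(attrs, function(a) unname(a[rank_attr]), character(1))))
  # exon rows without a transcript_id (NA) are dropped by split()
  by_tx <- split(rank, tx)
  vapply(by_tx, function(r) !anyNA(r) && identical(sort(r), seq_along(r)),
         logical(1))
}
```

For a file where every transcript passes, Marc's suggestion would translate into something like `makeTranscriptDbFromGFF("annotation.gtf", format = "gtf", exonRankAttributeName = "exon_number")`, with the file name a placeholder.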
On 02/08/2013 06:19 PM, Cook, Malcolm wrote:
#
Thanks, Marc, for the confirmation. I am having success using the Ensembl gene-build GTF for mouse, renaming chromosomes to match UCSC conventions (mostly by adding a 'chr' prefix), and visualizing the results at UCSC. Best of both worlds.

Cheers,

Malcolm
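The renaming Malcolm describes can be sketched as a small mapping from Ensembl-style to UCSC-style chromosome names. This assumes only the primary chromosomes (numbers, X, Y) and the mitochondrial genome (Ensembl "MT", UCSC "chrM") need renaming; scaffold and patch names are left untouched because UCSC names them differently. `ensembl_to_ucsc()` is a hypothetical helper.

```r
# Hypothetical helper: convert Ensembl-style chromosome names to UCSC style.
# Assumption: only primary chromosomes and MT are renamed; anything else
# (e.g. an unplaced scaffold name such as "GL456210.1") is returned unchanged.
ensembl_to_ucsc <- function(chrom) {
  out <- chrom
  primary <- grepl("^([0-9]+|X|Y)$", chrom)
  out[primary] <- paste0("chr", chrom[primary])   # 1 -> chr1, X -> chrX
  out[chrom == "MT"] <- "chrM"                    # UCSC calls the mitochondrion chrM
  out
}

ensembl_to_ucsc(c("1", "19", "X", "MT", "GL456210.1"))
# "chr1" "chr19" "chrX" "chrM" "GL456210.1"
```

Current Bioconductor releases also offer `seqlevelsStyle()` (GenomeInfoDb), which can apply this kind of renaming to a TxDb directly; it was not necessarily available in the 2013 package versions used in this thread.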


 .-----Original Message-----
 .From: Marc Carlson [mailto:mcarlson at fhcrc.org]
 .Sent: Saturday, February 09, 2013 3:57 AM
 .To: Cook, Malcolm
 .Cc: 'bioc-devel at r-project.org'
 .Subject: Re: [Bioc-devel] [GenomicFeatures] no pkgName found. makeTxDbPackage() called after txdb created from GFF3 file