[Bioc-devel] systemPipeR error - Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
Hi Hervé and Sonali,

Thanks for looking into this option. Your suggestion makes a lot of sense. In general, I find it very useful to know what types of transcripts/genes are included (or missing) in GRanges/GRangesList instances obtained from txdb objects. For this, the tx_type classification is extremely useful.

Thomas
On Mon, Oct 26, 2015 at 10:38 AM Hervé Pagès <hpages at fredhutch.org> wrote:
Hi Thomas,

On 10/25/2015 01:06 PM, Thomas Girke wrote:
I fixed this in systemPipeR versions 1.4.3/1.5.3. The reason for this error was that the tx_type column contains only NA values when a txdb is generated with makeTxDbFromUCSC(). It may be useful to return something more meaningful here, such as the transcript type information that is available when a txdb is generated from a GFF.
We've considered this and might do it at some point. The difficulty, though, is that UCSC does not provide this information as part of the track itself, so we would have to retrieve it from some other table in their huge database through many joins. In the meantime, I'll try to clarify this in the documentation.

H.
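A minimal base-R sketch of the check this discussion implies: before splitting transcripts by `tx_type`, confirm that the column carries any annotation at all, and tabulate it with NAs counted so that missing types are visible. A plain character vector stands in for `mcols(tx)$tx_type`; the helper `has_tx_type()` and the example type labels are hypothetical and are not part of systemPipeR or GenomicFeatures.

```r
# Hypothetical helper (not a systemPipeR/GenomicFeatures function):
# TRUE if a tx_type vector carries at least one non-NA annotation.
has_tx_type <- function(tx_type) {
  !all(is.na(tx_type))
}

# Stand-in for mcols(tx)$tx_type from a txdb built with makeTxDbFromUCSC():
# every entry is NA, as in the debugging session in this thread.
ucsc_like <- rep(NA_character_, 5)

# Stand-in for a txdb built from a GFF; these labels are illustrative only.
gff_like <- c("mRNA", "mRNA", "ncRNA", NA, "tRNA")

has_tx_type(ucsc_like)  # FALSE: nothing to split on
has_tx_type(gff_like)   # TRUE

# Count each type, listing NA explicitly so missing annotation shows up.
table(gff_like, useNA = "ifany")
```

A caller could use such a check to skip any per-type split, or warn the user, when the txdb provides no transcript types.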
Thanks,
Thomas

On Fri, Oct 23, 2015 at 12:49:09AM +0000, Thomas Girke wrote:
Thanks. Good to know. I have never tried this with a txdb instance from makeTxDbFromUCSC(). Will fix this over the weekend.

Thomas

On Thu, Oct 22, 2015 at 5:39 PM Arora, Sonali <sarora at fredhutch.org> wrote:
Hi Thomas,

I get the following error when I try to obtain the feature types using the function genFeatures():
library(systemPipeR)
library(GenomicFeatures)
Loading required package: AnnotationDbi
txdb <- makeTxDbFromUCSC(genome = "hg19", tablename = "refGene")
Download the refGene table ... OK
Download the refLink table ... OK
Extract the 'transcripts' data frame ... OK
Extract the 'splicings' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .extractCdsLocsFromUCSCTxTable(ucsc_txtable, exon_locs) :
  UCSC data anomaly in 359 transcript(s): the cds cumulative length is not a multiple of 3 for transcripts 'NM_001037501' 'NM_001277444' 'NM_001037675' 'NM_001271872' 'NM_001170637' 'NM_001300952' 'NM_015326' 'NM_017940' 'NM_001271870' 'NM_001143962' 'NM_001305275' 'NM_001146344' 'NM_001300891' 'NM_001010890' 'NM_001300891' 'NM_001289974' 'NM_001291281' 'NM_001301371' 'NM_016178' 'NM_001134939' 'NM_001080427' 'NM_001145710' 'NM_001291328' 'NM_001271466' 'NM_001017915' 'NM_005541' 'NM_000348' 'NM_001145051' 'NM_001135649' 'NM_001128929' 'NM_001080423' 'NM_001144382' 'NM_001291661' 'NM_002958' 'NM_001005861' 'NM_004636' 'NM_001005914' 'NM_001290060' 'NM_001290061' 'NM_001289930' 'NM_003715' 'NM_001290049' 'NM_001286054' 'NM_001286053' 'NM_001286052' 'NM_182524' 'NM_001075' 'NM_00 [... truncated]
feat <- genFeatures(txdb, featuretype="all", reduce_ranges=TRUE,
                    upstream=1000, downstream=0, verbose=TRUE)
Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
  subscript contains NAs

Probably because:

Browse[2]> tx
GRanges object with 54439 ranges and 3 metadata columns:
                seqnames           ranges strand |      tx_name
                   <Rle>        <IRanges>  <Rle> |  <character>
      [1]           chr1   [11874, 14409]      + |    NR_046018
      [2]           chr1   [30366, 30503]      + |    NR_036051
      [3]           chr1   [30366, 30503]      + |    NR_036266
      [4]           chr1   [30366, 30503]      + |    NR_036267
      [5]           chr1   [30366, 30503]      + |    NR_036268
      ...            ...              ...    ... ...          ...
  [54435] chrUn_gl000228 [112605, 114676]      + | NM_001306068
  [54436] chrUn_gl000228  [ 29339, 32226]      - | NM_001005217
  [54437] chrUn_gl000228  [ 29339, 32226]      - | NM_001286820
  [54438] chrUn_gl000241  [ 14739, 36767]      - |    NR_132315
  [54439] chrUn_gl000241  [ 16025, 36957]      - |    NR_132320
                  gene_id     tx_type
          <CharacterList> <character>
      [1]       100287102        <NA>
      [2]       100302278        <NA>
      [3]       100422831        <NA>
      [4]       100422834        <NA>
      [5]       100422919        <NA>
      ...             ...         ...
  [54435]       100288687        <NA>
  [54436]          448831        <NA>
  [54437]          448831        <NA>
  [54438]       100289097        <NA>
  [54439]       102723780        <NA>
  -------
  seqinfo: 93 sequences (1 circular) from hg19 genome
Browse[2]> unique(mcols(tx)$tx_type)
[1] NA
debug: tmp <- tx[mcols(tx)$tx_type == tx_type[i]]
Browse[2]>
Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
  subscript contains NAs

Here is my sessionInfo:
sessionInfo()
R Under development (unstable) (2015-10-15 r69519)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] GenomicFeatures_1.23.3     AnnotationDbi_1.33.0
 [3] systemPipeR_1.5.1          RSQLite_1.0.0
 [5] DBI_0.3.1                  ShortRead_1.25.10
 [7] GenomicAlignments_1.7.1    SummarizedExperiment_1.1.0
 [9] Biobase_2.31.0             BiocParallel_1.5.0
[11] Rsamtools_1.23.0           Biostrings_2.39.0
[13] XVector_0.11.0             GenomicRanges_1.21.32
[15] GenomeInfoDb_1.7.1         IRanges_2.5.3
[17] S4Vectors_0.9.5            BiocGenerics_0.17.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.1            lattice_0.20-33        GO.db_3.2.2
 [4] digest_0.6.8           plyr_1.8.3             futile.options_1.0.0
 [7] BatchJobs_1.6          ggplot2_1.0.1          zlibbioc_1.17.0
[10] annotate_1.49.0        Matrix_1.2-2           checkmate_1.6.2
[13] proto_0.3-10           GOstats_2.37.0         splines_3.3.0
[16] stringr_1.0.0          pheatmap_1.0.7         RCurl_1.95-4.7
[19] biomaRt_2.27.0         munsell_0.4.2          sendmailR_1.2-1
[22] rtracklayer_1.31.1     base64enc_0.1-3        BBmisc_1.9
[25] fail_1.3               edgeR_3.13.0           XML_3.98-1.3
[28] AnnotationForge_1.13.0 MASS_7.3-44            bitops_1.0-6
[31] grid_3.3.0             RBGL_1.47.0            xtable_1.7-4
[34] GSEABase_1.33.0        gtable_0.1.2           magrittr_1.5
[37] scales_0.3.0           graph_1.49.1           stringi_1.0-1
[40] hwriter_1.3.2          reshape2_1.4.1         genefilter_1.53.0
[43] limma_3.27.0           latticeExtra_0.6-26    futile.logger_1.4.1
[46] brew_1.0-6             rjson_0.2.15           lambda.r_1.1.7
[49] RColorBrewer_1.1-2     tools_3.3.0            Category_2.37.0
[52] survival_2.38-3        colorspace_1.2-6

--
Thanks and Regards,
Sonali
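The failing line in the debugging session above, `tmp <- tx[mcols(tx)$tx_type == tx_type[i]]`, builds a logical subscript from a comparison against NA values, and in R `NA == "x"` is NA rather than FALSE. Below is a minimal base-R sketch of the problem and of two NA-safe alternatives. The vectors are hypothetical stand-ins for `mcols(tx)$tx_type` and the transcript names, and this is not necessarily how the fix in systemPipeR 1.4.3/1.5.3 was done.

```r
# Hypothetical stand-ins for mcols(tx)$tx_type and mcols(tx)$tx_name.
tx_type <- c(NA, "mRNA", NA, "ncRNA", "mRNA")
tx_name <- c("tx1", "tx2", "tx3", "tx4", "tx5")

# Naive comparison: NA entries stay NA instead of becoming FALSE.
idx_naive <- tx_type == "mRNA"
idx_naive            # NA TRUE NA FALSE TRUE

# A base vector silently returns NA elements for NA subscripts; GRanges
# subsetting instead stops with "subscript contains NAs".
tx_name[idx_naive]   # NA "tx2" NA "tx5"

# NA-safe option 1: which() keeps only the positions that are TRUE.
idx_which <- which(tx_type == "mRNA")
tx_name[idx_which]   # "tx2" "tx5"

# NA-safe option 2: %in% returns FALSE (never NA) for NA entries.
idx_in <- tx_type %in% "mRNA"
tx_name[idx_in]      # "tx2" "tx5"
```

With an all-NA column, as produced by makeTxDbFromUCSC() in this thread, both options simply select nothing, so a caller may still want to detect that case and skip the type-based split altogether.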
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319