[Bioc-devel] makeTxDbFromGFF drops genes which have multiple chromosome locations. (with iGenome GTF)
No, I did not get that warning. But only:
Import genomic features from the file as a GRanges object ... OK Prepare the 'metadata' data frame ... OK Make the TxDb object ... OK
Further, I noticed that: 1. Those genes are not actually "dropped", they can be shown with `transcriptsBy(txdb,'gene')` but can not be shown with `genes(txdb)`. 2. The behavior is not associated with multiple chromosome locations, though they have a large intersection. Hence, I am not clear what is the actual factor causing this somehow weired behavior. As I am not using the development version of bioconductor, if anyone? can help to reproduce it? I have uploaded a small dummy gtf file at? https://gist.github.com/Marlin-Na/1eefedc4984e40b8ef76a0e1e7612dbb . This is what I get: $ wget https://gist.githubusercontent.com/Marlin-Na/1eefedc4984e40b8ef76a0e1e7612dbb/raw/9977b699a9fcdf70c8860de985ff550934c2c4a7/dummy.gtf $ R
library(GenomicFeatures)
txdb = makeTxDbFromGFF('dummy.gtf')
# Import genomic features from the file as a GRanges object ... OK # Prepare the 'metadata' data frame ... OK # Make the TxDb object ... OK
genes(txdb)
# GRanges object with 1 range and 1 metadata column: #?????????seqnames???????????????ranges strand |?????gene_id #????????????<Rle>????????????<IRanges>??<Rle> | <character> #???TTTY5?????chrY [24442945, 24445023]??????- |???????TTTY5 #???------- #???seqinfo: 2 sequences from an unspecified genome; no seqlengths
transcriptsBy(txdb, 'gene')
# GRangesList object of length 3: # $TTTY17A? # GRanges object with 3 ranges and 2 metadata columns: #???????seqnames???????????????ranges strand |?????tx_id?????tx_name #??????????<Rle>????????????<IRanges>??<Rle> | <integer> <character> #???[1]?????chrY [24997731, 24998862]??????+ |?????????3 NR_001526_1 #???[2]?????chrY [26631479, 26632610]??????+ |?????????4 NR_001526_2 #???[3]?????chrY [27329790, 27330920]??????- |?????????6???NR_001526 #? # $TTTY5? # GRanges object with 1 range and 2 metadata columns: #???????seqnames???????????????ranges strand | tx_id???tx_name #???[1]?????chrY [24442945, 24445023]??????- |?????5 NR_001541 #? # $XGY2? # GRanges object with 2 ranges and 2 metadata columns: #???????seqnames?????????????ranges strand | tx_id?????tx_name #???[1]?????chrX [2670337, 2693037]??????+ |?????1???NR_003254 #???[2]?????chrY [2620337, 2643037]??????+ |?????2 NR_003254_1 #? # ------- # seqinfo: 2 sequences from an unspecified genome; no seqlengths As you can see, 'XGY2' and 'TTTY17A' are not shown with `genes(txdb)`.
sessionInfo()
# R version 3.3.2 (2016-10-31) # Platform: i686-pc-linux-gnu (32-bit) # Running under: Ubuntu 16.04.1 LTS # # ...... #? # attached base packages: # [1] stats4????parallel??stats?????graphics??grDevices utils?????datasets? # [8] methods???base????? #? # other attached packages: # [1] GenomicFeatures_1.24.5 AnnotationDbi_1.34.4???Biobase_2.32.0???????? # [4] GenomicRanges_1.24.3???GenomeInfoDb_1.8.3?????IRanges_2.6.1????????? # [7] S4Vectors_0.10.3???????BiocGenerics_0.18.0???
On Sun, 2016-11-13 at 08:24 -0500, Vincent Carey wrote:
Did you not see a message like: Import genomic features from the file as a GRanges object ... OK Prepare the 'metadata' data frame ... OK Make the TxDb object ... OK Warning message: In makeTxDbFromGRanges(gr, metadata = metadata) : ? The following transcripts were dropped because their exon ranks could ? not be inferred (either because the exons are not on the same ? chromosome/strand or because they are not separated by introns): ? NR_003254 On Sun, Nov 13, 2016 at 5:25 AM, Marlin JL.M <marlin- at gmx.cn> wrote:
Dear all, When trying to import the the GTF file downloaded from iGenome using makeTxDbFromGFF, I figured out that many genes are dropped silently, probably because those genes happens to have multiple chromosome locations in that GTF file.? I have posted it at?https://support.bioconductor.org/p/89401?with no reply yet. As I find it somehow a problem-causing bug, I decide to send the information here. Is there any suggested way to deal with the case? Best regards, Marlin
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel