Dear all, When trying to import the the GTF file downloaded from iGenome using makeTxDbFromGFF, I figured out that many genes are dropped silently, probably because those genes happens to have multiple chromosome locations in that GTF file.? I have posted it at?https://support.bioconductor.org/p/89401?with no reply yet. As I find it somehow a problem-causing bug, I decide to send the information here. Is there any suggested way to deal with the case? Best regards, Marlin
[Bioc-devel] makeTxDbFromGFF drops genes which have multiple chromosome locations. (with iGenome GTF)
4 messages · Vincent Carey, Jialin Ma, Martin Morgan
*(forgot to carbon list)* *Did you not see a message like:* *Import genomic features from the file as a GRanges object ... OK* *Prepare the 'metadata' data frame ... OK* *Make the TxDb object ... OK* *Warning message:* *In makeTxDbFromGRanges(gr, metadata = metadata) :* * The following transcripts were dropped because their exon ranks could* * not be inferred (either because the exons are not on the same* * chromosome/strand or because they are not separated by introns):* * NR_003254*
On Sun, Nov 13, 2016 at 5:25 AM, Marlin JL.M <marlin- at gmx.cn> wrote:
Dear all, When trying to import the the GTF file downloaded from iGenome using makeTxDbFromGFF, I figured out that many genes are dropped silently, probably because those genes happens to have multiple chromosome locations in that GTF file. I have posted it at https://support.bioconductor.org/p/89401 with no reply yet. As I find it somehow a problem-causing bug, I decide to send the information here. Is there any suggested way to deal with the case? Best regards, Marlin
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
No, I did not get that warning. But only:
Import genomic features from the file as a GRanges object ... OK Prepare the 'metadata' data frame ... OK Make the TxDb object ... OK
Further, I noticed that: 1. Those genes are not actually "dropped", they can be shown with `transcriptsBy(txdb,'gene')` but can not be shown with `genes(txdb)`. 2. The behavior is not associated with multiple chromosome locations, though they have a large intersection. Hence, I am not clear what is the actual factor causing this somehow weired behavior. As I am not using the development version of bioconductor, if anyone? can help to reproduce it? I have uploaded a small dummy gtf file at? https://gist.github.com/Marlin-Na/1eefedc4984e40b8ef76a0e1e7612dbb . This is what I get: $ wget https://gist.githubusercontent.com/Marlin-Na/1eefedc4984e40b8ef76a0e1e7612dbb/raw/9977b699a9fcdf70c8860de985ff550934c2c4a7/dummy.gtf $ R
library(GenomicFeatures)
txdb = makeTxDbFromGFF('dummy.gtf')
# Import genomic features from the file as a GRanges object ... OK # Prepare the 'metadata' data frame ... OK # Make the TxDb object ... OK
genes(txdb)
# GRanges object with 1 range and 1 metadata column: #?????????seqnames???????????????ranges strand |?????gene_id #????????????<Rle>????????????<IRanges>??<Rle> | <character> #???TTTY5?????chrY [24442945, 24445023]??????- |???????TTTY5 #???------- #???seqinfo: 2 sequences from an unspecified genome; no seqlengths
transcriptsBy(txdb, 'gene')
# GRangesList object of length 3: # $TTTY17A? # GRanges object with 3 ranges and 2 metadata columns: #???????seqnames???????????????ranges strand |?????tx_id?????tx_name #??????????<Rle>????????????<IRanges>??<Rle> | <integer> <character> #???[1]?????chrY [24997731, 24998862]??????+ |?????????3 NR_001526_1 #???[2]?????chrY [26631479, 26632610]??????+ |?????????4 NR_001526_2 #???[3]?????chrY [27329790, 27330920]??????- |?????????6???NR_001526 #? # $TTTY5? # GRanges object with 1 range and 2 metadata columns: #???????seqnames???????????????ranges strand | tx_id???tx_name #???[1]?????chrY [24442945, 24445023]??????- |?????5 NR_001541 #? # $XGY2? # GRanges object with 2 ranges and 2 metadata columns: #???????seqnames?????????????ranges strand | tx_id?????tx_name #???[1]?????chrX [2670337, 2693037]??????+ |?????1???NR_003254 #???[2]?????chrY [2620337, 2643037]??????+ |?????2 NR_003254_1 #? # ------- # seqinfo: 2 sequences from an unspecified genome; no seqlengths As you can see, 'XGY2' and 'TTTY17A' are not shown with `genes(txdb)`.
sessionInfo()
# R version 3.3.2 (2016-10-31) # Platform: i686-pc-linux-gnu (32-bit) # Running under: Ubuntu 16.04.1 LTS # # ...... #? # attached base packages: # [1] stats4????parallel??stats?????graphics??grDevices utils?????datasets? # [8] methods???base????? #? # other attached packages: # [1] GenomicFeatures_1.24.5 AnnotationDbi_1.34.4???Biobase_2.32.0???????? # [4] GenomicRanges_1.24.3???GenomeInfoDb_1.8.3?????IRanges_2.6.1????????? # [7] S4Vectors_0.10.3???????BiocGenerics_0.18.0???
On Sun, 2016-11-13 at 08:24 -0500, Vincent Carey wrote:
Did you not see a message like: Import genomic features from the file as a GRanges object ... OK Prepare the 'metadata' data frame ... OK Make the TxDb object ... OK Warning message: In makeTxDbFromGRanges(gr, metadata = metadata) : ? The following transcripts were dropped because their exon ranks could ? not be inferred (either because the exons are not on the same ? chromosome/strand or because they are not separated by introns): ? NR_003254 On Sun, Nov 13, 2016 at 5:25 AM, Marlin JL.M <marlin- at gmx.cn> wrote:
Dear all, When trying to import the the GTF file downloaded from iGenome using makeTxDbFromGFF, I figured out that many genes are dropped silently, probably because those genes happens to have multiple chromosome locations in that GTF file.? I have posted it at?https://support.bioconductor.org/p/89401?with no reply yet. As I find it somehow a problem-causing bug, I decide to send the information here. Is there any suggested way to deal with the case? Best regards, Marlin
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
On 11/13/2016 01:07 PM, Marlin JL.M wrote:
No, I did not get that warning. But only:
Import genomic features from the file as a GRanges object ... OK Prepare the 'metadata' data frame ... OK Make the TxDb object ... OK
Further, I noticed that: 1. Those genes are not actually "dropped", they can be shown with `transcriptsBy(txdb,'gene')` but can not be shown with `genes(txdb)`. 2. The behavior is not associated with multiple chromosome locations, though they have a large intersection. Hence, I am not clear what is the actual factor causing this somehow weired behavior. As I am not using the development version of bioconductor, if anyone can help to reproduce it? I have uploaded a small dummy gtf file at https://gist.github.com/Marlin-Na/1eefedc4984e40b8ef76a0e1e7612dbb .
In the current release version
> packageVersion("GenomicFeatures")
[1] '1.26.0'
look up the help for ?genes and use the argument
single.strand.genes.only=FALSE
Martin
This is what I get: $ wget https://gist.githubusercontent.com/Marlin-Na/1eefedc4984e40b8ef76a0e1e7612dbb/raw/9977b699a9fcdf70c8860de985ff550934c2c4a7/dummy.gtf $ R
library(GenomicFeatures)
txdb = makeTxDbFromGFF('dummy.gtf')
# Import genomic features from the file as a GRanges object ... OK # Prepare the 'metadata' data frame ... OK # Make the TxDb object ... OK
genes(txdb)
# GRanges object with 1 range and 1 metadata column: # seqnames ranges strand | gene_id # <Rle> <IRanges> <Rle> | <character> # TTTY5 chrY [24442945, 24445023] - | TTTY5 # ------- # seqinfo: 2 sequences from an unspecified genome; no seqlengths
transcriptsBy(txdb, 'gene')
# GRangesList object of length 3: # $TTTY17A # GRanges object with 3 ranges and 2 metadata columns: # seqnames ranges strand | tx_id tx_name # <Rle> <IRanges> <Rle> | <integer> <character> # [1] chrY [24997731, 24998862] + | 3 NR_001526_1 # [2] chrY [26631479, 26632610] + | 4 NR_001526_2 # [3] chrY [27329790, 27330920] - | 6 NR_001526 # # $TTTY5 # GRanges object with 1 range and 2 metadata columns: # seqnames ranges strand | tx_id tx_name # [1] chrY [24442945, 24445023] - | 5 NR_001541 # # $XGY2 # GRanges object with 2 ranges and 2 metadata columns: # seqnames ranges strand | tx_id tx_name # [1] chrX [2670337, 2693037] + | 1 NR_003254 # [2] chrY [2620337, 2643037] + | 2 NR_003254_1 # # ------- # seqinfo: 2 sequences from an unspecified genome; no seqlengths As you can see, 'XGY2' and 'TTTY17A' are not shown with `genes(txdb)`.
sessionInfo()
# R version 3.3.2 (2016-10-31) # Platform: i686-pc-linux-gnu (32-bit) # Running under: Ubuntu 16.04.1 LTS # # ...... # # attached base packages: # [1] stats4 parallel stats graphics grDevices utils datasets # [8] methods base # # other attached packages: # [1] GenomicFeatures_1.24.5 AnnotationDbi_1.34.4 Biobase_2.32.0 # [4] GenomicRanges_1.24.3 GenomeInfoDb_1.8.3 IRanges_2.6.1 # [7] S4Vectors_0.10.3 BiocGenerics_0.18.0 On Sun, 2016-11-13 at 08:24 -0500, Vincent Carey wrote:
Did you not see a message like: Import genomic features from the file as a GRanges object ... OK Prepare the 'metadata' data frame ... OK Make the TxDb object ... OK Warning message: In makeTxDbFromGRanges(gr, metadata = metadata) : The following transcripts were dropped because their exon ranks could not be inferred (either because the exons are not on the same chromosome/strand or because they are not separated by introns): NR_003254 On Sun, Nov 13, 2016 at 5:25 AM, Marlin JL.M <marlin- at gmx.cn> wrote:
Dear all, When trying to import the the GTF file downloaded from iGenome using makeTxDbFromGFF, I figured out that many genes are dropped silently, probably because those genes happens to have multiple chromosome locations in that GTF file. I have posted it at https://support.bioconductor.org/p/89401 with no reply yet. As I find it somehow a problem-causing bug, I decide to send the information here. Is there any suggested way to deal with the case? Best regards, Marlin
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
This email message may contain legally privileged and/or...{{dropped:2}}