Skip to content

[Bioc-devel] makeTxDbFromGFF drops genes which have multiple chromosome locations. (with iGenome GTF)

4 messages · Vincent Carey, Jialin Ma, Martin Morgan

#
Dear all,


When trying to import the the GTF file downloaded from iGenome using
makeTxDbFromGFF, I figured out that many genes are dropped silently,
probably because those genes happens to have multiple chromosome
locations in that GTF file.?

I have posted it at?https://support.bioconductor.org/p/89401?with no
reply yet. As I find it somehow a problem-causing bug, I decide to send
the information here.

Is there any suggested way to deal with the case?


Best regards,
Marlin
#
*(forgot to carbon list)*


*Did you not see a message like:*


*Import genomic features from the file as a GRanges object ... OK*

*Prepare the 'metadata' data frame ... OK*

*Make the TxDb object ... OK*

*Warning message:*

*In makeTxDbFromGRanges(gr, metadata = metadata) :*

*  The following transcripts were dropped because their exon ranks could*

*  not be inferred (either because the exons are not on the same*

*  chromosome/strand or because they are not separated by introns):*

*  NR_003254*
On Sun, Nov 13, 2016 at 5:25 AM, Marlin JL.M <marlin- at gmx.cn> wrote:

            

  
  
#
No, I did not get that warning. But only:
Further, I noticed that:

1. Those genes are not actually "dropped", they can be shown with
`transcriptsBy(txdb,'gene')` but can not be shown with `genes(txdb)`.

2. The behavior is not associated with multiple chromosome locations,
though they have a large intersection.

Hence, I am not clear what is the actual factor causing this somehow
weired behavior.

As I am not using the development version of bioconductor, if anyone?
can help to reproduce it? I have uploaded a small dummy gtf file at?
https://gist.github.com/Marlin-Na/1eefedc4984e40b8ef76a0e1e7612dbb .


This is what I get:

$ wget https://gist.githubusercontent.com/Marlin-Na/1eefedc4984e40b8ef76a0e1e7612dbb/raw/9977b699a9fcdf70c8860de985ff550934c2c4a7/dummy.gtf
$ R
# Import genomic features from the file as a GRanges object ... OK
# Prepare the 'metadata' data frame ... OK
# Make the TxDb object ... OK
# GRanges object with 1 range and 1 metadata column:
#?????????seqnames???????????????ranges strand |?????gene_id
#????????????<Rle>????????????<IRanges>??<Rle> | <character>
#???TTTY5?????chrY [24442945, 24445023]??????- |???????TTTY5
#???-------
#???seqinfo: 2 sequences from an unspecified genome; no seqlengths
# GRangesList object of length 3:
# $TTTY17A?
# GRanges object with 3 ranges and 2 metadata columns:
#???????seqnames???????????????ranges strand |?????tx_id?????tx_name
#??????????<Rle>????????????<IRanges>??<Rle> | <integer> <character>
#???[1]?????chrY [24997731, 24998862]??????+ |?????????3 NR_001526_1
#???[2]?????chrY [26631479, 26632610]??????+ |?????????4 NR_001526_2
#???[3]?????chrY [27329790, 27330920]??????- |?????????6???NR_001526
#?
# $TTTY5?
# GRanges object with 1 range and 2 metadata columns:
#???????seqnames???????????????ranges strand | tx_id???tx_name
#???[1]?????chrY [24442945, 24445023]??????- |?????5 NR_001541
#?
# $XGY2?
# GRanges object with 2 ranges and 2 metadata columns:
#???????seqnames?????????????ranges strand | tx_id?????tx_name
#???[1]?????chrX [2670337, 2693037]??????+ |?????1???NR_003254
#???[2]?????chrY [2620337, 2643037]??????+ |?????2 NR_003254_1
#?
# -------
# seqinfo: 2 sequences from an unspecified genome; no seqlengths

As you can see, 'XGY2' and 'TTTY17A' are not shown with `genes(txdb)`.
# R version 3.3.2 (2016-10-31)
# Platform: i686-pc-linux-gnu (32-bit)
# Running under: Ubuntu 16.04.1 LTS
#
# ......
#?
# attached base packages:
# [1] stats4????parallel??stats?????graphics??grDevices utils?????datasets?
# [8] methods???base?????
#?
# other attached packages:
# [1] GenomicFeatures_1.24.5 AnnotationDbi_1.34.4???Biobase_2.32.0????????
# [4] GenomicRanges_1.24.3???GenomeInfoDb_1.8.3?????IRanges_2.6.1?????????
# [7] S4Vectors_0.10.3???????BiocGenerics_0.18.0???
On Sun, 2016-11-13 at 08:24 -0500, Vincent Carey wrote:
#
On 11/13/2016 01:07 PM, Marlin JL.M wrote:
In the current release version

 > packageVersion("GenomicFeatures")
[1] '1.26.0'

look up the help for ?genes and use the argument 
single.strand.genes.only=FALSE

Martin
This email message may contain legally privileged and/or...{{dropped:2}}