Skip to content
Prev 10092 / 21312 Next

[Bioc-devel] makeTxDbFromGFF drops genes which have multiple chromosome locations. (with iGenome GTF)

No, I did not get that warning. But only:
Further, I noticed that:

1. Those genes are not actually "dropped", they can be shown with
`transcriptsBy(txdb,'gene')` but can not be shown with `genes(txdb)`.

2. The behavior is not associated with multiple chromosome locations,
though they have a large intersection.

Hence, I am not clear what is the actual factor causing this somehow
weired behavior.

As I am not using the development version of bioconductor, if anyone?
can help to reproduce it? I have uploaded a small dummy gtf file at?
https://gist.github.com/Marlin-Na/1eefedc4984e40b8ef76a0e1e7612dbb .


This is what I get:

$ wget https://gist.githubusercontent.com/Marlin-Na/1eefedc4984e40b8ef76a0e1e7612dbb/raw/9977b699a9fcdf70c8860de985ff550934c2c4a7/dummy.gtf
$ R
# Import genomic features from the file as a GRanges object ... OK
# Prepare the 'metadata' data frame ... OK
# Make the TxDb object ... OK
# GRanges object with 1 range and 1 metadata column:
#?????????seqnames???????????????ranges strand |?????gene_id
#????????????<Rle>????????????<IRanges>??<Rle> | <character>
#???TTTY5?????chrY [24442945, 24445023]??????- |???????TTTY5
#???-------
#???seqinfo: 2 sequences from an unspecified genome; no seqlengths
# GRangesList object of length 3:
# $TTTY17A?
# GRanges object with 3 ranges and 2 metadata columns:
#???????seqnames???????????????ranges strand |?????tx_id?????tx_name
#??????????<Rle>????????????<IRanges>??<Rle> | <integer> <character>
#???[1]?????chrY [24997731, 24998862]??????+ |?????????3 NR_001526_1
#???[2]?????chrY [26631479, 26632610]??????+ |?????????4 NR_001526_2
#???[3]?????chrY [27329790, 27330920]??????- |?????????6???NR_001526
#?
# $TTTY5?
# GRanges object with 1 range and 2 metadata columns:
#???????seqnames???????????????ranges strand | tx_id???tx_name
#???[1]?????chrY [24442945, 24445023]??????- |?????5 NR_001541
#?
# $XGY2?
# GRanges object with 2 ranges and 2 metadata columns:
#???????seqnames?????????????ranges strand | tx_id?????tx_name
#???[1]?????chrX [2670337, 2693037]??????+ |?????1???NR_003254
#???[2]?????chrY [2620337, 2643037]??????+ |?????2 NR_003254_1
#?
# -------
# seqinfo: 2 sequences from an unspecified genome; no seqlengths

As you can see, 'XGY2' and 'TTTY17A' are not shown with `genes(txdb)`.
# R version 3.3.2 (2016-10-31)
# Platform: i686-pc-linux-gnu (32-bit)
# Running under: Ubuntu 16.04.1 LTS
#
# ......
#?
# attached base packages:
# [1] stats4????parallel??stats?????graphics??grDevices utils?????datasets?
# [8] methods???base?????
#?
# other attached packages:
# [1] GenomicFeatures_1.24.5 AnnotationDbi_1.34.4???Biobase_2.32.0????????
# [4] GenomicRanges_1.24.3???GenomeInfoDb_1.8.3?????IRanges_2.6.1?????????
# [7] S4Vectors_0.10.3???????BiocGenerics_0.18.0???
On Sun, 2016-11-13 at 08:24 -0500, Vincent Carey wrote: