[Bioc-devel] seqnames missing in headerTabix()

4 messages · Valerie Obenchain, Anita Lerch, Martin Morgan

#
Hi,

I tried to stream a GTF file from Ensembl with the Tabix methods.
Creating the index file seemed to work, but when I checked it with
headerTabix(tbx)$seqnames I got character(0).
Of course, scanTabix() didn't work either.
I do not have this problem with the example file in the Rsamtools
package.
Does anybody have an explanation for this?

Thanks in advance,
Anita
[1] "Drosophila_melanogaster.BDGP5.25.62.gtf.gz.tbi"
$seqnames
character(0)

$indexColumns
  seq start   end 
    1     4     5 

$skip
[1] 0

$comment
[1] "#"

$header
character(0)
character(0)
Error: scanTabix: '3L' not present in tabix index
  path: /home_fmi/01/lerchani/workspace/Drosophila_melanogaster.BDGP5.25.62.gtf.gz
R Under development (unstable) (2011-08-23 r56776)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Rsamtools_1.5.51     Biostrings_2.21.9    GenomicRanges_1.5.28 IRanges_1.11.24     

loaded via a namespace (and not attached):
[1] BSgenome_1.21.3     RCurl_1.6-9         rtracklayer_1.13.11 tools_2.14.0        XML_3.4-2           zlibbioc_0.1.7
#
Hi Anita,

It looks like the download may not have worked. Check your gtfFn file to 
see if the data are really there,

     less Drosophila_melanogaster.BDGP5.25.62.gtf.gz

Once you are sure of the download, you may want to check the file for the 
usual things:
(1) no comment lines starting with #
(2) the file is tab separated, not space separated

Coming from Ensembl, these should not be a problem.
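
Both checks can be sketched in shell. The helper name count_gtf_problems is hypothetical, and the expected 9 tab-separated columns is the standard GTF layout rather than something stated in this thread.

```shell
# Hypothetical helper: reads uncompressed GTF text on stdin and reports
# how many comment lines it saw and how many other lines do not have
# exactly 9 tab-separated fields (assumed GTF column count).
count_gtf_problems() {
    awk -F '\t' '
        /^#/    { comments++; next }   # check (1): comment lines
        NF != 9 { malformed++ }        # check (2): not tab separated
        END     { printf "comments=%d malformed=%d\n", comments, malformed }
    '
}

# Usage, with the file name from this thread:
#   gzip -dc Drosophila_melanogaster.BDGP5.25.62.gtf.gz | count_gtf_problems
```

A space-separated line is seen as a single field, so it is counted as malformed.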

Valerie
On 08/23/2011 07:02 AM, Anita Lerch wrote:
#
Hi,
thanks for the answer. The download worked fine (sorry, I deleted the
output of the download).

In the meantime I figured out the problem: the file was not compressed
with bgzip.

$ ./tabix -p gff ../Drosophila_melanogaster.BDGP5.25.62.gtf.gz
[tabix] was bgzip used to compress this file? ../Drosophila_melanogaster.BDGP5.25.62.gtf.gz

I fixed it with 

$ (grep ^"#" ../Drosophila_melanogaster.BDGP5.25.62.gtf; \
   grep -v ^"#" ../Drosophila_melanogaster.BDGP5.25.62.gtf | sort -k1,1 -k4,4n) \
   | ./bgzip > ../Drosophila_melanogaster.BDGP5.25.62.gtf.bgz

$ ./tabix ../Drosophila_melanogaster.BDGP5.25.62.gtf.bgz -p gff

Nothing is written about bgzip in the ?TabixFile manual page, and
the file extension of example.gtf.gz is misleading.
I had not looked at the ?indexTabix manual page, where it is clearly
stated that the file has to be compressed with bgzip.

Would it be possible to pass the tabix error message on to the user in
future?

I have to mention that Rsamtools is just great. Thanks a lot for it.
Greetings
Anita
On Tue, 2011-08-23 at 09:14 -0700, Valerie Obenchain wrote:
#
On 08/24/2011 01:26 AM, Anita Lerch wrote:
Hi Anita -- thanks, yes, the error is now (version 1.5.54) reported when 
indexTabix is applied to a non-bgzip'd file. Martin
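
A quick way to tell a bgzip'd file from a plain gzip one before indexing is to look at its first bytes. This is a minimal sketch, not what Rsamtools or tabix do internally: is_bgzf is a hypothetical helper, and it assumes the usual BGZF layout in which the gzip header has the extra-field flag set and the 'BC' subfield comes first, as bgzip writes it.

```shell
# Hypothetical helper: exit status 0 if the file starts with a BGZF
# block header, 1 otherwise (for example, for plain gzip output).
is_bgzf() {
    # First 16 bytes of the file as one lowercase hex string.
    hdr=$(head -c 16 "$1" | od -An -v -tx1 | tr -d ' \n')
    case "$hdr" in
        # 1f 8b 08 04 : gzip magic, deflate, FEXTRA flag set
        # next 8 bytes: mtime, xfl, os, xlen (any value)
        # 42 43       : 'B' 'C' extra subfield written by bgzip
        1f8b0804????????????????4243*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage, with the file name from this thread:
#   is_bgzf Drosophila_melanogaster.BDGP5.25.62.gtf.gz || echo "not bgzip'd"
```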