[Bioc-devel] seqnames missing in headerTabix()
Hi,

thanks for the answer. The download worked fine (sorry, I deleted the output of the download). In the meantime I figured out the problem: the file was not compressed with bgzip.

  $ ./tabix -p gff ../Drosophila_melanogaster.BDGP5.25.62.gtf.gz
  [tabix] was bgzip used to compress this file? ../Drosophila_melanogaster.BDGP5.25.62.gtf.gz

I fixed it with

  $ (grep ^"#" ../Drosophila_melanogaster.BDGP5.25.62.gtf; grep -v ^"#" ../Drosophila_melanogaster.BDGP5.25.62.gtf | sort -k1,1 -k4,4n) | ./bgzip > ../Drosophila_melanogaster.BDGP5.25.62.gtf.bgz
  $ ./tabix ../Drosophila_melanogaster.BDGP5.25.62.gtf.bgz -p gff

There is nothing written about bgzip in the ?TabixFile manual page, and the file extension of example.gtf.gz is misleading. I had never looked at the ?indexTabix manual page, where it is clearly written that the file has to be compressed with bgzip. Would it be possible to somehow forward the tabix error message in the future?

I have to mention that Rsamtools is just great. Thanks a lot for it.

Greetings
Anita
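A quick way to tell a plain gzip file from a bgzip (BGZF) one before indexing is to look at the first bytes of the file. The sketch below is a heuristic, not part of tabix or Rsamtools: the function name `is_bgzf` is made up for illustration, and it assumes the 'BC' extra subfield comes first in the gzip header, which is how bgzip writes it.

```shell
# Heuristic check: does the file start with a BGZF (bgzip) block header?
# is_bgzf is a hypothetical helper name, not a tabix or Rsamtools command.
is_bgzf() {
    # Dump the first 16 bytes as one contiguous lowercase hex string.
    hdr=$(od -An -tx1 -N16 "$1" | tr -d ' \n')
    case "$hdr" in
        # 1f8b = gzip magic, 08 = deflate, 04 = FEXTRA flag set,
        # then 8 bytes we skip (mtime, xfl, os, xlen),
        # then the extra subfield id 'B' 'C' (42 43).
        1f8b0804????????????????4243*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Used as `is_bgzf ../Drosophila_melanogaster.BDGP5.25.62.gtf.gz || echo "not bgzip-compressed"`, it would flag a file like the one downloaded above before tabix is run on it.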
On Tue, 2011-08-23 at 09:14 -0700, Valerie Obenchain wrote:
Hi Anita,
It looks like the download may not have worked. Check your gtfFn file to
see if the data are really there,
less Drosophila_melanogaster.BDGP5.25.62.gtf.gz
Once you are sure of the download you may want to check the file for the
usual things -
(1) no comment lines starting with #
(2) the file is tab separated, not space separated
Coming from Ensembl, these should not be a problem.
Valerie
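The two checks listed above can be sketched on the command line. This is only an illustration: `check_gtf` is a hypothetical helper name, and it assumes the usual 9 tab-separated columns of GTF/GFF. `gzip -dc` is used for reading because it decompresses both plain gzip and bgzip files.

```shell
# Count '#' comment lines and data lines that do not split into exactly
# 9 tab-separated fields. check_gtf is a hypothetical helper name.
check_gtf() {
    gzip -dc "$1" | awk -F '\t' '
        /^#/    { comments++; next }   # comment line, skip field check
        NF != 9 { bad++ }              # wrong column count or wrong separator
        END     { printf "comment_lines=%d malformed_lines=%d\n", comments, bad }'
}
```

For example, `check_gtf Drosophila_melanogaster.BDGP5.25.62.gtf.gz` prints a single summary line; a non-zero `malformed_lines` count would suggest space-separated or truncated records.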
On 08/23/2011 07:02 AM, Anita Lerch wrote:
Hi,

I tried to stream a 'gtf' file from Ensembl with the Tabix methods. The creation of the index files seems to work, but when I checked it with headerTabix(tbx)$seqnames I got character(0). Of course scanTabix() did not work then either. I do not have this problem with the example file in the Rsamtools package. Does anybody have an explanation for this?

Thanks in advance,
Anita
library(Rsamtools)
url <- "ftp://ftp.ensembl.org/pub/release-62/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP5.25.62.gtf.gz"
gtfFn <- "Drosophila_melanogaster.BDGP5.25.62.gtf.gz"
download.file(url, gtfFn, "wget")
indexTabix(gtfFn, format="gff")
[1] "Drosophila_melanogaster.BDGP5.25.62.gtf.gz.tbi"
tbx <- open(TabixFile(gtfFn))
headerTabix(tbx)
$seqnames
character(0)
$indexColumns
  seq start   end
    1     4     5
$skip
[1] 0
$comment
[1] "#"
$header
character(0)
seqnamesTabix(tbx)
character(0)
cat(yieldTabix(tbx, yieldSize=1L))
param<- GRanges(c("3L", "3R"), IRanges(c(1, 1), width=100000))
scanTabix(tbx, param=param)
Error: scanTabix: '3L' not present in tabix index
  path: /home_fmi/01/lerchani/workspace/Drosophila_melanogaster.BDGP5.25.62.gtf.gz
sessionInfo()
R Under development (unstable) (2011-08-23 r56776)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] Rsamtools_1.5.51 Biostrings_2.21.9 GenomicRanges_1.5.28 IRanges_1.11.24

loaded via a namespace (and not attached):
[1] BSgenome_1.21.3 RCurl_1.6-9 rtracklayer_1.13.11 tools_2.14.0 XML_3.4-2 zlibbioc_0.1.7
Anita Lerch
Friedrich Miescher Institute
Maulbeerstrasse 66
WRO-1066.P22
4058 Basel
Phone: +41 (0)61 697 5172
Email: anita.lerch at fmi.ch