[Bioc-devel] Problem in asBam from Rsamtools
Dear Raf,

Thanks for letting me know. I have seen the same problem happen here before as well.

Best wishes,
Wei
On Jun 4, 2013, at 5:49 PM, rcaloger wrote:
Dear Wei,

I think the problem was due to a file system issue. I checked the RAID and one of the disks had failed. This could have caused I/O problems, especially because many other I/O processes were running during the Rsubread run. I ran Rsubread again and did not get any error.

Thanks for the help,
Raf

On 6/2/13 1:22 AM, Wei Shi wrote:
Dear Raf,

As Martin pointed out, that line seems to be the concatenation of two records, but the second record is incomplete (it doesn't have the read identifier). This is more likely to be a file system problem than an Rsubread problem.

Could you please also provide the line before the problematic line? You may also rerun the alignment on a different disk to see whether the problem appears again.

Hope this helps.

Best wishes,
Wei

On Jun 2, 2013, at 2:35 AM, Martin Morgan wrote:
On 06/01/2013 08:04 AM, rcaloger wrote:
Hi,

I am using the devel version of Bioconductor as part of the development of my package chimera. While testing a new function in chimera that uses the Rsubread package, I encountered a problem converting a SAM file generated by Rsubread into a BAM file. I used the function asBam from Rsamtools and got the following error:

In doTryCatch(return(expr), name, parentenv, handler) :
  Parse error at line 14667325: sequence and quality are inconsistent

asBam runs successfully if I use only the part of the SAM file up to line 14667324. I get the above error if I use a SAM file ending at line 14667325.

The line that causes the problem is the following:

HWI-ST169:273:D0YW6ACXX:2:1201:4070:162856 141 * 0 0 * * 0 0 AAAAAAGGGTTGAATTATTTTCACTTGCCCACGTAGTTTATGAATGTGGGAAATAGCTTCAAAGACAGATTAAATGATTTGCCCAAGGCCACAGAAAAGAG @@@FFFFFHABHHJGGBFIGIFHGIJHGJGJIFBGHDBG9BDAFIIDHIIGCHCHI<GACC@ADHHHE;7?@DEFED>@;ACCC>ABB;AAD<BC> 77 * 0 0 * * 0 0 CATGGATGAGGAGAATGAGGATTTTGCGCCGGCTGCTCAGAAGATACCGTGAATCTAAGAAGATCGATCGCCACATGTATCACAGCCTGTACCTGAAGGGG @@@DD?BADHF<D<ACG>FFE;BBF@B?@C@F:(?1.=)))883)8=7@(65??EEBDEC37;;>???=BB@<BBCCACBDDCC:?BCBC:@#########
This looks like two separate records have been concatenated; it's really hard to know whether this is Rsubread or some aspect of the file system or the way the file has been handled after creation by Rsubread. Picard is one commonly used tool for validation. Martin
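A rough illustration of the kind of pre-check asked about in this thread, written in Python purely as a sketch. It is not part of Rsamtools, Rsubread or Picard, and it is no substitute for a real validator. The function name find_suspect_sam_lines and the three checks it performs (fewer than 11 mandatory fields, SEQ and QUAL of different lengths, and trailing fields that do not look like TAG:TYPE:VALUE optional fields) are assumptions chosen to catch a line like the one above, where two records appear to have been joined.

```python
import re
from typing import Iterable, Iterator, Tuple

# A SAM alignment line has 11 mandatory tab-separated fields
# (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL).
MANDATORY_FIELDS = 11

# Optional fields follow the pattern TAG:TYPE:VALUE.
OPTIONAL_FIELD = re.compile(r"^[A-Za-z][A-Za-z0-9]:[AifZHB]:.*$")


def find_suspect_sam_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, reason) for alignment lines that look malformed.

    Hypothetical helper for illustration only; it checks far less than a
    full SAM validator would.
    """
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        # Skip blank lines and header lines (header lines start with '@').
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        if len(fields) < MANDATORY_FIELDS:
            yield lineno, "fewer than 11 mandatory fields"
            continue
        seq, qual = fields[9], fields[10]
        # '*' means the sequence or quality string is not stored.
        if seq != "*" and qual != "*" and len(seq) != len(qual):
            yield lineno, "SEQ and QUAL lengths differ"
        # Anything after the mandatory fields should be an optional tag;
        # otherwise a second record may have been appended to this one.
        for extra in fields[MANDATORY_FIELDS:]:
            if not OPTIONAL_FIELD.match(extra):
                yield lineno, "trailing field is not TAG:TYPE:VALUE"
                break
```

To scan a file, one could open it and pass the file handle directly, for example `for n, why in find_suspect_sam_lines(open("aligned.sam")): print(n, why)`, where aligned.sam is a placeholder name. Lines reported this way can then be inspected or dropped before calling asBam.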
Does anybody have an idea of what is wrong with this line? Is there any way to validate the SAM file before running asBam, to detect and filter out lines that might cause problems in the conversion to BAM?

Cheers
Raf

######## sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] Rsamtools_1.13.16     Biostrings_2.29.3     GenomicRanges_1.13.15
[4] XVector_0.1.0         IRanges_1.19.8        BiocGenerics_0.7.2

loaded via a namespace (and not attached):
[1] bitops_1.0-5   stats4_3.0.0   zlibbioc_1.7.0
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
______________________________________________________________________ The information in this email is confidential and intended solely for the addressee. You must not disclose, forward, print or use it without the permission of the sender. ______________________________________________________________________
--
----------------------------------------
Prof. Raffaele A. Calogero
Bioinformatics and Genomics Unit
MBC Centro di Biotecnologie Molecolari
Via Nizza 52, Torino 10126
tel. ++39 0116706457
Fax ++39 0112366457
Mobile ++39 3333827080
email: raffaele.calogero at unito.it
raffaele[dot]calogero[at]gmail[dot]com
www: http://www.bioinformatica.unito.it