Hello,

I have discovered a bug in the cdfDuplicates function in the les package. This function is used indirectly by the GSRI package, and I encountered the error while attempting to use that package.

The error appears to occur because both rle and table are used to deduplicate a (sorted) vector, and these two functions apparently do not use the same definition of equality for floating point values. This produces two vectors of different lengths, which raises an error when they are passed to rep.int, which requires vectors of the same length. Replacing rle(pvalSort)$length with table(pvalSort) seems to solve the problem.

I have compiled my test case into an RDS file that you can download and use to reproduce the bug:

https://www.dropbox.com/s/k7k1m3s28aa4ajb/GSRI-les-cdfDuplicates-error-case.RDS

This RDS file contains the full argument list that I pass to the "gsri" function to reproduce the error. Just download it, then execute the following R code:

    library(GSRI)
    do.call(gsri, readRDS("GSRI-les-cdfDuplicates-error-case.RDS"))

After making the suggested change, this test case now works properly. The expression data is my own, and the gene set is MSigDB ID "AAAYRNCTG_UNKNOWN", with the gene IDs converted to my organism (cynomolgus monkey, whose genes are annotated with orthologous Ensembl Peptide IDs from human & rhesus).

-Ryan
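For illustration, here is a minimal sketch of how this kind of mismatch can arise. It is not the actual cdfDuplicates code: the vector x and the variable names are made up for the example, and it assumes that table groups numeric values through their character representation (about 15 significant digits) while rle compares adjacent values exactly.

```r
# Toy stand-in for the sorted p-value vector; 0.1 + 0.2 differs from 0.3
# only in the last bits of its floating point representation.
x <- sort(c(0.3, 0.1 + 0.2, 0.5))

# rle() compares neighbouring elements with exact inequality, so the two
# near-identical values form separate runs.
runs <- rle(x)$lengths

# table() goes through factor(), which matches values via as.character(),
# so the two near-identical values fall into the same level.
tab <- table(x)

length(runs)  # 3
length(tab)   # 2

# Pairing values deduplicated one way with counts computed the other way
# fails in rep.int(), because 'times' must match the length of 'x'.
res <- tryCatch(rep.int(seq_along(tab), runs), error = function(e) e)
inherits(res, "error")  # TRUE
```

Using the same function for both the unique values and their counts, as in the suggested change, keeps the two lengths consistent regardless of which notion of equality that function uses.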
[Bioc-devel] Bug in les:::cdfDuplicates
3 messages · Julian Gehring, Ryan
Hi Ryan,

Thank you for the detailed bug report and for already providing a fix. I have added your patch to 'les_1.13.2' and pushed it to bioc-devel; the updated build should become available soon. I'll run some more tests over the next few days and then also update bioc-release. If you need a patched version of the package now, let me know.

Best wishes
Julian
On 24/03/14 19:14, Ryan C. Thompson wrote:
> Hello, I have discovered a bug in the cdfDuplicates function in the les package. [...]