Skip to content
Prev 62142 / 63424 Next

cwilcox - new version

I?ve been looking at this for a couple hours--it ended up being trickier than I expected to implement well.

I?ve attached a new patch here. This version scales significantly better than the existing method for both `pwilcox` and `dwilcox`. Examples are included below.

I can?t think of any other ways to improve runtime at this point. That?s not to say there aren?t any, though?I?m hopeful inspection by someone else could identify more ways to save some time.

This version uses Andreas?s algorithm for computing the value of `cwilcox`, but evaluates it lazily and caches results. Memory usage of this should be orders of magnitude less than the current version, and recursion is eliminated. As far as I can tell, the numerical accuracy of the results is unchanged.

Performance statistics are interesting. If we assume the two populations have a total of `m` members, then this implementation runs slightly slower for m < 20, and much slower for 50 < m < 100. However, this implementation works significantly *faster* for m > 200. The breakpoint is precisely when each population has a size of 50; `qwilcox(0.5,50,50)` runs in 8 microseconds in the current version, but `qwilcox(0.5, 50, 51)` runs in 5 milliseconds. The new version runs in roughly 1 millisecond for both. This is probably because of internal logic that requires many more `free/calloc` calls if either population is larger than `WILCOX_MAX`, which is set to 50.

I?m hopeful that this can be optimized to be suitable for inclusion in R. Lower performance for population sizes below 50 is not ideal, since `wilcox.test` switches to non-exact testing for population sizes above 50.

-Aidan

Benchmarking results on my machine using `microbenchmark`:
(median runtimes over 100 iterations, us=microseconds, ms=milliseconds, s=seconds)

`qwilcox(0.5, n, n)`

              n:     10      25     50     100     200
    old version:  1.2us   2.9us  9.0us  87.4ms    2.3s
Andreas version:  2.7us  68.6us  1.1ms  16.9ms 264.3ms

`dwilcox((n*n)%/%2, n, n)`
              n:     10      25     50     100     200
    old version:  1.4us   0.9us  0.9us  43.2ms 851.6ms
Andreas version:  2.3us  53.9us  1.0ms  16.4ms 261.7ms


`pwilcox(1:100, 10, 10)`:
    old version:  62.9us
Andreas version:  22.9us


-----------------------
Aidan Lakshman (he/him)
PhD Candidate, Wright Lab
University of Pittsburgh School of Medicine
Department of Biomedical Informatics
http://www.ahl27.com/
ahl27 at pitt.edu | (724) 612-9940
On 16 Jan 2024, at 9:47, Aidan Lakshman wrote:

            
-------------- next part --------------
A non-text attachment was scrubbed...
Name: wilcox_patch_draft_v2.gz
Type: application/x-gzip
Size: 2652 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20240116/51c0bd7c/attachment.bin>