Motivated by Deepayan's recent inquiries about the efficiency of the
R 'quantile'
function:
http://tolstoy.newcastle.edu.au/R/devel/05/11/3305.html
http://tolstoy.newcastle.edu.au/R/devel/06/03/4358.html
I decided to try to revive an old project to implement a version of
the Floyd
and Rivest (1975) algorithm for finding quantiles with O(n)
comparisons. I
used to have a direct translation from FR's algol68 in my quantreg
package,
but was forced to remove it when it encountered g77 compilers that
didn't like the way I had handled recursion.
Fortunately, in the interim, I've corresponded with Krzysztof Kiwiel
in Warsaw
who has made a detailed study of the algorithm:
K.C. Kiwiel: On Floyd and Rivest's SELECT Algorithm, Theoretical
Computer Sci. 347 (2005) 214-238.
He has also kindly agreed to allow me to incorporate his
implementation of
the FR algorithm in R under GPL.
For the moment, I have made an alpha-test package with a single
function --
'kuantile' -- that attempts to reproduce the functionality of the
current
base quantile function, essentially replacing the partial sorting done
there with calls to Kiwiel's 'select' function. This package is
available
from:
http://www.econ.uiuc.edu/~roger/research/rq/kuantile
The good news is that the new function _is_ somewhat quicker than
the base 'quantile' function. The following table is based on means
of 10 replications. The first two columns report times for computing
just the median, the next two columns for computing the five default
quantiles: seq(0,1,.25), and the last column is the time needed for
a call
of sort().
mean system.time()[1] for various calls*
median only default 5
n quantile kuantile quantile kuantile sort
100 0.002 0.001 0.004 0.004 0.000
1000 0.002 0.001 0.003 0.002 0.002
10000 0.003 0.002 0.009 0.005 0.005
1e+05 0.024 0.010 0.061 0.013 0.031
1e+06 0.206 0.120 0.535 0.142 0.502
1e+07 2.132 0.790 5.575 1.035 7.484
* On a mac G5 running R 2.2.1.
The bad news is that this approach doesn't help with Deepayan's
original question which involved instances in which quantile() was
being called for rather large number of probabilities. In such
cases it seems that it is difficult to improve upon doing a full sort.
I would welcome comments on any of this....
url: www.econ.uiuc.edu/~roger Roger Koenker
email: rkoenker at uiuc.edu Department of Economics
vox: 217-333-4558 University of Illinois
fax: 217-244-6678 Champaign, IL 61820
Quicker quantiles?
2 messages · Roger Koenker, Brian Ripley
Please do the comparison with R-devel. That has a different policy, including switching to quicksort when partial has more than 10 elements. BTW you need to compare with quicksort (which suffices here and is quite a lot faster than shellsort for large numeric vectors). For 1 million+1 I am getting timings on my Windows laptop sort 0.39 quicksort 0.24 quantile(50%) 0.13 quantile() 0.17 kuantile(50%) 0.16 kuantile() 0.17 (timings for a million are about 50% more in the last four lines as roughly twice as many raw quantiles are needed.) Note that there is never more than about a factor of three gain over quicksort (remembering that quantile has some overhead of its own, so I also timed a version of quantile always using quicksort, using about 0.34). Given how rare this is as a task for R, it would probably be quite sufficient to always use quicksort.
On Sat, 11 Mar 2006, roger koenker wrote:
Motivated by Deepayan's recent inquiries about the efficiency of the
R 'quantile'
function:
http://tolstoy.newcastle.edu.au/R/devel/05/11/3305.html
http://tolstoy.newcastle.edu.au/R/devel/06/03/4358.html
I decided to try to revive an old project to implement a version of
the Floyd
and Rivest (1975) algorithm for finding quantiles with O(n)
comparisons. I
used to have a direct translation from FR's algol68 in my quantreg
package,
but was forced to remove it when it encountered g77 compilers that
didn't like the way I had handled recursion.
Fortunately, in the interim, I've corresponded with Krzysztof Kiwiel
in Warsaw
who has made a detailed study of the algorithm:
K.C. Kiwiel: On Floyd and Rivest's SELECT Algorithm, Theoretical
Computer Sci. 347 (2005) 214-238.
He has also kindly agreed to allow me to incorporate his
implementation of
the FR algorithm in R under GPL.
For the moment, I have made an alpha-test package with a single
function --
'kuantile' -- that attempts to reproduce the functionality of the
current
base quantile function, essentially replacing the partial sorting done
there with calls to Kiwiel's 'select' function. This package is
available
from:
http://www.econ.uiuc.edu/~roger/research/rq/kuantile
The good news is that the new function _is_ somewhat quicker than
the base 'quantile' function. The following table is based on means
of 10 replications. The first two columns report times for computing
just the median, the next two columns for computing the five default
quantiles: seq(0,1,.25), and the last column is the time needed for
a call
of sort().
mean system.time()[1] for various calls*
median only default 5
n quantile kuantile quantile kuantile sort
100 0.002 0.001 0.004 0.004 0.000
1000 0.002 0.001 0.003 0.002 0.002
10000 0.003 0.002 0.009 0.005 0.005
1e+05 0.024 0.010 0.061 0.013 0.031
1e+06 0.206 0.120 0.535 0.142 0.502
1e+07 2.132 0.790 5.575 1.035 7.484
* On a mac G5 running R 2.2.1.
The bad news is that this approach doesn't help with Deepayan's
original question which involved instances in which quantile() was
being called for rather large number of probabilities. In such
cases it seems that it is difficult to improve upon doing a full sort.
I would welcome comments on any of this....
url: www.econ.uiuc.edu/~roger Roger Koenker
email: rkoenker at uiuc.edu Department of Economics
vox: 217-333-4558 University of Illinois
fax: 217-244-6678 Champaign, IL 61820
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595