efficiency of sample() with prob.
You might recall the message
R is a collaborative project with many contributors.
We suggest that you take up your own suggestion, research this area and
offer the R project the results of your research for consideration as your
contribution.
It seems likely that in sample(x, size, replace = TRUE, prob) the optimal
method depends on both the size of 'x' and 'size' and perhaps to a lesser
extent on 'prob'. (That's what my book on the subject shows.)
On Wed, 22 Jun 2005, Bo Peng wrote:
On 6/21/05, Vadim Ogranovich <vograno at evafunds.com> wrote:
In his "Introduction to Probability Models" Sheldon Ross describes (sec 11.4.1, 8th edition) the alias method for such weighted sampling. It is based on some decomposition of the original distribution (the weights) into a mixture of two-point distributions.
This sounds like Walker's alias method for weighted sampling. I looked through Knoth's 'the art of computer programming' and find this algorithm. I implemented this one but it is just as efficient as the bisection lookup method in my case. The reason is that the setup of this algorithm is complicated so it is suited for getting large sample from short weighted sequences. Anyway, I do suggest R developers try this algorithm for sample with replacement. A sample code can be found at http://statistik.wu-wien.ac.at/arvag/monograph/arvag-src/algo03_03.c . BTW, does anyone know a quicker algorithm to set up the internal table of the alias method?
Quicker than what? See the discussion in my Stochastic Simulation book for `quicker than Walker'.
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595