Hi Mark:
I had also used sample and tabulate for generating multinomial and found it
to be quite slow. So I had written a multinomial random numbers generator
based on the GENMUL subroutine from "ranlib", which in turn is based on the
algorithm from Luc Devroye's book on "Non-Uniform Random Variate Generation"
You may want to compare this with your hybrid algorithm and see their
relative performance.
Best,
Ravi.
###############################################################
## A multinomial random vector generator
rmultinom <- function(n, p){
ncat <- length(p)
# Check Arguments
if (n < 0) {cat("n < 0 ","\n"); break}
if (ncat <= 1) {cat("ncat <= 1 ","\n"); break}
if (any(p < 0.0)) {cat("Some P(i) < 0 ","\n"); break}
if (any(p > 1.0)) {cat("Some P(i) > 1.0 ","\n"); break}
eps <- .Machine$double.eps^0.9
if (sum(p) > (1.0 + eps) | sum(p) < (1.0 - eps) ) {cat("Sum of P(i)
should equal 1.0 ","\n"); break}
# Initialize variables
ntot <- n
sum <- 1.0
ix <- rep(0,ncat)
# Generate the observation
for (icat in 1:(ncat - 1)) {
prob <- p[icat]/sum
ix[icat] <- rbinom(1,ntot,prob)
ntot <- ntot - ix[icat]
if (ntot <= 0) return(ix)
sum <- sum - p[icat]
}
ix[ncat] <- ntot
return (ix)
}
-----Original Message-----
From: Mark.Bravington at csiro.au <Mark.Bravington at csiro.au>
To: r-help at stat.math.ethz.ch <r-help at stat.math.ethz.ch>
Date: Wednesday, August 07, 2002 11:16 PM
Subject: [R] RE: rmultinom
Dear newsgroup, There was a recent post suggesting the incorporation of a standard rmultinom(...). This seems like a good idea, but I wasn't sure about basing this on tabulate( sample( ...)). Despite the attractive succinctness, this could be very slow and use lots of memory if n or size is large. Instead, I've tended to use a loop over the boxes of the multinomial, taking successive binomial samples from the remaining available observations each time. To check, I did some timing comparisons. Unsurprisingly, the tabulate() version is much slower for large "size" (number of observations per box). However, the successive binomial version is slower when observations are sparse, presumably because it spends most of its time filling in zeros that tabulate() doesn't worry about. test> # Low dimension, many observations per box test> length( ranvec) [1] 7 test> system.time( sucbin.rmultinom( 1000, 1000, ranvec)) [1] 0.02 0.00 0.02 NA NA test> system.time( tab.rmultinom( 1000, 1000, ranvec)) [1] 0.19 0.00 0.18 NA NA test> # High dimension, few observations per box test> length( ranvec.long) [1] 1000 test> system.time( sucbin.rmultinom( 10000, 10, ranvec.long)) [1] 18.42 0.06 18.54 NA NA test> system.time( tab.rmultinom( 10000, 10, ranvec.long)) [1] 0.97 0.10 1.06 NA NA A good option for standard use might be a hybrid, with a "method" parameter saying which approach to use. Verrry roughly, this might default to "use successive binomials if size > length( prob)". The optimal cutoff seems to have a more complex dependence on n, size, and p, but this default choice should avoid the worst problems. As a suggestion, I've included code for a hybrid. Underscore-haters may
wish
to stop reading here :)
# first, two standard functions of mine: these could go inside or be coded
explicitly, of course
# Avoid unintended behaviour with : when 2nd arg is lower than 1st
assign( '%upto%', function( from, to) if( from <= to) from:to else numeric(
0))
# Trim tail of vector
clip_ function( x, n=1) x[ 1 %upto% ( length( x) - n)]
rmultinom_ function( n, size, prob, method=if(k>size) "tabulate" else
"sucbin") {
# returns a n X length( prob) matrix of integers, each row adding up to
size
k_ length( prob)
if( pmatch( method, c( 'sucbin', 'tabulate')) == 1) { # successive
binomials
prob_ prob / sum( prob)
prob[-1]_ prob[-1] / (1-clip( cumsum( prob)))
x_ matrix( as.integer( 0), n, k)
nn_ rep( size, n)
for( i in 1 %upto% (k-1)) {
x[,i]_ as.integer( rbinom( n, size=nn, prob=prob[ i]))
nn_ nn - x[,i]
}
x[,k]_ nn
} else # tabulation
x_ matrix( tabulate( sample( k, n * size, repl = TRUE, prob) + k * 0
%upto% (n - 1), nbins = n * k),
nrow = n, ncol = k, byrow = TRUE) # NB using ":" instead of
"%upto%"
would fail for n=1 return( x) } cheers Mark ******************************* Mark Bravington CSIRO (CMIS) PO Box 1538 Castray Esplanade Hobart TAS 7001 phone (61) 3 6232 5118 fax (61) 3 6232 5012 Mark.Bravington at csiro.au -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
._._ -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._