
would parallel computing help?

5 messages · Alan Kelly, Ken Beath, Brian Ripley +2 more

#
Dear all, I'm running a number of Bayesian binomial regression models using JAGS (interfacing with R via R2jags) on a Mac server with a quad-core processor running at 2.66 GHz with 6 GB memory under Snow Leopard (session info below).  As the models contain around 30 predictors and between 5 and 15 thousand observations, the time required to run a single model with 3 chains and enough iterations to ensure convergence is around 2 hours.  While I can live with this for the occasional run, it will be a problem when I need to run several dozen different models.
Perhaps some of you have relevant experience and can advise if this run time could be significantly reduced using, for example, one of the parallel computing packages?  And if so, which one?  I should add that I'm not clear if jags can directly avail of multicore processing even if available - it might be necessary to program a Gibbs or Metropolis sampler directly in R.....
Any thoughts/suggestions?
Best wishes,
Alan Kelly

sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_IE.UTF-8/en_IE.UTF-8/C/C/en_IE.UTF-8/en_IE.UTF-8

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] car_2.0-9       survival_2.36-2 nnet_7.3-1      MASS_7.3-9      foreign_0.8-41 

loaded via a namespace (and not attached):
[1] tools_2.12.1
#
It would be easier to run multiple copies of R.

Ken
On 08/03/2011, at 7:14 PM, Alan Kelly wrote:

            
#
This is an example of 'embarrassingly parallel' computation.  Simply 
run each chain in a separate process in parallel.  Packages such as 
snow or multicore can organize that for you.
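
A minimal sketch of the one-chain-per-process idea, assuming the parallel package (bundled with R since 2.14, where it took over the functionality of snow and multicore). The function run_chain() is a hypothetical stand-in, here a toy random-walk Metropolis sampler for a logistic regression on simulated data; in real use it would run a single JAGS chain instead. The argument names design and response are likewise made up for the illustration.

```r
library(parallel)  # bundled with R >= 2.14

# Hypothetical stand-in for one MCMC chain: random-walk Metropolis for a
# logistic regression with flat priors. Replace the body with a call that
# runs a single JAGS chain in real use.
run_chain <- function(chain_id, design, response, n_iter = 2000, step = 0.05) {
  log_post <- function(beta) {
    eta <- drop(design %*% beta)
    sum(response * eta - log1p(exp(eta)))  # Bernoulli log-likelihood, logit link
  }
  beta  <- rep(0, ncol(design))
  lp    <- log_post(beta)
  draws <- matrix(NA_real_, n_iter, ncol(design))
  for (i in seq_len(n_iter)) {
    prop    <- beta + rnorm(length(beta), sd = step)
    lp_prop <- log_post(prop)
    if (log(runif(1)) < lp_prop - lp) {  # Metropolis accept/reject step
      beta <- prop
      lp   <- lp_prop
    }
    draws[i, ] <- beta
  }
  draws
}

# Simulated data, purely for illustration
set.seed(1)
n <- 500
X <- cbind(1, rnorm(n), rnorm(n))
y <- rbinom(n, 1, plogis(drop(X %*% c(-0.5, 1, 0.5))))

cl <- makeCluster(3)                 # one worker process per chain
clusterSetRNGStream(cl, iseed = 42)  # independent random-number streams per worker
chains <- parLapply(cl, 1:3, run_chain, design = X, response = y)
stopCluster(cl)

str(chains)  # a list of three n_iter x 3 matrices of draws
```

Because the chains never communicate, the wall-clock time for three chains should be close to that of one, provided at least three cores are free.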

However, if you mean logistic regression (there are other binomial 
regressions such as probit), first check how you are doing this in 
JAGS.  Using 'module glm' often makes a large difference in speed, and 
my recollection is that this is still not particularly fast compared 
to, say, MCMCpack. And in any case the recommended way to run JAGS 
with R is rjags (recommended by the author of JAGS, amongst others).
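
A rough sketch of what using rjags with the glm module might look like for a logistic regression. The simulated data, the vague normal priors, and the iteration counts are all assumptions made for the illustration, not the original poster's model. The requireNamespace() guard is only there so the script still runs on a machine without JAGS installed.

```r
# Simulated data, purely for illustration
set.seed(1)
n <- 500
X <- cbind(1, rnorm(n), rnorm(n))
y <- rbinom(n, 1, plogis(drop(X %*% c(-0.5, 1, 0.5))))

# Logistic regression in the BUGS language; the priors are an assumption
model_string <- "
model {
  for (i in 1:N) {
    y[i] ~ dbern(p[i])
    logit(p[i]) <- inprod(X[i, ], beta[])
  }
  for (j in 1:K) {
    beta[j] ~ dnorm(0, 0.001)
  }
}"

if (requireNamespace("rjags", quietly = TRUE)) {
  library(rjags)
  load.module("glm")  # load before the model is compiled so its samplers are available
  model <- jags.model(textConnection(model_string),
                      data = list(y = y, X = X, N = n, K = ncol(X)),
                      n.chains = 3, n.adapt = 500, quiet = TRUE)
  update(model, n.iter = 500)  # burn-in
  samples <- coda.samples(model, variable.names = "beta", n.iter = 2000)
  print(summary(samples))
}
```

Timing the same model with and without the load.module("glm") line is a quick way to see whether the module helps for a particular problem.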
On Tue, 8 Mar 2011, Alan Kelly wrote:

            
Again, if you mean logistic regression there are specialised MCMC 
schemes.

  
    
#
Alan,
The multicore package is easy to use and, if your problem is indeed
embarrassingly parallel (there's no communication between different
models? how about between chains?) should be straightforward to add.
Note that you'll need to run any multicore-using script from the
Terminal command line, and not from the Mac GUI, though.
Ben
On Tue, Mar 8, 2011 at 3:14 AM, Alan Kelly <AKELLY at tcd.ie> wrote:
#
On Mar 8, 2011, at 5:45 AM, Ben Bond-Lamberty wrote:

            
FWIW since multicore 0.1-4 and R 2.12.2 it should be possible to run multicore in the Mac GUI (as long as you don't explicitly call GUI or graphics code in the parallel parts).

Cheers,
Simon