Dear all,

I'm running a number of Bayesian binomial regression models using JAGS (interfacing with R via R2jags) on a Mac server with a quad-core processor running at 2.66 GHz with 6 GB of memory under Snow Leopard (session info below). As the models contain around 30 predictors and between 5 and 15 thousand observations, the time required to run a single model with 3 chains and enough iterations to ensure convergence is around 2 hours. While I can live with this for the occasional run, it will be a problem when I need to run several dozen different models.

Perhaps some of you have relevant experience and can advise whether this run time could be significantly reduced using, for example, one of the parallel computing packages? And if so, which one? I should add that I'm not clear whether JAGS can directly avail of multicore processing even if available - it might be necessary to program a Gibbs or Metropolis sampler directly in R.

Any thoughts/suggestions?

Best wishes,
Alan Kelly

sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_IE.UTF-8/en_IE.UTF-8/C/C/en_IE.UTF-8/en_IE.UTF-8

attached base packages:
[1] splines stats graphics grDevices utils datasets methods base

other attached packages:
[1] car_2.0-9 survival_2.36-2 nnet_7.3-1 MASS_7.3-9 foreign_0.8-41

loaded via a namespace (and not attached):
[1] tools_2.12.1
would parallel computing help?
5 messages · Alan Kelly, Ken Beath, Brian Ripley +2 more
It would be easier to run multiple copies of R. Ken
_______________________________________________ R-SIG-Mac mailing list R-SIG-Mac at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-mac
This is an example of 'embarrassingly parallel' computation. Simply run each chain in a separate process in parallel. Packages such as snow or multicore can organize that for you.

However, if you mean logistic regression (there are other binomial regressions, such as probit), first check how you are doing this in JAGS. Using 'module glm' often makes a large difference in speed, and my recollection is that this is still not particularly fast compared to, say, MCMCpack.

And in any case the recommended way to run JAGS with R is rjags (recommended by the author of JAGS, amongst others).
On Tue, 8 Mar 2011, Alan Kelly wrote:
I should add that I'm not clear if jags can directly avail of multicore processing even if available - it might be necessary to program a Gibbs or Metropolis sampler directly in R.....
Again, if you mean logistic regression there are specialised MCMC schemes.
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
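The one-chain-per-process idea above can be sketched as follows. This is only an illustration under stated assumptions: it uses the parallel package (bundled with R since 2.14.0, and incorporating much of snow and multicore) rather than either package directly, and the data, the run_chain() function, and its random-walk Metropolis sampler are all hypothetical stand-ins for a real JAGS chain, not anything used in this thread.

```r
library(parallel)  # ships with R >= 2.14.0; provides snow-style clusters

# Hypothetical simulated data: 500 observations, intercept plus 3 predictors.
set.seed(1)
n <- 500
design <- cbind(1, matrix(rnorm(n * 3), n, 3))
beta_true <- c(-0.5, 1, 0, 0.5)
resp <- rbinom(n, 1, plogis(drop(design %*% beta_true)))

# One chain of random-walk Metropolis for logistic regression with
# independent N(0, 10^2) priors. A stand-in for a single JAGS chain.
run_chain <- function(seed, design, resp, n_iter = 2000, step = 0.05) {
  set.seed(seed)  # a different seed per chain keeps the chains distinct
  log_post <- function(b) {
    eta <- drop(design %*% b)
    sum(resp * eta - log1p(exp(eta))) + sum(dnorm(b, 0, 10, log = TRUE))
  }
  p <- ncol(design)
  draws <- matrix(NA_real_, n_iter, p)
  b <- rep(0, p)
  lp <- log_post(b)
  for (i in seq_len(n_iter)) {
    prop <- b + rnorm(p, 0, step)
    lp_prop <- log_post(prop)
    if (log(runif(1)) < lp_prop - lp) {  # Metropolis accept/reject step
      b <- prop
      lp <- lp_prop
    }
    draws[i, ] <- b
  }
  draws
}

# Three worker processes, one chain each; the chains never communicate.
cl <- makeCluster(3)
chains <- parLapply(cl, 1:3, run_chain, design = design, resp = resp)
stopCluster(cl)

# Posterior means per chain after discarding the first half as burn-in.
post_means <- sapply(chains, function(d) colMeans(d[-(1:1000), ]))
```

With three chains on a quad-core machine, the wall-clock time should approach that of a single chain, since the workers share nothing. For real use, clusterSetRNGStream() gives properly independent random-number streams instead of ad hoc seeds.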
Alan,

The multicore package is easy to use and, if your problem is indeed embarrassingly parallel (there's no communication between different models? how about between chains?), should be straightforward to add. Note that you'll need to run any multicore-using script from the Terminal command line, and not from the Mac GUI, though.

Ben
On Mar 8, 2011, at 5:45 AM, Ben Bond-Lamberty wrote:
Note that you'll need to run any multicore-using script from the Terminal command line, and not from the Mac GUI, though.
FWIW since multicore 0.1-4 and R 2.12.2 it should be possible to run multicore in the Mac GUI (as long as you don't explicitly call GUI or graphics code in the parallel parts). Cheers, Simon
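For the "several dozen different models" case, a forking approach might look like the sketch below. It assumes a current R, where mclapply() lives in the bundled parallel package (it originated in multicore). The data frame, the formulas, and fit_one() are hypothetical, and glm() is only a cheap stand-in for a slow Bayesian fit.

```r
library(parallel)  # mclapply() is the forking interface inherited from multicore

# Hypothetical data: binary response and three candidate predictors.
set.seed(42)
n <- 1000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-0.3 + 0.8 * dat$x1 - 0.5 * dat$x2))

# Hypothetical set of models to compare; in practice this could be dozens.
model_formulas <- list(
  m1 = y ~ x1,
  m2 = y ~ x1 + x2,
  m3 = y ~ x1 + x2 + x3
)

# Stand-in for one expensive model run; returns the fitted coefficients.
fit_one <- function(f) {
  fit <- glm(f, family = binomial, data = dat)
  coef(fit)
}

# Forking is unavailable on Windows, so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each model is fitted in its own forked child; results come back as a list
# with the same names as model_formulas.
fits <- mclapply(model_formulas, fit_one, mc.cores = n_cores)
```

Because the children are forked, they see dat and fit_one without any explicit export, which is what makes this style convenient on a Mac. As noted above, GUI or graphics calls should be kept out of the parallel part.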