I am currently using CppBugs with Rcpp through R. I am very interested to use CppBugs as I am finding WinBugs to be prohibitively slow, I use large amounts of data in large multilevel models, so when I found cppbugs I was excited. It says on the Github page for cppbugs that I can achieve speeds of 20-100x faster than winbugs. I have been testing some examples: for the linear model example that is provided at Github, the time difference is about 2x I also tested the Radon model with Rcpp and Andrew Gelmans data and for 10000 iterations and 10000 burnin with a thinning parameter of 5 the difference is only 16 seconds vs 13 seconds. I can run multiple chains in parallel through R with the 'snow' package, cxxfunction() and package.skeleton() and this does seem to provide a little bit of a boost (parallel winbugs vs parallel cppbugs). But nothing greater than 4x. Is it possible to achieve the kind of difference mentioned on Whit Armstrong's page for cppbugs with R? I am happy to post all the code if required. Many thanks Sam Watson -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20111219/ab8d0b31/attachment.htm>
[Rcpp-devel] CppBugs vs WinBugs
10 messages · Watson, Samuel, Whit Armstrong
post your code. The speedup depends on your model. My comparisons were based on R/JAGS and pymc on linux. -Whit On Mon, Dec 19, 2011 at 7:30 AM, Watson, Samuel
<S.I.Watson at warwick.ac.uk> wrote:
I am currently using CppBugs with Rcpp through R. I am very interested to use CppBugs as I am finding WinBugs to be prohibitively slow, I use large amounts of data in large multilevel models, so when I found cppbugs I was excited. It says on the Github page for cppbugs that I can achieve speeds of 20-100x faster than winbugs. I have been testing some examples: for the linear model example that is provided at Github, the time difference is about 2x I also tested the Radon model with Rcpp and Andrew Gelmans data and for 10000 iterations and 10000 burnin with a thinning parameter of 5 the difference is only 16 seconds vs 13 seconds. I can run multiple chains in parallel through R with the ?snow? package, cxxfunction() and package.skeleton() and this does seem to provide a little bit of a boost (parallel winbugs vs parallel cppbugs). But nothing greater than 4x. Is it possible to achieve the kind of difference mentioned on Whit Armstrong?s page for cppbugs with R? I am happy to post all the code if required. Many thanks Sam Watson
_______________________________________________ Rcpp-devel mailing list Rcpp-devel at lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
The radon model is as follows (I haven?t changed the linear model posted on Github), I slightly altered the radon model from Github which is marked inline below:
BUGS code (to be saved in WinBugs working directory as "radon.bug"):
model {
for (i in 1:919){
log.radon[i] ~ dnorm (y.hat[i], tau.y)
y.hat[i] <- a[county2[i]] + b*floor[i]
}
b ~ dnorm (0, .0001)
tau.y <- pow(sigma.y, -2)
sigma.y ~ dunif (0, 100)
for (j in 1:85){
a[j] ~ dnorm (mu.a, tau.a)
}
mu.a ~ dnorm (0, .0001)
tau.a <- pow(sigma.a, -2)
sigma.a ~ dunif (0, 100)
}
-------------------------------------
R code:
require(inline)
require(Rcpp)
require(R2WinBUGS)
model<-'
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <algorithm>
#include <cmath>
#include <armadillo>
#include <boost/random.hpp>
#include <boost/algorithm/string.hpp>
#include <cppbugs/cppbugs.hpp>
using namespace arma;
using namespace cppbugs;
using std::vector;
using std::string;
using std::cout;
using std::endl;
using std::ifstream;
class RadonVaryingInterceptModel: public MCModel {
public:
const vec& level;
const vec& basement;
const mat& group;
int N, N_counties;
mat indicator_matrix; //indicator matrix used for county level random effects
Normal<vec> a;
Normal<double> b;
Deterministic<double> tau_y;
Uniform<double> sigma_y;
Normal<double> mu_a;
Deterministic<double> tau_a;
Uniform<double> sigma_a;
Deterministic<mat> y_hat;
Normal<mat> likelihood;
RadonVaryingInterceptModel(const vec& level_, const vec& basement_, const mat& group_,int N_, int N_counties_):
level(level_),basement(basement_),group(group_),
a(randn<vec>(N_counties_)), b(0),N(N_),N_counties(N_counties_),
indicator_matrix(N,N_counties),
tau_y(1),sigma_y(1),mu_a(0),tau_a(1),sigma_a(1),
y_hat(randn<mat>(level_.n_rows,1)),likelihood(level_,true)
{
indicator_matrix.fill(0.0);
for(int i=0;i<group.n_elem;i++){
indicator_matrix(i,group[i])=1.0;
}
add(a);
add(b);
add(tau_y);
add(sigma_y);
add(mu_a);
add(tau_a);
add(sigma_a);
add(y_hat);
add(likelihood);
}
void update() {
y_hat.value = indicator_matrix*a.value + b.value*basement;
tau_y.value = pow(sigma_y.value, -2.0);
tau_a.value = pow(sigma_a.value, -2.0);
a.dnorm(mu_a.value, tau_a.value);
b.dnorm(0, 0.0001);
sigma_y.dunif(0, 100);
mu_a.dnorm(0, 0.0001);
sigma_a.dunif(0, 100);
likelihood.dnorm(y_hat.value,tau_y.value);
}
};
'
Src<-'
mat group = Rcpp::as<arma::mat>(GR);
vec level = Rcpp::as<arma::vec>(lev);
vec basement = Rcpp::as<arma::vec>(basm);
int N = 919;
int N_counties = 85;
int iterations_ = as<int>(iterations);
int burn_ = as<int>(burn);
int adapt_ = as<int>(adapt);
int thin_ = as<int>(thin);
RadonVaryingInterceptModel m(group,level,basement,N,N_counties);
m.sample(iterations_, burn_, adapt_, thin_);
return Rcpp::List::create(Rcpp::Named("b", m.b.mean()), Rcpp::Named("ar", m.acceptance_ratio()),
Rcpp::Named("a", m.a.mean()));
'
radon.model <- cxxfunction(signature(GR="numeric", lev="numeric",basm="numeric",iterations="integer",burn="integer",adapt="adapt",thin="integer"), body=src, include=model, plugin="RcppArmadillo",verbose=F)
df<-read.table("http://www.stat.columbia.edu/~gelman/arm/examples/radon/srrs2.dat", header=T, sep=",")
df.mn <- df[df[,2]=="MN",]
radon <- df.mn$activity
log.radon <- log (ifelse (radon==0, .1, radon))
floor <- df.mn$floor
# get county index variable
county.name <- as.vector(df.mn$county)
uniq <- unique(county.name)
J <- length(uniq)
county <- rep (NA, J)
for (i in 1:J){
county[county.name==uniq[i]] <- i
}
system.time(radon.model(GR=county22,lev=log.radon2,basm=floor2,iterations=1e4L,burn=1e4L,adapt=2e3L,thin=5L))
#its necessary to change the structure of the data files or else winbugs has an error
log.radon<-as.numeric(log.radon)
floor<-as.numeric(floor)
county<-as.numeric(county)
radon.data<-list(log.radon=log.radon,floor=floor,county=county)
radon.param<-c("a","b")
radon.inits<-function(){list(a=rnorm(85),b=rnorm(1),sigma.a=runif(1),sigma.y=runif(1),mu.a=rnorm(1)}
#n.burnin = n.iter/2 for bugs
system.time(radon.test<-bugs(radon.data,radon.inits,radon.param,model.file="radon.bug",n.chains=1,n.iter=20000,working.directory=getwd(),bugs.directory="D:/R/WinBUGS14",n.thin=5))
-----------------------------------------------------------------------
The difference I get is 16.53s vs 13.85s
Kind regards,
Sam Watson
-----Original Message-----
From: Whit Armstrong [mailto:armstrong.whit at gmail.com]
Sent: 19 December 2011 12:39
To: Watson, Samuel
Cc: rcpp-devel at r-forge.wu-wien.ac.at
Subject: Re: [Rcpp-devel] CppBugs vs WinBugs
post your code.
The speedup depends on your model. My comparisons were based on R/JAGS and pymc on linux.
-Whit
On Mon, Dec 19, 2011 at 7:30 AM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
I am currently using CppBugs with Rcpp through R. I am very interested to use CppBugs as I am finding WinBugs to be prohibitively slow, I use large amounts of data in large multilevel models, so when I found cppbugs I was excited. It says on the Github page for cppbugs that I can achieve speeds of 20-100x faster than winbugs. I have been testing some examples: for the linear model example that is provided at Github, the time difference is about 2x I also tested the Radon model with Rcpp and Andrew Gelmans data and for 10000 iterations and 10000 burnin with a thinning parameter of 5 the difference is only 16 seconds vs 13 seconds. I can run multiple chains in parallel through R with the ?snow? package, cxxfunction() and package.skeleton() and this does seem to provide a little bit of a boost (parallel winbugs vs parallel cppbugs). But nothing greater than 4x. Is it possible to achieve the kind of difference mentioned on Whit Armstrong?s page for cppbugs with R? I am happy to post all the code if required. Many thanks Sam Watson
_______________________________________________ Rcpp-devel mailing list Rcpp-devel at lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-deve l
You have the model specified wrong. in your wrapper function you call:
RadonVaryingInterceptModel m(group,level,basement,N,N_counties);
However, the constructor requires the 'level' variable to be in the first position and group in the third position :
?RadonVaryingInterceptModel(const vec& level_, const vec& basement_, const mat& group_,int N_, int N_counties_):
Additionally, when you pass the 'group' variable it has to be 0 indexed instead of 1 indexed. ?So, had it been in the correct position, you would have corrupted memory when you ran the program. One additional thing I changed was the 'adapt' variable. I set it to 1000 instead of 2000. The adapt phase is very slow because it perturbs each node individually and then estimates the impact to the acceptance ratio. In any case, after fixing these issues, these are the results I get (using jags instead of winbugs): ? ? ? ? ? ?user.self sys.self ? elapsed user.child sys.child jags.time ? ?16.461000 ? ?0.064 16.586000 ? ? ? ? ?0 ? ? ? ? 0 cppbugs.time ?2.416000 ? ?0.000 ?2.420000 ? ? ? ? ?0 ? ? ? ? 0 ? ? ? ? ? ? ?6.813328 ? ? ?Inf ?6.853719 ? ? ? ?NaN ? ? ? NaN
Which I think it more typical of the speedup I would expect. The front page of my github project was written over a year ago, and I don't even remember which example had a 100x speedup. I still think you could see a 20x speedup on a large dataset. CppBugs uses Armadillo as the backend, so on a large dataset, you can speed up the linear algebra significantly if you use a multi-threaded blas (and your dataset is large enough to benefit from it). Most cases of using multithreaded linear algebra on the R list are actually worse for small datasets, and I'm sure the same would be true in this case. I'll post this revised code up to my github, so you can see the changes I made. One last thing. CppBugs is still under rapid development, even though you don't see the commits on the public branch. A lot is changing on the backend, and I want to make it stable before it's released. The closure features in C++0x make it much easier to define cppbugs models without needing to declare a new class. -Whit On Mon, Dec 19, 2011 at 8:10 AM, Watson, Samuel
<S.I.Watson at warwick.ac.uk> wrote:
The radon model is as follows (I haven?t changed the linear model posted on Github), I slightly altered the radon model from Github which is marked inline below:
BUGS code (to be saved in WinBugs working directory as "radon.bug"):
model {
?for (i in 1:919){
? ?log.radon[i] ~ dnorm (y.hat[i], tau.y)
? ?y.hat[i] <- a[county2[i]] + b*floor[i]
?}
?b ~ dnorm (0, .0001)
?tau.y <- pow(sigma.y, -2)
?sigma.y ~ dunif (0, 100)
?for (j in 1:85){
? ?a[j] ~ dnorm (mu.a, tau.a)
?}
?mu.a ~ dnorm (0, .0001)
?tau.a <- pow(sigma.a, -2)
?sigma.a ~ dunif (0, 100)
}
-------------------------------------
R code:
require(inline)
require(Rcpp)
require(R2WinBUGS)
model<-'
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <algorithm>
#include <cmath>
#include <armadillo>
#include <boost/random.hpp>
#include <boost/algorithm/string.hpp>
#include <cppbugs/cppbugs.hpp>
using namespace arma;
using namespace cppbugs;
using std::vector;
using std::string;
using std::cout;
using std::endl;
using std::ifstream;
class RadonVaryingInterceptModel: public MCModel {
public:
?const vec& level;
?const vec& basement;
?const mat& group;
?int N, N_counties;
?mat indicator_matrix; //indicator matrix used for county level random effects
?Normal<vec> a;
?Normal<double> b;
?Deterministic<double> tau_y;
?Uniform<double> sigma_y;
?Normal<double> mu_a;
?Deterministic<double> tau_a;
?Uniform<double> sigma_a;
?Deterministic<mat> y_hat;
?Normal<mat> likelihood;
?RadonVaryingInterceptModel(const vec& level_, const vec& basement_, const mat& group_,int N_, int N_counties_):
? ?level(level_),basement(basement_),group(group_),
? ?a(randn<vec>(N_counties_)), b(0),N(N_),N_counties(N_counties_),
? ?indicator_matrix(N,N_counties),
? ?tau_y(1),sigma_y(1),mu_a(0),tau_a(1),sigma_a(1),
? ?y_hat(randn<mat>(level_.n_rows,1)),likelihood(level_,true)
?{
? ?indicator_matrix.fill(0.0);
? ?for(int i=0;i<group.n_elem;i++){
? ?indicator_matrix(i,group[i])=1.0;
? ?}
? ?add(a);
? ?add(b);
? ?add(tau_y);
? ?add(sigma_y);
? ?add(mu_a);
? ?add(tau_a);
? ?add(sigma_a);
? ?add(y_hat);
? ?add(likelihood);
?}
?void update() {
? ?y_hat.value = indicator_matrix*a.value + b.value*basement;
? ?tau_y.value = pow(sigma_y.value, -2.0);
? ?tau_a.value = pow(sigma_a.value, -2.0);
? ?a.dnorm(mu_a.value, tau_a.value);
? ?b.dnorm(0, 0.0001);
? ?sigma_y.dunif(0, 100);
? ?mu_a.dnorm(0, 0.0001);
? ?sigma_a.dunif(0, 100);
? ?likelihood.dnorm(y_hat.value,tau_y.value);
?}
};
'
Src<-'
mat group = Rcpp::as<arma::mat>(GR);
vec level = Rcpp::as<arma::vec>(lev);
vec basement = Rcpp::as<arma::vec>(basm);
int N = 919;
int N_counties = 85;
int iterations_ = as<int>(iterations);
int burn_ = as<int>(burn);
int adapt_ = as<int>(adapt);
int thin_ = as<int>(thin);
RadonVaryingInterceptModel m(group,level,basement,N,N_counties);
m.sample(iterations_, burn_, adapt_, thin_);
return Rcpp::List::create(Rcpp::Named("b", m.b.mean()), Rcpp::Named("ar", m.acceptance_ratio()),
Rcpp::Named("a", m.a.mean()));
'
radon.model <- cxxfunction(signature(GR="numeric", lev="numeric",basm="numeric",iterations="integer",burn="integer",adapt="adapt",thin="integer"), body=src, include=model, plugin="RcppArmadillo",verbose=F)
df<-read.table("http://www.stat.columbia.edu/~gelman/arm/examples/radon/srrs2.dat", header=T, sep=",")
df.mn <- df[df[,2]=="MN",]
radon <- df.mn$activity
log.radon <- log (ifelse (radon==0, .1, radon))
floor <- df.mn$floor
# get county index variable
county.name <- as.vector(df.mn$county)
uniq <- unique(county.name)
J <- length(uniq)
county <- rep (NA, J)
for (i in 1:J){
?county[county.name==uniq[i]] <- i
}
system.time(radon.model(GR=county22,lev=log.radon2,basm=floor2,iterations=1e4L,burn=1e4L,adapt=2e3L,thin=5L))
#its necessary to change the structure of the data files or else winbugs has an error
log.radon<-as.numeric(log.radon)
floor<-as.numeric(floor)
county<-as.numeric(county)
radon.data<-list(log.radon=log.radon,floor=floor,county=county)
radon.param<-c("a","b")
radon.inits<-function(){list(a=rnorm(85),b=rnorm(1),sigma.a=runif(1),sigma.y=runif(1),mu.a=rnorm(1)}
#n.burnin = n.iter/2 for bugs
system.time(radon.test<-bugs(radon.data,radon.inits,radon.param,model.file="radon.bug",n.chains=1,n.iter=20000,working.directory=getwd(),bugs.directory="D:/R/WinBUGS14",n.thin=5))
-----------------------------------------------------------------------
The difference I get is 16.53s vs 13.85s
Kind regards,
Sam Watson
-----Original Message-----
From: Whit Armstrong [mailto:armstrong.whit at gmail.com]
Sent: 19 December 2011 12:39
To: Watson, Samuel
Cc: rcpp-devel at r-forge.wu-wien.ac.at
Subject: Re: [Rcpp-devel] CppBugs vs WinBugs
post your code.
The speedup depends on ?your model. ?My comparisons were based on R/JAGS and pymc on linux.
-Whit
On Mon, Dec 19, 2011 at 7:30 AM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
I am currently using CppBugs with Rcpp through R. I am very interested to use CppBugs as I am finding WinBugs to be prohibitively slow, I use large amounts of data in large multilevel models, so when I found cppbugs I was excited. It says on the Github page for cppbugs that I can achieve speeds of 20-100x faster than winbugs. I have been testing some examples: for the linear model example that is provided at Github, the time difference is about 2x I also tested the Radon model with Rcpp and Andrew Gelmans data and for 10000 iterations and 10000 burnin with a thinning parameter of 5 the difference is only 16 seconds vs 13 seconds. I can run multiple chains in parallel through R with the ?snow? package, cxxfunction() and package.skeleton() and this does seem to provide a little bit of a boost (parallel winbugs vs parallel cppbugs). But nothing greater than 4x. Is it possible to achieve the kind of difference mentioned on Whit Armstrong?s page for cppbugs with R? I am happy to post all the code if required. Many thanks Sam Watson
_______________________________________________ Rcpp-devel mailing list Rcpp-devel at lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-deve l
Hi Whit, Many thanks for your reply, my apologies for those mistakes in that code, I actually had just noticed them myself. I amended the mistakes you mentioned, for example I have included the line group -= 1; in the run.model.cpp function. Here are the results with your suggested changes: system.time(radon.test<-jags(radon.data,radon.inits,radon.param,model.file=model.file,n.chains=1,n.iter=20000,working.directory=getwd(),n.thin=5)) user system elapsed 11.03 0.00 11.12 system.time(res<-radon.model(GR=county22,lev=log.radon2,basm=floor2,iterations=1e4L,burn=1e4L,adapt=1e3L,thin=5L)) user system elapsed 10.86 0.00 10.86 If I set adapt=2e2L I still get 9.14 seconds. The model is definitely producing the correct answer (as compared to WinBUGS and jags), so I can't work out why I can't an equivalent speed to you. Would you have any other recommendations of things that could be slowing the model down? I am running Windows 7 Enterprise on Intel Core i7-2600 @ 3.4 GHz w/ 8GB DDR3 RAM Many thanks, Sam Watson -----Original Message----- From: Whit Armstrong [mailto:armstrong.whit at gmail.com] Sent: 19 December 2011 15:23 To: Watson, Samuel Cc: rcpp-devel at r-forge.wu-wien.ac.at Subject: Re: [Rcpp-devel] CppBugs vs WinBugs You have the model specified wrong. in your wrapper function you call:
RadonVaryingInterceptModel m(group,level,basement,N,N_counties);
However, the constructor requires the 'level' variable to be in the first position and group in the third position :
?RadonVaryingInterceptModel(const vec& level_, const vec& basement_, const mat& group_,int N_, int N_counties_):
Additionally, when you pass the 'group' variable it has to be 0 indexed instead of 1 indexed. ?So, had it been in the correct position, you would have corrupted memory when you ran the program. One additional thing I changed was the 'adapt' variable. I set it to 1000 instead of 2000. The adapt phase is very slow because it perturbs each node individually and then estimates the impact to the acceptance ratio. In any case, after fixing these issues, these are the results I get (using jags instead of winbugs): ? ? ? ? ? ?user.self sys.self ? elapsed user.child sys.child jags.time ? ?16.461000 ? ?0.064 16.586000 ? ? ? ? ?0 ? ? ? ? 0 cppbugs.time ?2.416000 ? ?0.000 ?2.420000 ? ? ? ? ?0 ? ? ? ? 0 ? ? ? ? ? ? ?6.813328 ? ? ?Inf ?6.853719 ? ? ? ?NaN ? ? ? NaN
Which I think it more typical of the speedup I would expect. The front page of my github project was written over a year ago, and I don't even remember which example had a 100x speedup. I still think you could see a 20x speedup on a large dataset. CppBugs uses Armadillo as the backend, so on a large dataset, you can speed up the linear algebra significantly if you use a multi-threaded blas (and your dataset is large enough to benefit from it). Most cases of using multithreaded linear algebra on the R list are actually worse for small datasets, and I'm sure the same would be true in this case. I'll post this revised code up to my github, so you can see the changes I made. One last thing. CppBugs is still under rapid development, even though you don't see the commits on the public branch. A lot is changing on the backend, and I want to make it stable before it's released. The closure features in C++0x make it much easier to define cppbugs models without needing to declare a new class. -Whit
On Mon, Dec 19, 2011 at 8:10 AM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
The radon model is as follows (I haven?t changed the linear model posted on Github), I slightly altered the radon model from Github which is marked inline below:
BUGS code (to be saved in WinBugs working directory as "radon.bug"):
model {
?for (i in 1:919){
? ?log.radon[i] ~ dnorm (y.hat[i], tau.y)
? ?y.hat[i] <- a[county2[i]] + b*floor[i]
?}
?b ~ dnorm (0, .0001)
?tau.y <- pow(sigma.y, -2)
?sigma.y ~ dunif (0, 100)
?for (j in 1:85){
? ?a[j] ~ dnorm (mu.a, tau.a)
?}
?mu.a ~ dnorm (0, .0001)
?tau.a <- pow(sigma.a, -2)
?sigma.a ~ dunif (0, 100)
}
-------------------------------------
R code:
require(inline)
require(Rcpp)
require(R2WinBUGS)
model<-'
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <algorithm>
#include <cmath>
#include <armadillo>
#include <boost/random.hpp>
#include <boost/algorithm/string.hpp>
#include <cppbugs/cppbugs.hpp>
using namespace arma;
using namespace cppbugs;
using std::vector;
using std::string;
using std::cout;
using std::endl;
using std::ifstream;
class RadonVaryingInterceptModel: public MCModel {
public:
?const vec& level;
?const vec& basement;
?const mat& group;
?int N, N_counties;
?mat indicator_matrix; //indicator matrix used for county level random
effects
?Normal<vec> a;
?Normal<double> b;
?Deterministic<double> tau_y;
?Uniform<double> sigma_y;
?Normal<double> mu_a;
?Deterministic<double> tau_a;
?Uniform<double> sigma_a;
?Deterministic<mat> y_hat;
?Normal<mat> likelihood;
?RadonVaryingInterceptModel(const vec& level_, const vec& basement_, const mat& group_,int N_, int N_counties_):
? ?level(level_),basement(basement_),group(group_),
? ?a(randn<vec>(N_counties_)), b(0),N(N_),N_counties(N_counties_),
? ?indicator_matrix(N,N_counties),
? ?tau_y(1),sigma_y(1),mu_a(0),tau_a(1),sigma_a(1),
? ?y_hat(randn<mat>(level_.n_rows,1)),likelihood(level_,true)
?{
? ?indicator_matrix.fill(0.0);
? ?for(int i=0;i<group.n_elem;i++){
? ?indicator_matrix(i,group[i])=1.0;
? ?}
? ?add(a);
? ?add(b);
? ?add(tau_y);
? ?add(sigma_y);
? ?add(mu_a);
? ?add(tau_a);
? ?add(sigma_a);
? ?add(y_hat);
? ?add(likelihood);
?}
?void update() {
? ?y_hat.value = indicator_matrix*a.value + b.value*basement;
? ?tau_y.value = pow(sigma_y.value, -2.0);
? ?tau_a.value = pow(sigma_a.value, -2.0);
? ?a.dnorm(mu_a.value, tau_a.value);
? ?b.dnorm(0, 0.0001);
? ?sigma_y.dunif(0, 100);
? ?mu_a.dnorm(0, 0.0001);
? ?sigma_a.dunif(0, 100);
? ?likelihood.dnorm(y_hat.value,tau_y.value);
?}
};
'
Src<-'
mat group = Rcpp::as<arma::mat>(GR);
vec level = Rcpp::as<arma::vec>(lev);
vec basement = Rcpp::as<arma::vec>(basm); int N = 919; int N_counties
= 85; int iterations_ = as<int>(iterations); int burn_ =
as<int>(burn); int adapt_ = as<int>(adapt); int thin_ = as<int>(thin);
RadonVaryingInterceptModel m(group,level,basement,N,N_counties);
m.sample(iterations_, burn_, adapt_, thin_);
return Rcpp::List::create(Rcpp::Named("b", m.b.mean()),
Rcpp::Named("ar", m.acceptance_ratio()), Rcpp::Named("a",
m.a.mean())); '
radon.model <- cxxfunction(signature(GR="numeric",
lev="numeric",basm="numeric",iterations="integer",burn="integer",adapt
="adapt",thin="integer"), body=src, include=model,
plugin="RcppArmadillo",verbose=F)
df<-read.table("http://www.stat.columbia.edu/~gelman/arm/examples/rado
n/srrs2.dat", header=T, sep=",") df.mn <- df[df[,2]=="MN",] radon <-
df.mn$activity log.radon <- log (ifelse (radon==0, .1, radon)) floor
<- df.mn$floor
# get county index variable
county.name <- as.vector(df.mn$county) uniq <- unique(county.name) J
<- length(uniq) county <- rep (NA, J) for (i in 1:J){
?county[county.name==uniq[i]] <- i
}
system.time(radon.model(GR=county22,lev=log.radon2,basm=floor2,iterati
ons=1e4L,burn=1e4L,adapt=2e3L,thin=5L))
#its necessary to change the structure of the data files or else
winbugs has an error
log.radon<-as.numeric(log.radon)
floor<-as.numeric(floor)
county<-as.numeric(county)
radon.data<-list(log.radon=log.radon,floor=floor,county=county)
radon.param<-c("a","b")
radon.inits<-function(){list(a=rnorm(85),b=rnorm(1),sigma.a=runif(1),s
igma.y=runif(1),mu.a=rnorm(1)}
#n.burnin = n.iter/2 for bugs
system.time(radon.test<-bugs(radon.data,radon.inits,radon.param,model.
file="radon.bug",n.chains=1,n.iter=20000,working.directory=getwd(),bug
s.directory="D:/R/WinBUGS14",n.thin=5))
----------------------------------------------------------------------
- The difference I get is 16.53s vs 13.85s
Kind regards,
Sam Watson
-----Original Message-----
From: Whit Armstrong [mailto:armstrong.whit at gmail.com]
Sent: 19 December 2011 12:39
To: Watson, Samuel
Cc: rcpp-devel at r-forge.wu-wien.ac.at
Subject: Re: [Rcpp-devel] CppBugs vs WinBugs
post your code.
The speedup depends on ?your model. ?My comparisons were based on R/JAGS and pymc on linux.
-Whit
On Mon, Dec 19, 2011 at 7:30 AM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
I am currently using CppBugs with Rcpp through R. I am very interested to use CppBugs as I am finding WinBugs to be prohibitively slow, I use large amounts of data in large multilevel models, so when I found cppbugs I was excited. It says on the Github page for cppbugs that I can achieve speeds of 20-100x faster than winbugs. I have been testing some examples: for the linear model example that is provided at Github, the time difference is about 2x I also tested the Radon model with Rcpp and Andrew Gelmans data and for 10000 iterations and 10000 burnin with a thinning parameter of 5 the difference is only 16 seconds vs 13 seconds. I can run multiple chains in parallel through R with the ?snow? package, cxxfunction() and package.skeleton() and this does seem to provide a little bit of a boost (parallel winbugs vs parallel cppbugs). But nothing greater than 4x. Is it possible to achieve the kind of difference mentioned on Whit Armstrong?s page for cppbugs with R? I am happy to post all the code if required. Many thanks Sam Watson
_______________________________________________ Rcpp-devel mailing list Rcpp-devel at lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-dev e l
Not sure. You might have a look at what compiler options are used for the inline cpp function. If you are not using O2, then that could be a big difference, but I'm not sure it would drop the time from 10s to 2.8s. -Whit On Mon, Dec 19, 2011 at 11:04 AM, Watson, Samuel
<S.I.Watson at warwick.ac.uk> wrote:
Hi Whit, Many thanks for your reply, my apologies for those mistakes in that code, I actually had just noticed them myself. I amended the mistakes you mentioned, for example I have included the line group -= 1; in the run.model.cpp function. Here are the results with your suggested changes: system.time(radon.test<-jags(radon.data,radon.inits,radon.param,model.file=model.file,n.chains=1,n.iter=20000,working.directory=getwd(),n.thin=5)) ? user ?system elapsed ?11.03 ? ?0.00 ? 11.12 system.time(res<-radon.model(GR=county22,lev=log.radon2,basm=floor2,iterations=1e4L,burn=1e4L,adapt=1e3L,thin=5L)) ? user ?system elapsed ?10.86 ? ?0.00 ? 10.86 If I set adapt=2e2L I still get 9.14 seconds. The model is definitely producing the correct answer (as compared to WinBUGS and jags), so I can't work out why I can't an equivalent speed to you. Would you have any other recommendations of things that could be slowing the model down? I am running Windows 7 Enterprise on Intel Core i7-2600 @ 3.4 GHz w/ 8GB DDR3 RAM Many thanks, Sam Watson -----Original Message----- From: Whit Armstrong [mailto:armstrong.whit at gmail.com] Sent: 19 December 2011 15:23 To: Watson, Samuel Cc: rcpp-devel at r-forge.wu-wien.ac.at Subject: Re: [Rcpp-devel] CppBugs vs WinBugs You have the model specified wrong. in your wrapper function you call:
RadonVaryingInterceptModel m(group,level,basement,N,N_counties);
However, the constructor requires the 'level' variable to be in the first position and group in the third position :
?RadonVaryingInterceptModel(const vec& level_, const vec& basement_, const mat& group_,int N_, int N_counties_):
Additionally, when you pass the 'group' variable it has to be 0 indexed instead of 1 indexed. ?So, had it been in the correct position, you would have corrupted memory when you ran the program. One additional thing I changed was the 'adapt' variable. ?I set it to 1000 instead of 2000. ?The adapt phase is very slow because it perturbs each node individually and then estimates the impact to the acceptance ratio. In any case, after fixing these issues, these are the results I get (using jags instead of winbugs): ? ? ? ? ? ?user.self sys.self ? elapsed user.child sys.child jags.time ? ?16.461000 ? ?0.064 16.586000 ? ? ? ? ?0 ? ? ? ? 0 cppbugs.time ?2.416000 ? ?0.000 ?2.420000 ? ? ? ? ?0 ? ? ? ? 0 ? ? ? ? ? ? ?6.813328 ? ? ?Inf ?6.853719 ? ? ? ?NaN ? ? ? NaN
Which I think it more typical of the speedup I would expect. ?The front page of my github project was written over a year ago, and I don't even remember which example had a 100x speedup. I still think you could see a 20x speedup on a large dataset. ?CppBugs uses Armadillo as the backend, so on a large dataset, you can speed up the linear algebra significantly if you use a multi-threaded blas (and your dataset is large enough to benefit from it). ?Most cases of using multithreaded linear algebra on the R list are actually worse for small datasets, and I'm sure the same would be true in this case. I'll post this revised code up to my github, so you can see the changes I made. One last thing. ?CppBugs is still under rapid development, even though you don't see the commits on the public branch. ?A lot is changing on the backend, and I want to make it stable before it's released. ?The closure features in C++0x make it much easier to define cppbugs models without needing to declare a new class. -Whit On Mon, Dec 19, 2011 at 8:10 AM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
The radon model is as follows (I haven?t changed the linear model posted on Github), I slightly altered the radon model from Github which is marked inline below:
BUGS code (to be saved in WinBugs working directory as "radon.bug"):
model {
?for (i in 1:919){
? ?log.radon[i] ~ dnorm (y.hat[i], tau.y)
? ?y.hat[i] <- a[county2[i]] + b*floor[i]
?}
?b ~ dnorm (0, .0001)
?tau.y <- pow(sigma.y, -2)
?sigma.y ~ dunif (0, 100)
?for (j in 1:85){
? ?a[j] ~ dnorm (mu.a, tau.a)
?}
?mu.a ~ dnorm (0, .0001)
?tau.a <- pow(sigma.a, -2)
?sigma.a ~ dunif (0, 100)
}
-------------------------------------
R code:
require(inline)
require(Rcpp)
require(R2WinBUGS)
model<-'
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <algorithm>
#include <cmath>
#include <armadillo>
#include <boost/random.hpp>
#include <boost/algorithm/string.hpp>
#include <cppbugs/cppbugs.hpp>
using namespace arma;
using namespace cppbugs;
using std::vector;
using std::string;
using std::cout;
using std::endl;
using std::ifstream;
class RadonVaryingInterceptModel: public MCModel {
public:
?const vec& level;
?const vec& basement;
?const mat& group;
?int N, N_counties;
?mat indicator_matrix; //indicator matrix used for county level random
effects
?Normal<vec> a;
?Normal<double> b;
?Deterministic<double> tau_y;
?Uniform<double> sigma_y;
?Normal<double> mu_a;
?Deterministic<double> tau_a;
?Uniform<double> sigma_a;
?Deterministic<mat> y_hat;
?Normal<mat> likelihood;
?RadonVaryingInterceptModel(const vec& level_, const vec& basement_, const mat& group_,int N_, int N_counties_):
? ?level(level_),basement(basement_),group(group_),
? ?a(randn<vec>(N_counties_)), b(0),N(N_),N_counties(N_counties_),
? ?indicator_matrix(N,N_counties),
? ?tau_y(1),sigma_y(1),mu_a(0),tau_a(1),sigma_a(1),
? ?y_hat(randn<mat>(level_.n_rows,1)),likelihood(level_,true)
?{
? ?indicator_matrix.fill(0.0);
? ?for(int i=0;i<group.n_elem;i++){
? ?indicator_matrix(i,group[i])=1.0;
? ?}
? ?add(a);
? ?add(b);
? ?add(tau_y);
? ?add(sigma_y);
? ?add(mu_a);
? ?add(tau_a);
? ?add(sigma_a);
? ?add(y_hat);
? ?add(likelihood);
?}
?void update() {
? ?y_hat.value = indicator_matrix*a.value + b.value*basement;
? ?tau_y.value = pow(sigma_y.value, -2.0);
? ?tau_a.value = pow(sigma_a.value, -2.0);
? ?a.dnorm(mu_a.value, tau_a.value);
? ?b.dnorm(0, 0.0001);
? ?sigma_y.dunif(0, 100);
? ?mu_a.dnorm(0, 0.0001);
? ?sigma_a.dunif(0, 100);
? ?likelihood.dnorm(y_hat.value,tau_y.value);
?}
};
'
Src<-'
mat group = Rcpp::as<arma::mat>(GR);
vec level = Rcpp::as<arma::vec>(lev);
vec basement = Rcpp::as<arma::vec>(basm); int N = 919; int N_counties
= 85; int iterations_ = as<int>(iterations); int burn_ =
as<int>(burn); int adapt_ = as<int>(adapt); int thin_ = as<int>(thin);
RadonVaryingInterceptModel m(group,level,basement,N,N_counties);
m.sample(iterations_, burn_, adapt_, thin_);
return Rcpp::List::create(Rcpp::Named("b", m.b.mean()),
Rcpp::Named("ar", m.acceptance_ratio()), Rcpp::Named("a",
m.a.mean())); '
radon.model <- cxxfunction(signature(GR="numeric",
lev="numeric",basm="numeric",iterations="integer",burn="integer",adapt
="adapt",thin="integer"), body=src, include=model,
plugin="RcppArmadillo",verbose=F)
df<-read.table("http://www.stat.columbia.edu/~gelman/arm/examples/rado
n/srrs2.dat", header=T, sep=",") df.mn <- df[df[,2]=="MN",] radon <-
df.mn$activity log.radon <- log (ifelse (radon==0, .1, radon)) floor
<- df.mn$floor
# get county index variable
county.name <- as.vector(df.mn$county) uniq <- unique(county.name) J
<- length(uniq) county <- rep (NA, J) for (i in 1:J){
?county[county.name==uniq[i]] <- i
}
system.time(radon.model(GR=county22,lev=log.radon2,basm=floor2,iterati
ons=1e4L,burn=1e4L,adapt=2e3L,thin=5L))
#its necessary to change the structure of the data files or else
winbugs has an error
log.radon<-as.numeric(log.radon)
floor<-as.numeric(floor)
county<-as.numeric(county)
radon.data<-list(log.radon=log.radon,floor=floor,county=county)
radon.param<-c("a","b")
radon.inits<-function(){list(a=rnorm(85),b=rnorm(1),sigma.a=runif(1),s
igma.y=runif(1),mu.a=rnorm(1)}
#n.burnin = n.iter/2 for bugs
system.time(radon.test<-bugs(radon.data,radon.inits,radon.param,model.
file="radon.bug",n.chains=1,n.iter=20000,working.directory=getwd(),bug
s.directory="D:/R/WinBUGS14",n.thin=5))
----------------------------------------------------------------------
- The difference I get is 16.53s vs 13.85s
Kind regards,
Sam Watson
-----Original Message-----
From: Whit Armstrong [mailto:armstrong.whit at gmail.com]
Sent: 19 December 2011 12:39
To: Watson, Samuel
Cc: rcpp-devel at r-forge.wu-wien.ac.at
Subject: Re: [Rcpp-devel] CppBugs vs WinBugs
post your code.
The speedup depends on ?your model. ?My comparisons were based on R/JAGS and pymc on linux.
-Whit
On Mon, Dec 19, 2011 at 7:30 AM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
I am currently using CppBugs with Rcpp through R. I am very interested to use CppBugs as I am finding WinBugs to be prohibitively slow, I use large amounts of data in large multilevel models, so when I found cppbugs I was excited. It says on the Github page for cppbugs that I can achieve speeds of 20-100x faster than winbugs. I have been testing some examples: for the linear model example that is provided at Github, the time difference is about 2x I also tested the Radon model with Rcpp and Andrew Gelmans data and for 10000 iterations and 10000 burnin with a thinning parameter of 5 the difference is only 16 seconds vs 13 seconds. I can run multiple chains in parallel through R with the ?snow? package, cxxfunction() and package.skeleton() and this does seem to provide a little bit of a boost (parallel winbugs vs parallel cppbugs). But nothing greater than 4x. Is it possible to achieve the kind of difference mentioned on Whit Armstrong?s page for cppbugs with R? I am happy to post all the code if required. Many thanks Sam Watson
_______________________________________________ Rcpp-devel mailing list Rcpp-devel at lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-dev e l
I changed the options for cxxfunction() in R using:
settings=getPlugin("RcppArmadillo")
settings$env$PKG_CXXFLAGS=paste('-O2',settings$env$PKG_CXXFLAGS,sep=' ')
radon.model <- cxxfunction(signature(GR="numeric", lev="numeric",basm="numeric",iterations="integer",burn="integer",adapt="adapt",thin="integer"), body=src, include=model, plugin="RcppArmadillo",verbose=F,settings=settings)
This didn?t make any difference at all.
For the linear model example you provide the difference for me for winbugs vs cppbugs is 40s vs 10s which is a big difference.
I will test some more models and hopefully this model is just an anomaly for me.
Thanks for a great package!
Sam
-----Original Message-----
From: Whit Armstrong [mailto:armstrong.whit at gmail.com]
Sent: 19 December 2011 16:15
To: Watson, Samuel
Cc: rcpp-devel at r-forge.wu-wien.ac.at
Subject: Re: [Rcpp-devel] CppBugs vs WinBugs
Not sure.
You might have a look at what compiler options are used for the inline cpp function.
If you are not using O2, then that could be a big difference, but I'm not sure it would drop the time from 10s to 2.8s.
-Whit
On Mon, Dec 19, 2011 at 11:04 AM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
Hi Whit, Many thanks for your reply, my apologies for those mistakes in that code, I actually had just noticed them myself. I amended the mistakes you mentioned, for example I have included the line group -= 1; in the run.model.cpp function. Here are the results with your suggested changes: system.time(radon.test<-jags(radon.data,radon.inits,radon.param,model. file=model.file,n.chains=1,n.iter=20000,working.directory=getwd(),n.th in=5)) ? user ?system elapsed ?11.03 ? ?0.00 ? 11.12 system.time(res<-radon.model(GR=county22,lev=log.radon2,basm=floor2,it erations=1e4L,burn=1e4L,adapt=1e3L,thin=5L)) ? user ?system elapsed ?10.86 ? ?0.00 ? 10.86 If I set adapt=2e2L I still get 9.14 seconds. The model is definitely producing the correct answer (as compared to WinBUGS and jags), so I can't work out why I can't an equivalent speed to you. Would you have any other recommendations of things that could be slowing the model down? I am running Windows 7 Enterprise on Intel Core i7-2600 @ 3.4 GHz w/ 8GB DDR3 RAM Many thanks, Sam Watson -----Original Message----- From: Whit Armstrong [mailto:armstrong.whit at gmail.com] Sent: 19 December 2011 15:23 To: Watson, Samuel Cc: rcpp-devel at r-forge.wu-wien.ac.at Subject: Re: [Rcpp-devel] CppBugs vs WinBugs You have the model specified wrong. in your wrapper function you call:
RadonVaryingInterceptModel m(group,level,basement,N,N_counties);
However, the constructor requires the 'level' variable to be in the first position and group in the third position :
?RadonVaryingInterceptModel(const vec& level_, const vec& basement_, const mat& group_,int N_, int N_counties_):
Additionally, when you pass the 'group' variable it has to be 0 indexed instead of 1 indexed. ?So, had it been in the correct position, you would have corrupted memory when you ran the program. One additional thing I changed was the 'adapt' variable. ?I set it to 1000 instead of 2000. ?The adapt phase is very slow because it perturbs each node individually and then estimates the impact to the acceptance ratio. In any case, after fixing these issues, these are the results I get (using jags instead of winbugs): ? ? ? ? ? ?user.self sys.self ? elapsed user.child sys.child jags.time ? ? 16.461000 ? ?0.064 16.586000 ? ? ? ? ?0 ? ? ? ? 0 cppbugs.time ? 2.416000 ? ?0.000 ?2.420000 ? ? ? ? ?0 ? ? ? ? 0 ? ? ? ? ? ? ?6.813328 ? ? ?Inf ?6.853719 ? ? ? ?NaN ? ? ? NaN
Which I think it more typical of the speedup I would expect. ?The front page of my github project was written over a year ago, and I don't even remember which example had a 100x speedup. I still think you could see a 20x speedup on a large dataset. ?CppBugs uses Armadillo as the backend, so on a large dataset, you can speed up the linear algebra significantly if you use a multi-threaded blas (and your dataset is large enough to benefit from it). ?Most cases of using multithreaded linear algebra on the R list are actually worse for small datasets, and I'm sure the same would be true in this case. I'll post this revised code up to my github, so you can see the changes I made. One last thing. ?CppBugs is still under rapid development, even though you don't see the commits on the public branch. ?A lot is changing on the backend, and I want to make it stable before it's released. ?The closure features in C++0x make it much easier to define cppbugs models without needing to declare a new class. -Whit On Mon, Dec 19, 2011 at 8:10 AM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
The radon model is as follows (I haven?t changed the linear model posted on Github), I slightly altered the radon model from Github which is marked inline below:
BUGS code (to be saved in WinBugs working directory as "radon.bug"):
model {
?for (i in 1:919){
? ?log.radon[i] ~ dnorm (y.hat[i], tau.y)
? ?y.hat[i] <- a[county2[i]] + b*floor[i]
?}
?b ~ dnorm (0, .0001)
?tau.y <- pow(sigma.y, -2)
?sigma.y ~ dunif (0, 100)
?for (j in 1:85){
? ?a[j] ~ dnorm (mu.a, tau.a)
?}
?mu.a ~ dnorm (0, .0001)
?tau.a <- pow(sigma.a, -2)
?sigma.a ~ dunif (0, 100)
}
-------------------------------------
R code:
require(inline)
require(Rcpp)
require(R2WinBUGS)
model<-'
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <algorithm>
#include <cmath>
#include <armadillo>
#include <boost/random.hpp>
#include <boost/algorithm/string.hpp> #include <cppbugs/cppbugs.hpp>
using namespace arma;
using namespace cppbugs;
using std::vector;
using std::string;
using std::cout;
using std::endl;
using std::ifstream;
class RadonVaryingInterceptModel: public MCModel {
public:
?const vec& level;
?const vec& basement;
?const mat& group;
?int N, N_counties;
?mat indicator_matrix; //indicator matrix used for county level
random effects
?Normal<vec> a;
?Normal<double> b;
?Deterministic<double> tau_y;
?Uniform<double> sigma_y;
?Normal<double> mu_a;
?Deterministic<double> tau_a;
?Uniform<double> sigma_a;
?Deterministic<mat> y_hat;
?Normal<mat> likelihood;
?RadonVaryingInterceptModel(const vec& level_, const vec& basement_, const mat& group_,int N_, int N_counties_):
? ?level(level_),basement(basement_),group(group_),
? ?a(randn<vec>(N_counties_)), b(0),N(N_),N_counties(N_counties_),
? ?indicator_matrix(N,N_counties),
? ?tau_y(1),sigma_y(1),mu_a(0),tau_a(1),sigma_a(1),
? ?y_hat(randn<mat>(level_.n_rows,1)),likelihood(level_,true)
?{
? ?indicator_matrix.fill(0.0);
? ?for(int i=0;i<group.n_elem;i++){
? ?indicator_matrix(i,group[i])=1.0;
? ?}
? ?add(a);
? ?add(b);
? ?add(tau_y);
? ?add(sigma_y);
? ?add(mu_a);
? ?add(tau_a);
? ?add(sigma_a);
? ?add(y_hat);
? ?add(likelihood);
?}
?void update() {
? ?y_hat.value = indicator_matrix*a.value + b.value*basement;
? ?tau_y.value = pow(sigma_y.value, -2.0);
? ?tau_a.value = pow(sigma_a.value, -2.0);
? ?a.dnorm(mu_a.value, tau_a.value);
? ?b.dnorm(0, 0.0001);
? ?sigma_y.dunif(0, 100);
? ?mu_a.dnorm(0, 0.0001);
? ?sigma_a.dunif(0, 100);
? ?likelihood.dnorm(y_hat.value,tau_y.value);
?}
};
'
Src<-'
mat group = Rcpp::as<arma::mat>(GR);
vec level = Rcpp::as<arma::vec>(lev); vec basement =
Rcpp::as<arma::vec>(basm); int N = 919; int N_counties = 85; int
iterations_ = as<int>(iterations); int burn_ = as<int>(burn); int
adapt_ = as<int>(adapt); int thin_ = as<int>(thin);
RadonVaryingInterceptModel m(group,level,basement,N,N_counties);
m.sample(iterations_, burn_, adapt_, thin_);
return Rcpp::List::create(Rcpp::Named("b", m.b.mean()),
Rcpp::Named("ar", m.acceptance_ratio()), Rcpp::Named("a",
m.a.mean())); '
radon.model <- cxxfunction(signature(GR="numeric",
lev="numeric",basm="numeric",iterations="integer",burn="integer",adap
t ="adapt",thin="integer"), body=src, include=model,
plugin="RcppArmadillo",verbose=F)
df<-read.table("http://www.stat.columbia.edu/~gelman/arm/examples/rad
o n/srrs2.dat", header=T, sep=",") df.mn <- df[df[,2]=="MN",] radon
<- df.mn$activity log.radon <- log (ifelse (radon==0, .1, radon))
floor
<- df.mn$floor
# get county index variable
county.name <- as.vector(df.mn$county) uniq <- unique(county.name) J
<- length(uniq) county <- rep (NA, J) for (i in 1:J){
?county[county.name==uniq[i]] <- i
}
system.time(radon.model(GR=county22,lev=log.radon2,basm=floor2,iterat
i
ons=1e4L,burn=1e4L,adapt=2e3L,thin=5L))
#its necessary to change the structure of the data files or else
winbugs has an error
log.radon<-as.numeric(log.radon)
floor<-as.numeric(floor)
county<-as.numeric(county)
radon.data<-list(log.radon=log.radon,floor=floor,county=county)
radon.param<-c("a","b")
radon.inits<-function(){list(a=rnorm(85),b=rnorm(1),sigma.a=runif(1),
s
igma.y=runif(1),mu.a=rnorm(1)}
#n.burnin = n.iter/2 for bugs
system.time(radon.test<-bugs(radon.data,radon.inits,radon.param,model.
file="radon.bug",n.chains=1,n.iter=20000,working.directory=getwd(),bu
g
s.directory="D:/R/WinBUGS14",n.thin=5))
---------------------------------------------------------------------
-
- The difference I get is 16.53s vs 13.85s
Kind regards,
Sam Watson
-----Original Message-----
From: Whit Armstrong [mailto:armstrong.whit at gmail.com]
Sent: 19 December 2011 12:39
To: Watson, Samuel
Cc: rcpp-devel at r-forge.wu-wien.ac.at
Subject: Re: [Rcpp-devel] CppBugs vs WinBugs
post your code.
The speedup depends on ?your model. ?My comparisons were based on R/JAGS and pymc on linux.
-Whit
On Mon, Dec 19, 2011 at 7:30 AM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
I am currently using CppBugs with Rcpp through R. I am very interested to use CppBugs as I am finding WinBugs to be prohibitively slow, I use large amounts of data in large multilevel models, so when I found cppbugs I was excited. It says on the Github page for cppbugs that I can achieve speeds of 20-100x faster than winbugs. I have been testing some examples: for the linear model example that is provided at Github, the time difference is about 2x I also tested the Radon model with Rcpp and Andrew Gelmans data and for 10000 iterations and 10000 burnin with a thinning parameter of 5 the difference is only 16 seconds vs 13 seconds. I can run multiple chains in parallel through R with the ?snow? package, cxxfunction() and package.skeleton() and this does seem to provide a little bit of a boost (parallel winbugs vs parallel cppbugs). But nothing greater than 4x. Is it possible to achieve the kind of difference mentioned on Whit Armstrong?s page for cppbugs with R? I am happy to post all the code if required. Many thanks Sam Watson
_______________________________________________ Rcpp-devel mailing list Rcpp-devel at lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-de v e l
Oh, that's great. I thought you said the linear model was only showing 2x in your first email. Glad you are finding it useful. Eventually, one should be able to specify the model directly in R, which should make a huge difference in ease of use. But I'm not quite there yet. I have more backend changes to make before I can get to the user interface improvements. Please let me know if you find bugs. Or file them here: https://github.com/armstrtw/CppBugs/issues?sort=created&direction=asc&_pjax=true&state=open Cheers, Whit On Mon, Dec 19, 2011 at 1:14 PM, Watson, Samuel
<S.I.Watson at warwick.ac.uk> wrote:
I changed the options for cxxfunction() in R using:
?settings=getPlugin("RcppArmadillo")
?settings$env$PKG_CXXFLAGS=paste('-O2',settings$env$PKG_CXXFLAGS,sep=' ')
?radon.model <- cxxfunction(signature(GR="numeric", lev="numeric",basm="numeric",iterations="integer",burn="integer",adapt="adapt",thin="integer"), body=src, include=model, plugin="RcppArmadillo",verbose=F,settings=settings)
This didn?t make any difference at all.
For the linear model example you provide the difference for me for winbugs vs cppbugs is 40s vs 10s which is a big difference.
I will test some more models and hopefully this model is just an anomaly for me.
Thanks for a great package!
Sam
-----Original Message-----
From: Whit Armstrong [mailto:armstrong.whit at gmail.com]
Sent: 19 December 2011 16:15
To: Watson, Samuel
Cc: rcpp-devel at r-forge.wu-wien.ac.at
Subject: Re: [Rcpp-devel] CppBugs vs WinBugs
Not sure.
You might have a look at what compiler options are used for the inline cpp function.
If you are not using O2, then that could be a big difference, but I'm not sure it would drop the time from 10s to 2.8s.
-Whit
On Mon, Dec 19, 2011 at 11:04 AM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
Hi Whit, Many thanks for your reply, my apologies for those mistakes in that code, I actually had just noticed them myself. I amended the mistakes you mentioned, for example I have included the line group -= 1; in the run.model.cpp function. Here are the results with your suggested changes: system.time(radon.test<-jags(radon.data,radon.inits,radon.param,model. file=model.file,n.chains=1,n.iter=20000,working.directory=getwd(),n.th in=5)) ? user ?system elapsed ?11.03 ? ?0.00 ? 11.12 system.time(res<-radon.model(GR=county22,lev=log.radon2,basm=floor2,it erations=1e4L,burn=1e4L,adapt=1e3L,thin=5L)) ? user ?system elapsed ?10.86 ? ?0.00 ? 10.86 If I set adapt=2e2L I still get 9.14 seconds. The model is definitely producing the correct answer (as compared to WinBUGS and jags), so I can't work out why I can't an equivalent speed to you. Would you have any other recommendations of things that could be slowing the model down? I am running Windows 7 Enterprise on Intel Core i7-2600 @ 3.4 GHz w/ 8GB DDR3 RAM Many thanks, Sam Watson -----Original Message----- From: Whit Armstrong [mailto:armstrong.whit at gmail.com] Sent: 19 December 2011 15:23 To: Watson, Samuel Cc: rcpp-devel at r-forge.wu-wien.ac.at Subject: Re: [Rcpp-devel] CppBugs vs WinBugs You have the model specified wrong. in your wrapper function you call:
RadonVaryingInterceptModel m(group,level,basement,N,N_counties);
However, the constructor requires the 'level' variable to be in the first position and group in the third position :
?RadonVaryingInterceptModel(const vec& level_, const vec& basement_, const mat& group_,int N_, int N_counties_):
Additionally, when you pass the 'group' variable it has to be 0 indexed instead of 1 indexed. ?So, had it been in the correct position, you would have corrupted memory when you ran the program. One additional thing I changed was the 'adapt' variable. ?I set it to 1000 instead of 2000. ?The adapt phase is very slow because it perturbs each node individually and then estimates the impact to the acceptance ratio. In any case, after fixing these issues, these are the results I get (using jags instead of winbugs): ? ? ? ? ? ?user.self sys.self ? elapsed user.child sys.child jags.time 16.461000 ? ?0.064 16.586000 ? ? ? ? ?0 ? ? ? ? 0 cppbugs.time 2.416000 ? ?0.000 ?2.420000 ? ? ? ? ?0 ? ? ? ? 0 ? ? ? ? ? ? ?6.813328 ? ? ?Inf ?6.853719 ? ? ? ?NaN ? ? ? NaN
Which I think it more typical of the speedup I would expect. ?The front page of my github project was written over a year ago, and I don't even remember which example had a 100x speedup. I still think you could see a 20x speedup on a large dataset. ?CppBugs uses Armadillo as the backend, so on a large dataset, you can speed up the linear algebra significantly if you use a multi-threaded blas (and your dataset is large enough to benefit from it). ?Most cases of using multithreaded linear algebra on the R list are actually worse for small datasets, and I'm sure the same would be true in this case. I'll post this revised code up to my github, so you can see the changes I made. One last thing. ?CppBugs is still under rapid development, even though you don't see the commits on the public branch. ?A lot is changing on the backend, and I want to make it stable before it's released. ?The closure features in C++0x make it much easier to define cppbugs models without needing to declare a new class. -Whit On Mon, Dec 19, 2011 at 8:10 AM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
The radon model is as follows (I haven?t changed the linear model posted on Github), I slightly altered the radon model from Github which is marked inline below:
BUGS code (to be saved in WinBugs working directory as "radon.bug"):
model {
?for (i in 1:919){
? ?log.radon[i] ~ dnorm (y.hat[i], tau.y)
? ?y.hat[i] <- a[county2[i]] + b*floor[i]
?}
?b ~ dnorm (0, .0001)
?tau.y <- pow(sigma.y, -2)
?sigma.y ~ dunif (0, 100)
?for (j in 1:85){
? ?a[j] ~ dnorm (mu.a, tau.a)
?}
?mu.a ~ dnorm (0, .0001)
?tau.a <- pow(sigma.a, -2)
?sigma.a ~ dunif (0, 100)
}
-------------------------------------
R code:
require(inline)
require(Rcpp)
require(R2WinBUGS)
model<-'
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <algorithm>
#include <cmath>
#include <armadillo>
#include <boost/random.hpp>
#include <boost/algorithm/string.hpp> #include <cppbugs/cppbugs.hpp>
using namespace arma;
using namespace cppbugs;
using std::vector;
using std::string;
using std::cout;
using std::endl;
using std::ifstream;
class RadonVaryingInterceptModel: public MCModel {
public:
?const vec& level;
?const vec& basement;
?const mat& group;
?int N, N_counties;
?mat indicator_matrix; //indicator matrix used for county level
random effects
?Normal<vec> a;
?Normal<double> b;
?Deterministic<double> tau_y;
?Uniform<double> sigma_y;
?Normal<double> mu_a;
?Deterministic<double> tau_a;
?Uniform<double> sigma_a;
?Deterministic<mat> y_hat;
?Normal<mat> likelihood;
?RadonVaryingInterceptModel(const vec& level_, const vec& basement_, const mat& group_,int N_, int N_counties_):
? ?level(level_),basement(basement_),group(group_),
? ?a(randn<vec>(N_counties_)), b(0),N(N_),N_counties(N_counties_),
? ?indicator_matrix(N,N_counties),
? ?tau_y(1),sigma_y(1),mu_a(0),tau_a(1),sigma_a(1),
? ?y_hat(randn<mat>(level_.n_rows,1)),likelihood(level_,true)
?{
? ?indicator_matrix.fill(0.0);
? ?for(int i=0;i<group.n_elem;i++){
? ?indicator_matrix(i,group[i])=1.0;
? ?}
? ?add(a);
? ?add(b);
? ?add(tau_y);
? ?add(sigma_y);
? ?add(mu_a);
? ?add(tau_a);
? ?add(sigma_a);
? ?add(y_hat);
? ?add(likelihood);
?}
?void update() {
? ?y_hat.value = indicator_matrix*a.value + b.value*basement;
? ?tau_y.value = pow(sigma_y.value, -2.0);
? ?tau_a.value = pow(sigma_a.value, -2.0);
? ?a.dnorm(mu_a.value, tau_a.value);
? ?b.dnorm(0, 0.0001);
? ?sigma_y.dunif(0, 100);
? ?mu_a.dnorm(0, 0.0001);
? ?sigma_a.dunif(0, 100);
? ?likelihood.dnorm(y_hat.value,tau_y.value);
?}
};
'
Src<-'
mat group = Rcpp::as<arma::mat>(GR);
vec level = Rcpp::as<arma::vec>(lev); vec basement =
Rcpp::as<arma::vec>(basm); int N = 919; int N_counties = 85; int
iterations_ = as<int>(iterations); int burn_ = as<int>(burn); int
adapt_ = as<int>(adapt); int thin_ = as<int>(thin);
RadonVaryingInterceptModel m(group,level,basement,N,N_counties);
m.sample(iterations_, burn_, adapt_, thin_);
return Rcpp::List::create(Rcpp::Named("b", m.b.mean()),
Rcpp::Named("ar", m.acceptance_ratio()), Rcpp::Named("a",
m.a.mean())); '
radon.model <- cxxfunction(signature(GR="numeric",
lev="numeric",basm="numeric",iterations="integer",burn="integer",adap
t ="adapt",thin="integer"), body=src, include=model,
plugin="RcppArmadillo",verbose=F)
df<-read.table("http://www.stat.columbia.edu/~gelman/arm/examples/rad
o n/srrs2.dat", header=T, sep=",") df.mn <- df[df[,2]=="MN",] radon
<- df.mn$activity log.radon <- log (ifelse (radon==0, .1, radon))
floor
<- df.mn$floor
# get county index variable
county.name <- as.vector(df.mn$county) uniq <- unique(county.name) J
<- length(uniq) county <- rep (NA, J) for (i in 1:J){
?county[county.name==uniq[i]] <- i
}
system.time(radon.model(GR=county22,lev=log.radon2,basm=floor2,iterat
i
ons=1e4L,burn=1e4L,adapt=2e3L,thin=5L))
#its necessary to change the structure of the data files or else
winbugs has an error
log.radon<-as.numeric(log.radon)
floor<-as.numeric(floor)
county<-as.numeric(county)
radon.data<-list(log.radon=log.radon,floor=floor,county=county)
radon.param<-c("a","b")
radon.inits<-function(){list(a=rnorm(85),b=rnorm(1),sigma.a=runif(1),
s
igma.y=runif(1),mu.a=rnorm(1)}
#n.burnin = n.iter/2 for bugs
system.time(radon.test<-bugs(radon.data,radon.inits,radon.param,model.
file="radon.bug",n.chains=1,n.iter=20000,working.directory=getwd(),bu
g
s.directory="D:/R/WinBUGS14",n.thin=5))
---------------------------------------------------------------------
-
- The difference I get is 16.53s vs 13.85s
Kind regards,
Sam Watson
-----Original Message-----
From: Whit Armstrong [mailto:armstrong.whit at gmail.com]
Sent: 19 December 2011 12:39
To: Watson, Samuel
Cc: rcpp-devel at r-forge.wu-wien.ac.at
Subject: Re: [Rcpp-devel] CppBugs vs WinBugs
post your code.
The speedup depends on ?your model. ?My comparisons were based on R/JAGS and pymc on linux.
-Whit
On Mon, Dec 19, 2011 at 7:30 AM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
I am currently using CppBugs with Rcpp through R. I am very interested to use CppBugs as I am finding WinBugs to be prohibitively slow, I use large amounts of data in large multilevel models, so when I found cppbugs I was excited. It says on the Github page for cppbugs that I can achieve speeds of 20-100x faster than winbugs. I have been testing some examples: for the linear model example that is provided at Github, the time difference is about 2x I also tested the Radon model with Rcpp and Andrew Gelmans data and for 10000 iterations and 10000 burnin with a thinning parameter of 5 the difference is only 16 seconds vs 13 seconds. I can run multiple chains in parallel through R with the ?snow? package, cxxfunction() and package.skeleton() and this does seem to provide a little bit of a boost (parallel winbugs vs parallel cppbugs). But nothing greater than 4x. Is it possible to achieve the kind of difference mentioned on Whit Armstrong?s page for cppbugs with R? I am happy to post all the code if required. Many thanks Sam Watson
_______________________________________________ Rcpp-devel mailing list Rcpp-devel at lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-de v e l
the 2x was before I changed the adapt parameter as per your advice. I am getting the cppbugs to return the whole history of iterations list at the moment, from different chains running in parallel using the 'snow' package in R, as I need to perform convergence diagnostics on the chains, and keep running if necessary, something I know only how to do in R. If its of interest I can post any code I get from this too Are you planning on developing a way of specifying initial values for each Markov chain? Sam -----Original Message----- From: Whit Armstrong [mailto:armstrong.whit at gmail.com] Sent: 19 December 2011 18:24 To: Watson, Samuel Cc: rcpp-devel at r-forge.wu-wien.ac.at Subject: Re: [Rcpp-devel] CppBugs vs WinBugs Oh, that's great. I thought you said the linear model was only showing 2x in your first email. Glad you are finding it useful. Eventually, one should be able to specify the model directly in R, which should make a huge difference in ease of use. But I'm not quite there yet. I have more backend changes to make before I can get to the user interface improvements. Please let me know if you find bugs. Or file them here: https://github.com/armstrtw/CppBugs/issues?sort=created&direction=asc&_pjax=true&state=open Cheers, Whit
On Mon, Dec 19, 2011 at 1:14 PM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
I changed the options for cxxfunction() in R using:
?settings=getPlugin("RcppArmadillo")
?settings$env$PKG_CXXFLAGS=paste('-O2',settings$env$PKG_CXXFLAGS,sep='
')
?radon.model <- cxxfunction(signature(GR="numeric",
lev="numeric",basm="numeric",iterations="integer",burn="integer",adapt
="adapt",thin="integer"), body=src, include=model,
plugin="RcppArmadillo",verbose=F,settings=settings)
This didn?t make any difference at all.
For the linear model example you provide the difference for me for winbugs vs cppbugs is 40s vs 10s which is a big difference.
I will test some more models and hopefully this model is just an anomaly for me.
Thanks for a great package!
Sam
-----Original Message-----
From: Whit Armstrong [mailto:armstrong.whit at gmail.com]
Sent: 19 December 2011 16:15
To: Watson, Samuel
Cc: rcpp-devel at r-forge.wu-wien.ac.at
Subject: Re: [Rcpp-devel] CppBugs vs WinBugs
Not sure.
You might have a look at what compiler options are used for the inline cpp function.
If you are not using O2, then that could be a big difference, but I'm not sure it would drop the time from 10s to 2.8s.
-Whit
On Mon, Dec 19, 2011 at 11:04 AM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
Hi Whit, Many thanks for your reply, my apologies for those mistakes in that code, I actually had just noticed them myself. I amended the mistakes you mentioned, for example I have included the line group -= 1; in the run.model.cpp function. Here are the results with your suggested changes: system.time(radon.test<-jags(radon.data,radon.inits,radon.param,model. file=model.file,n.chains=1,n.iter=20000,working.directory=getwd(),n.t h in=5)) ? user ?system elapsed ?11.03 ? ?0.00 ? 11.12 system.time(res<-radon.model(GR=county22,lev=log.radon2,basm=floor2,i t erations=1e4L,burn=1e4L,adapt=1e3L,thin=5L)) ? user ?system elapsed ?10.86 ? ?0.00 ? 10.86 If I set adapt=2e2L I still get 9.14 seconds. The model is definitely producing the correct answer (as compared to WinBUGS and jags), so I can't work out why I can't an equivalent speed to you. Would you have any other recommendations of things that could be slowing the model down? I am running Windows 7 Enterprise on Intel Core i7-2600 @ 3.4 GHz w/ 8GB DDR3 RAM Many thanks, Sam Watson -----Original Message----- From: Whit Armstrong [mailto:armstrong.whit at gmail.com] Sent: 19 December 2011 15:23 To: Watson, Samuel Cc: rcpp-devel at r-forge.wu-wien.ac.at Subject: Re: [Rcpp-devel] CppBugs vs WinBugs You have the model specified wrong. in your wrapper function you call:
RadonVaryingInterceptModel m(group,level,basement,N,N_counties);
However, the constructor requires the 'level' variable to be in the first position and group in the third position :
?RadonVaryingInterceptModel(const vec& level_, const vec& basement_, const mat& group_,int N_, int N_counties_):
Additionally, when you pass the 'group' variable it has to be 0 indexed instead of 1 indexed. ?So, had it been in the correct position, you would have corrupted memory when you ran the program. One additional thing I changed was the 'adapt' variable. ?I set it to 1000 instead of 2000. ?The adapt phase is very slow because it perturbs each node individually and then estimates the impact to the acceptance ratio. In any case, after fixing these issues, these are the results I get (using jags instead of winbugs): ? ? ? ? ? ?user.self sys.self ? elapsed user.child sys.child jags.time 16.461000 ? ?0.064 16.586000 ? ? ? ? ?0 ? ? ? ? 0 cppbugs.time 2.416000 ? ?0.000 ?2.420000 ? ? ? ? ?0 ? ? ? ? 0 ? ? ? ? ? ? ?6.813328 ? ? ?Inf ?6.853719 ? ? ? ?NaN ? ? ? NaN
Which I think it more typical of the speedup I would expect. ?The front page of my github project was written over a year ago, and I don't even remember which example had a 100x speedup. I still think you could see a 20x speedup on a large dataset. ?CppBugs uses Armadillo as the backend, so on a large dataset, you can speed up the linear algebra significantly if you use a multi-threaded blas (and your dataset is large enough to benefit from it). ?Most cases of using multithreaded linear algebra on the R list are actually worse for small datasets, and I'm sure the same would be true in this case. I'll post this revised code up to my github, so you can see the changes I made. One last thing. ?CppBugs is still under rapid development, even though you don't see the commits on the public branch. ?A lot is changing on the backend, and I want to make it stable before it's released. ?The closure features in C++0x make it much easier to define cppbugs models without needing to declare a new class. -Whit On Mon, Dec 19, 2011 at 8:10 AM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
The radon model is as follows (I haven?t changed the linear model posted on Github), I slightly altered the radon model from Github which is marked inline below:
BUGS code (to be saved in WinBugs working directory as "radon.bug"):
model {
?for (i in 1:919){
? ?log.radon[i] ~ dnorm (y.hat[i], tau.y)
? ?y.hat[i] <- a[county2[i]] + b*floor[i]
?}
?b ~ dnorm (0, .0001)
?tau.y <- pow(sigma.y, -2)
?sigma.y ~ dunif (0, 100)
?for (j in 1:85){
? ?a[j] ~ dnorm (mu.a, tau.a)
?}
?mu.a ~ dnorm (0, .0001)
?tau.a <- pow(sigma.a, -2)
?sigma.a ~ dunif (0, 100)
}
-------------------------------------
R code:
require(inline)
require(Rcpp)
require(R2WinBUGS)
model<-'
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <algorithm>
#include <cmath>
#include <armadillo>
#include <boost/random.hpp>
#include <boost/algorithm/string.hpp> #include <cppbugs/cppbugs.hpp>
using namespace arma;
using namespace cppbugs;
using std::vector;
using std::string;
using std::cout;
using std::endl;
using std::ifstream;
class RadonVaryingInterceptModel: public MCModel {
public:
?const vec& level;
?const vec& basement;
?const mat& group;
?int N, N_counties;
?mat indicator_matrix; //indicator matrix used for county level
random effects
?Normal<vec> a;
?Normal<double> b;
?Deterministic<double> tau_y;
?Uniform<double> sigma_y;
?Normal<double> mu_a;
?Deterministic<double> tau_a;
?Uniform<double> sigma_a;
?Deterministic<mat> y_hat;
?Normal<mat> likelihood;
?RadonVaryingInterceptModel(const vec& level_, const vec& basement_, const mat& group_,int N_, int N_counties_):
? ?level(level_),basement(basement_),group(group_),
? ?a(randn<vec>(N_counties_)), b(0),N(N_),N_counties(N_counties_),
? ?indicator_matrix(N,N_counties),
? ?tau_y(1),sigma_y(1),mu_a(0),tau_a(1),sigma_a(1),
? ?y_hat(randn<mat>(level_.n_rows,1)),likelihood(level_,true)
?{
? ?indicator_matrix.fill(0.0);
? ?for(int i=0;i<group.n_elem;i++){
? ?indicator_matrix(i,group[i])=1.0;
? ?}
? ?add(a);
? ?add(b);
? ?add(tau_y);
? ?add(sigma_y);
? ?add(mu_a);
? ?add(tau_a);
? ?add(sigma_a);
? ?add(y_hat);
? ?add(likelihood);
?}
?void update() {
? ?y_hat.value = indicator_matrix*a.value + b.value*basement;
? ?tau_y.value = pow(sigma_y.value, -2.0);
? ?tau_a.value = pow(sigma_a.value, -2.0);
? ?a.dnorm(mu_a.value, tau_a.value);
? ?b.dnorm(0, 0.0001);
? ?sigma_y.dunif(0, 100);
? ?mu_a.dnorm(0, 0.0001);
? ?sigma_a.dunif(0, 100);
? ?likelihood.dnorm(y_hat.value,tau_y.value);
?}
};
'
Src<-'
mat group = Rcpp::as<arma::mat>(GR); vec level =
Rcpp::as<arma::vec>(lev); vec basement = Rcpp::as<arma::vec>(basm);
int N = 919; int N_counties = 85; int iterations_ =
as<int>(iterations); int burn_ = as<int>(burn); int adapt_ =
as<int>(adapt); int thin_ = as<int>(thin);
RadonVaryingInterceptModel m(group,level,basement,N,N_counties);
m.sample(iterations_, burn_, adapt_, thin_);
return Rcpp::List::create(Rcpp::Named("b", m.b.mean()),
Rcpp::Named("ar", m.acceptance_ratio()), Rcpp::Named("a",
m.a.mean())); '
radon.model <- cxxfunction(signature(GR="numeric",
lev="numeric",basm="numeric",iterations="integer",burn="integer",ada
p t ="adapt",thin="integer"), body=src, include=model,
plugin="RcppArmadillo",verbose=F)
df<-read.table("http://www.stat.columbia.edu/~gelman/arm/examples/ra
d o n/srrs2.dat", header=T, sep=",") df.mn <- df[df[,2]=="MN",]
radon
<- df.mn$activity log.radon <- log (ifelse (radon==0, .1, radon))
floor
<- df.mn$floor
# get county index variable
county.name <- as.vector(df.mn$county) uniq <- unique(county.name) J
<- length(uniq) county <- rep (NA, J) for (i in 1:J){
?county[county.name==uniq[i]] <- i
}
system.time(radon.model(GR=county22,lev=log.radon2,basm=floor2,itera
t
i
ons=1e4L,burn=1e4L,adapt=2e3L,thin=5L))
#its necessary to change the structure of the data files or else
winbugs has an error
log.radon<-as.numeric(log.radon)
floor<-as.numeric(floor)
county<-as.numeric(county)
radon.data<-list(log.radon=log.radon,floor=floor,county=county)
radon.param<-c("a","b")
radon.inits<-function(){list(a=rnorm(85),b=rnorm(1),sigma.a=runif(1)
,
s
igma.y=runif(1),mu.a=rnorm(1)}
#n.burnin = n.iter/2 for bugs
system.time(radon.test<-bugs(radon.data,radon.inits,radon.param,model.
file="radon.bug",n.chains=1,n.iter=20000,working.directory=getwd(),b
u
g
s.directory="D:/R/WinBUGS14",n.thin=5))
--------------------------------------------------------------------
-
-
- The difference I get is 16.53s vs 13.85s
Kind regards,
Sam Watson
-----Original Message-----
From: Whit Armstrong [mailto:armstrong.whit at gmail.com]
Sent: 19 December 2011 12:39
To: Watson, Samuel
Cc: rcpp-devel at r-forge.wu-wien.ac.at
Subject: Re: [Rcpp-devel] CppBugs vs WinBugs
post your code.
The speedup depends on ?your model. ?My comparisons were based on R/JAGS and pymc on linux.
-Whit
On Mon, Dec 19, 2011 at 7:30 AM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
I am currently using CppBugs with Rcpp through R. I am very interested to use CppBugs as I am finding WinBugs to be prohibitively slow, I use large amounts of data in large multilevel models, so when I found cppbugs I was excited. It says on the Github page for cppbugs that I can achieve speeds of 20-100x faster than winbugs. I have been testing some examples: for the linear model example that is provided at Github, the time difference is about 2x I also tested the Radon model with Rcpp and Andrew Gelmans data and for 10000 iterations and 10000 burnin with a thinning parameter of 5 the difference is only 16 seconds vs 13 seconds. I can run multiple chains in parallel through R with the ?snow? package, cxxfunction() and package.skeleton() and this does seem to provide a little bit of a boost (parallel winbugs vs parallel cppbugs). But nothing greater than 4x. Is it possible to achieve the kind of difference mentioned on Whit Armstrong?s page for cppbugs with R? I am happy to post all the code if required. Many thanks Sam Watson
_______________________________________________ Rcpp-devel mailing list Rcpp-devel at lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-d e v e l
If its of interest I can post any code I get from this too
yes, please.
Are you planning on developing a way of specifying initial values for each Markov chain?
Well, you can do that in the constructor. Each stochastic object
takes an argument of the underlying type as the initial value.
For most of my examples, I simply init to randn<mat>(nrow,ncol), but
you can use explicit values as well.
For multiple chains, you can call sample again, but cppbugs will
continue to use the same vector as storage. So, you can either copy
and clear the vector after each run, or simply use the iterations/thin
to determine the boundary between separate chains.
For the dev version of cppbugs, you will not need to write a model
object, so you can simply initialize all the arma objects to the
values you want.
Here is quick example of what the linear model will look like under
the new setup:
Normal<vec> b(randn<vec>(2)); // the initial value is: randn<vec>(2)
Deterministic<mat> y_hat;
Deterministic<double> rsq;
Uniform<double> tau_y(1); // initial value is 1.0
Normal<mat> likelihood(y,true);
b.dnorm(0.0, 0.0001);
tau_y.dunif(0.,100.);
likelihood.dnorm(y_hat,tau_y);
// model now declared as a closure (std::function):
std::function<void ()> model = [&]() {
y_hat = X * b.value;
rsq = as_scalar(1 - var(y - y_hat.value) / var(y));
};
// model declaration still needs a list of the mcmc objects
MCModel m({&b, &y_hat, &tau_y, &likelihood, &rsq},model);
int iterations = 1e5;
m.sample(iterations, 1e4, 1e4, 10);
-Whit
On Mon, Dec 19, 2011 at 1:33 PM, Watson, Samuel
<S.I.Watson at warwick.ac.uk> wrote:
the 2x was before I changed the adapt parameter as per your advice. I am getting the cppbugs to return the whole history of iterations list at the moment, from different chains running in parallel using the 'snow' package in R, as I need to perform convergence diagnostics on the chains, and keep running if necessary, something I know only how to do in R. If its of interest I can post any code I get from this too Are you planning on developing a way of specifying initial values for each Markov chain? Sam -----Original Message----- From: Whit Armstrong [mailto:armstrong.whit at gmail.com] Sent: 19 December 2011 18:24 To: Watson, Samuel Cc: rcpp-devel at r-forge.wu-wien.ac.at Subject: Re: [Rcpp-devel] CppBugs vs WinBugs Oh, that's great. ?I thought you said the linear model was only showing 2x in your first email. Glad you are finding it useful. ?Eventually, one should be able to specify the model directly in R, which should make a huge difference in ease of use. ?But I'm not quite there yet. ?I have more backend changes to make before I can get to the user interface improvements. Please let me know if you find bugs. ?Or file them here: https://github.com/armstrtw/CppBugs/issues?sort=created&direction=asc&_pjax=true&state=open Cheers, Whit On Mon, Dec 19, 2011 at 1:14 PM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
I changed the options for cxxfunction() in R using:
?settings=getPlugin("RcppArmadillo")
?settings$env$PKG_CXXFLAGS=paste('-O2',settings$env$PKG_CXXFLAGS,sep='
')
?radon.model <- cxxfunction(signature(GR="numeric",
lev="numeric",basm="numeric",iterations="integer",burn="integer",adapt
="adapt",thin="integer"), body=src, include=model,
plugin="RcppArmadillo",verbose=F,settings=settings)
This didn?t make any difference at all.
For the linear model example you provide the difference for me for winbugs vs cppbugs is 40s vs 10s which is a big difference.
I will test some more models and hopefully this model is just an anomaly for me.
Thanks for a great package!
Sam
-----Original Message-----
From: Whit Armstrong [mailto:armstrong.whit at gmail.com]
Sent: 19 December 2011 16:15
To: Watson, Samuel
Cc: rcpp-devel at r-forge.wu-wien.ac.at
Subject: Re: [Rcpp-devel] CppBugs vs WinBugs
Not sure.
You might have a look at what compiler options are used for the inline cpp function.
If you are not using O2, then that could be a big difference, but I'm not sure it would drop the time from 10s to 2.8s.
-Whit
On Mon, Dec 19, 2011 at 11:04 AM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
Hi Whit, Many thanks for your reply, my apologies for those mistakes in that code, I actually had just noticed them myself. I amended the mistakes you mentioned, for example I have included the line group -= 1; in the run.model.cpp function. Here are the results with your suggested changes: system.time(radon.test<-jags(radon.data,radon.inits,radon.param,model. file=model.file,n.chains=1,n.iter=20000,working.directory=getwd(),n.t h in=5)) ? user ?system elapsed ?11.03 ? ?0.00 ? 11.12 system.time(res<-radon.model(GR=county22,lev=log.radon2,basm=floor2,i t erations=1e4L,burn=1e4L,adapt=1e3L,thin=5L)) ? user ?system elapsed ?10.86 ? ?0.00 ? 10.86 If I set adapt=2e2L I still get 9.14 seconds. The model is definitely producing the correct answer (as compared to WinBUGS and jags), so I can't work out why I can't an equivalent speed to you. Would you have any other recommendations of things that could be slowing the model down? I am running Windows 7 Enterprise on Intel Core i7-2600 @ 3.4 GHz w/ 8GB DDR3 RAM Many thanks, Sam Watson -----Original Message----- From: Whit Armstrong [mailto:armstrong.whit at gmail.com] Sent: 19 December 2011 15:23 To: Watson, Samuel Cc: rcpp-devel at r-forge.wu-wien.ac.at Subject: Re: [Rcpp-devel] CppBugs vs WinBugs You have the model specified wrong. in your wrapper function you call:
RadonVaryingInterceptModel m(group,level,basement,N,N_counties);
However, the constructor requires the 'level' variable to be in the first position and group in the third position :
?RadonVaryingInterceptModel(const vec& level_, const vec& basement_, const mat& group_,int N_, int N_counties_):
Additionally, when you pass the 'group' variable it has to be 0 indexed instead of 1 indexed. ?So, had it been in the correct position, you would have corrupted memory when you ran the program. One additional thing I changed was the 'adapt' variable. ?I set it to 1000 instead of 2000. ?The adapt phase is very slow because it perturbs each node individually and then estimates the impact to the acceptance ratio. In any case, after fixing these issues, these are the results I get (using jags instead of winbugs): ? ? ? ? ? ?user.self sys.self ? elapsed user.child sys.child jags.time 16.461000 ? ?0.064 16.586000 ? ? ? ? ?0 ? ? ? ? 0 cppbugs.time 2.416000 ? ?0.000 ?2.420000 ? ? ? ? ?0 ? ? ? ? 0 ? ? ? ? ? ? ?6.813328 ? ? ?Inf ?6.853719 ? ? ? ?NaN ? ? ? NaN
Which I think it more typical of the speedup I would expect. ?The front page of my github project was written over a year ago, and I don't even remember which example had a 100x speedup. I still think you could see a 20x speedup on a large dataset. ?CppBugs uses Armadillo as the backend, so on a large dataset, you can speed up the linear algebra significantly if you use a multi-threaded blas (and your dataset is large enough to benefit from it). ?Most cases of using multithreaded linear algebra on the R list are actually worse for small datasets, and I'm sure the same would be true in this case. I'll post this revised code up to my github, so you can see the changes I made. One last thing. ?CppBugs is still under rapid development, even though you don't see the commits on the public branch. ?A lot is changing on the backend, and I want to make it stable before it's released. ?The closure features in C++0x make it much easier to define cppbugs models without needing to declare a new class. -Whit On Mon, Dec 19, 2011 at 8:10 AM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
The radon model is as follows (I haven?t changed the linear model posted on Github), I slightly altered the radon model from Github which is marked inline below:
BUGS code (to be saved in WinBugs working directory as "radon.bug"):
model {
?for (i in 1:919){
? ?log.radon[i] ~ dnorm (y.hat[i], tau.y)
? ?y.hat[i] <- a[county2[i]] + b*floor[i]
?}
?b ~ dnorm (0, .0001)
?tau.y <- pow(sigma.y, -2)
?sigma.y ~ dunif (0, 100)
?for (j in 1:85){
? ?a[j] ~ dnorm (mu.a, tau.a)
?}
?mu.a ~ dnorm (0, .0001)
?tau.a <- pow(sigma.a, -2)
?sigma.a ~ dunif (0, 100)
}
-------------------------------------
R code:
require(inline)
require(Rcpp)
require(R2WinBUGS)
model<-'
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <algorithm>
#include <cmath>
#include <armadillo>
#include <boost/random.hpp>
#include <boost/algorithm/string.hpp> #include <cppbugs/cppbugs.hpp>
using namespace arma;
using namespace cppbugs;
using std::vector;
using std::string;
using std::cout;
using std::endl;
using std::ifstream;
class RadonVaryingInterceptModel: public MCModel {
public:
?const vec& level;
?const vec& basement;
?const mat& group;
?int N, N_counties;
?mat indicator_matrix; //indicator matrix used for county level
random effects
?Normal<vec> a;
?Normal<double> b;
?Deterministic<double> tau_y;
?Uniform<double> sigma_y;
?Normal<double> mu_a;
?Deterministic<double> tau_a;
?Uniform<double> sigma_a;
?Deterministic<mat> y_hat;
?Normal<mat> likelihood;
?RadonVaryingInterceptModel(const vec& level_, const vec& basement_, const mat& group_,int N_, int N_counties_):
? ?level(level_),basement(basement_),group(group_),
? ?a(randn<vec>(N_counties_)), b(0),N(N_),N_counties(N_counties_),
? ?indicator_matrix(N,N_counties),
? ?tau_y(1),sigma_y(1),mu_a(0),tau_a(1),sigma_a(1),
? ?y_hat(randn<mat>(level_.n_rows,1)),likelihood(level_,true)
?{
? ?indicator_matrix.fill(0.0);
? ?for(int i=0;i<group.n_elem;i++){
? ?indicator_matrix(i,group[i])=1.0;
? ?}
? ?add(a);
? ?add(b);
? ?add(tau_y);
? ?add(sigma_y);
? ?add(mu_a);
? ?add(tau_a);
? ?add(sigma_a);
? ?add(y_hat);
? ?add(likelihood);
?}
?void update() {
? ?y_hat.value = indicator_matrix*a.value + b.value*basement;
? ?tau_y.value = pow(sigma_y.value, -2.0);
? ?tau_a.value = pow(sigma_a.value, -2.0);
? ?a.dnorm(mu_a.value, tau_a.value);
? ?b.dnorm(0, 0.0001);
? ?sigma_y.dunif(0, 100);
? ?mu_a.dnorm(0, 0.0001);
? ?sigma_a.dunif(0, 100);
? ?likelihood.dnorm(y_hat.value,tau_y.value);
?}
};
'
Src<-'
mat group = Rcpp::as<arma::mat>(GR); vec level =
Rcpp::as<arma::vec>(lev); vec basement = Rcpp::as<arma::vec>(basm);
int N = 919; int N_counties = 85; int iterations_ =
as<int>(iterations); int burn_ = as<int>(burn); int adapt_ =
as<int>(adapt); int thin_ = as<int>(thin);
RadonVaryingInterceptModel m(group,level,basement,N,N_counties);
m.sample(iterations_, burn_, adapt_, thin_);
return Rcpp::List::create(Rcpp::Named("b", m.b.mean()),
Rcpp::Named("ar", m.acceptance_ratio()), Rcpp::Named("a",
m.a.mean())); '
radon.model <- cxxfunction(signature(GR="numeric",
lev="numeric",basm="numeric",iterations="integer",burn="integer",ada
p t ="adapt",thin="integer"), body=src, include=model,
plugin="RcppArmadillo",verbose=F)
df<-read.table("http://www.stat.columbia.edu/~gelman/arm/examples/ra
d o n/srrs2.dat", header=T, sep=",") df.mn <- df[df[,2]=="MN",]
radon
<- df.mn$activity log.radon <- log (ifelse (radon==0, .1, radon))
floor
<- df.mn$floor
# get county index variable
county.name <- as.vector(df.mn$county) uniq <- unique(county.name) J
<- length(uniq) county <- rep (NA, J) for (i in 1:J){
?county[county.name==uniq[i]] <- i
}
system.time(radon.model(GR=county22,lev=log.radon2,basm=floor2,itera
t
i
ons=1e4L,burn=1e4L,adapt=2e3L,thin=5L))
#its necessary to change the structure of the data files or else
winbugs has an error
log.radon<-as.numeric(log.radon)
floor<-as.numeric(floor)
county<-as.numeric(county)
radon.data<-list(log.radon=log.radon,floor=floor,county=county)
radon.param<-c("a","b")
radon.inits<-function(){list(a=rnorm(85),b=rnorm(1),sigma.a=runif(1)
,
s
igma.y=runif(1),mu.a=rnorm(1)}
#n.burnin = n.iter/2 for bugs
system.time(radon.test<-bugs(radon.data,radon.inits,radon.param,model.
file="radon.bug",n.chains=1,n.iter=20000,working.directory=getwd(),b
u
g
s.directory="D:/R/WinBUGS14",n.thin=5))
--------------------------------------------------------------------
-
-
- The difference I get is 16.53s vs 13.85s
Kind regards,
Sam Watson
-----Original Message-----
From: Whit Armstrong [mailto:armstrong.whit at gmail.com]
Sent: 19 December 2011 12:39
To: Watson, Samuel
Cc: rcpp-devel at r-forge.wu-wien.ac.at
Subject: Re: [Rcpp-devel] CppBugs vs WinBugs
post your code.
The speedup depends on ?your model. ?My comparisons were based on R/JAGS and pymc on linux.
-Whit
On Mon, Dec 19, 2011 at 7:30 AM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
I am currently using CppBugs with Rcpp through R. I am very interested to use CppBugs as I am finding WinBugs to be prohibitively slow, I use large amounts of data in large multilevel models, so when I found cppbugs I was excited. It says on the Github page for cppbugs that I can achieve speeds of 20-100x faster than winbugs. I have been testing some examples: for the linear model example that is provided at Github, the time difference is about 2x I also tested the Radon model with Rcpp and Andrew Gelmans data and for 10000 iterations and 10000 burnin with a thinning parameter of 5 the difference is only 16 seconds vs 13 seconds. I can run multiple chains in parallel through R with the ?snow? package, cxxfunction() and package.skeleton() and this does seem to provide a little bit of a boost (parallel winbugs vs parallel cppbugs). But nothing greater than 4x. Is it possible to achieve the kind of difference mentioned on Whit Armstrong?s page for cppbugs with R? I am happy to post all the code if required. Many thanks Sam Watson
_______________________________________________ Rcpp-devel mailing list Rcpp-devel at lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-d e v e l