My surprising experience in trying out REvolution's R
7 messages · Bert Gunter, David Smith, Dieter Menne +1 more
Jason: Thanks for posting this ... interesting. A guess: it depends on the problem, the hardware, the matrix libraries, and so on. In relatively small problems, REvolution's overhead may consume more time and resources than the problem warrants; in others, you may see many-fold improvements. It is very dangerous to generalize from an example or two (as I recently experienced to my own chagrin). Cheers, Bert

Bert Gunter
Genentech Nonclinical Biostatistics
650-467-7374

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Jason Liao
Sent: Tuesday, April 21, 2009 7:26 AM
To: r-help at r-project.org
Subject: [R] My surprising experience in trying out REvolution's R

I care a lot about R's speed, so I decided to give REvolution's R (http://revolution-computing.com/) a try; it bills itself as an optimized R. Note that I used the free version. My machine is an Intel Core 2 Duo under Windows XP Professional. The code I ran is at the end of this post.

First, the regular R 1.9 [presumably 2.9; see David Smith's reply]: it takes 2 minutes and 6 seconds, CPU usage 50%. Next, REvolution's R: it takes 2 minutes and 10 seconds, CPU usage 100%. In other words, REvolution's R consumes double the CPU at slightly lower speed. The above has been replicated a few times (as a statistician, of course). Does anyone have any insight into this? Anyway, my high hopes were dashed.
rm(list = ls(all = TRUE))
library(MASS)

### small and common functions ##################################
glmm.sample.y <- function(b, D.u, x, z) {
  pp <- matrix(0, nrow = n, ncol = m)
  zero <- numeric(n.random)
  ran <- t(mvrnorm(m, zero, D.u))
  for (j in 1:m)
    pp[, j] <- as.vector(x[, , j] %*% b + z[, , j] %*% ran[, j])
  pp <- exp(pp)/(1 + exp(pp))
  y <- runif(m*n)
  ifelse(y < pp, 1, 0)
}

#################
quadratic.form <- function(A, x) {
  return(as.vector(x %*% A %*% x))
}

quadratic.form2 <- function(A, x) {
  x <- chol(A) %*% x
  colSums(x*x)
}

#######################################################################
REML.b.D.u <- function(b, D.u, x, y, z, n.iter) {
  u.mean.initial <- array(0, c(n.random, m))
  TT <- matrix(0, n.random, n.random)
  for (i in 1:n.iter) {
    TT <- TT*(1 - 1/i) + bias(b, D.u, x, z)/i
    obj <- sample.u(b, D.u, x, y, z, u.mean.initial)
    b <- b - solve(obj$Hessian, obj$score)
    D.u <- obj$uu + TT
    u.mean.initial <- obj$u.mean.initial
    print(i); print(date()); print("inside REML"); print(TT)
    if (i == n.iter) write(TT, file = "c:/liao/reml/simu50_8.dat", append = T)
    print(D.u)
  }
  list(b = b, D.u = D.u)
}

bias <- function(b, D.u, x, z) {
  yy <- glmm.sample.y(b, D.u, x, z)  # this is the sampling stage
  obj1 <- mle(b, D.u, x, yy, z, F, F, n.iter = 50)
  obj2 <- mle(b, D.u, x, yy, z, T, F, n.iter = 50)
  obj1$uu - obj2$uu
}

##################################################
mle <- function(b, D.u, x, y, z, indx1, indx2, n.iter) {
  u.mean.initial <- array(0, c(n.random, m))
  for (i in 1:n.iter) {
    obj <- sample.u(b, D.u, x, y, z, u.mean.initial)
    if (indx1) b <- b - solve(obj$Hessian, obj$score)
    if (indx2) D.u <- obj$uu
    u.mean.initial <- obj$u.mean.initial
  }
  list(b = b, D.u = D.u, uu = obj$uu, u.mean.initial = u.mean.initial)
}

##############################################
sample.u <- function(b, D.u, x, y, z, u.mean.initial) {
  D.u.inv <- solve(D.u)
  uu <- matrix(0, n.random, n.random)
  score <- numeric(n.fixed)
  Hessian <- matrix(0, n.fixed, n.fixed)
  for (i in 1:m) {
    ada.part <- as.vector(x[, , i] %*% b)
    obj <- sample.u.one(D.u.inv, ada.part, x[, , i], y[, i], z[, , i],
                        u.mean.initial[, i])
    uu <- uu + obj$uu
    score <- score + obj$score
    Hessian <- Hessian + obj$Hessian
    u.mean.initial[, i] <- obj$u.mean.initial
  }
  uu <- uu/m
  list(uu = uu, score = score, Hessian = Hessian, u.mean.initial = u.mean.initial)
}

##########################################
sample.u.one <- function(D.u.inv, ada.part, x, y, z, u.mean.initial)
# ada.part, x, y, z for one subject only
{
  fn <- log.like(y, z, D.u.inv, ada.part)
  obj <- optim(u.mean.initial, fn, control = list(fnscale = -1), hessian = T)
  u.mean.initial <- obj$par
  var.root <- solve(-obj$hessian)
  var.root <- t(chol(var.root))
  u <- matrix(rnorm(n.random*n.simu), nrow = n.random, ncol = n.simu)
  log.prob1 <- -colSums(u*u)/2
  u <- u.mean.initial + var.root %*% u
  ada <- exp(ada.part + z %*% u)
  ada <- ada/(1 + ada)  # it is probability now
  log.prob2 <- colSums(y*log(ada) + (1 - y)*log(1 - ada))
  log.prob2 <- -quadratic.form2(D.u.inv, u)/2 + log.prob2
  weight <- exp(log.prob2 - log.prob1)
  weight <- weight/sum(weight)
  ada <- t(ada)
  pi <- colSums(ada*weight)
  product <- colSums(ada*(1 - ada)*weight)
  score <- as.vector((y - pi) %*% x)
  Hessian <- -t(x) %*% (rep(product, n.fixed)*x)
  obj <- cov.wt(t(u), weight, center = T)
  uu <- obj$cov + obj$center %*% t(obj$center)
  list(uu = uu, score = score, Hessian = Hessian, u.mean.initial = obj$center)
}

#########################################
log.like <- function(y, z, D.u.inv, ada.part) {
  function(u) {
    ada <- exp(ada.part + z %*% u)
    ada <- ada/(1 + ada)
    sum(y*log(ada) + (1 - y)*log(1 - ada)) - quadratic.form(D.u.inv, u)/2
  }
}

main <- function() {
  b <- c(.5, .5, -1, -.5)
  b <- c(.5, .5, -1, -.5, 0, 0, 0, 0)
  D.u <- diag(c(.5, .5))
  x <- array(0, c(n, n.fixed, m))
  x[, 1, ] <- 1
  for (i in 1:n) x[i, 2, ] <- (i - 5)/4
  x[, 3, 1:trunc(m/2)] <- 0
  x[, 3, trunc(m/2 + 1):m] <- 1
  x[, 4, ] <- x[, 2, ]*x[, 3, ]
  x[1, 5:n.fixed, ] <- rnorm(4*m)
  for (i in 2:n) x[i, 5:n.fixed, ] <- x[1, 5:n.fixed, ]
  z <- x[, 1:2, ]
  for (i in 1:200) {
    print(i); print(date())
    y <- glmm.sample.y(b, D.u, x, z)
    obj <- mle(b, D.u, x, y, z, F, T, n.iter = 50)
    print("estimate with b known"); print(obj$D.u)
    obj <- mle(b, D.u, x, y, z, T, T, n.iter = 50)
    print("profile MLE"); print(obj$D.u)
    obj <- REML.b.D.u(b, D.u, x, y, z, n.iter = 50)
    print("REML type estimate"); print(obj$D.u)
  }
}

###################################################
n <- 10        # number of observations for each cluster
m <- 50        # number of clusters
n.fixed <- 8
n.random <- 2
n.simu <- 2000
main()

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
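The 2-minute figures above were read off a wall clock and the Task Manager CPU gauge. A more reproducible way to take such measurements is system.time(); here is a minimal sketch of that approach, where the 400x400 matrix is an arbitrary stand-in for the workload above, not a size taken from the post:

```r
# Sketch: time BLAS-bound operations with system.time() instead of a
# wall clock. The 400x400 size is an illustrative assumption.
set.seed(1)
a <- matrix(rnorm(400 * 400), 400, 400)
timing <- system.time({
  b <- a %*% a                # matrix multiply: one threaded operation
  r <- chol(crossprod(a))     # Cholesky factorization: the other
})
print(timing["elapsed"])      # elapsed seconds, comparable across builds
```

The "elapsed" component is wall-clock time; comparing it with "user.self" also hints at how many cores were actually busy.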
That's our guess too. We're running some tests now on the code to see what's going on, and it's entirely possible the performance gains are a function of the problem size, so we're testing that too. Changes in R between 2.7.2 (upon which REvolution R is currently based) and 2.9.0 are also a confounding factor. (I'm assuming the timings reported were on 2.9, not 1.9 as stated.) I'll report back here when I have more details. # David Smith
On Tue, Apr 21, 2009 at 10:32 AM, Bert Gunter <gunter.berton at gene.com> wrote:
A guess: Depends on the problem, the hardware, the matrix libraries,... e.g. in relatively small problems, Revolution's overhead may consume more time and resources than the problem warrants. In others, you may see many fold improvements. Very dangerous to generalize from an example or two (as I recently experienced to my own chagrin).
-- David M Smith <david at revolution-computing.com> Director of Community, REvolution Computing www.revolution-computing.com Tel: +1 (206) 577-4778 x3203 (San Francisco, USA) Check out our upcoming events schedule at www.revolution-computing.com/events
Jason Liao <JLIAO <at> hes.hmc.psu.edu> writes:
First, the regular R 1.9. It takes 2 minutes and 6 seconds, CPU usage 50%. Next, REvolution's R. It takes 2 minutes and 10 seconds, CPU usage 100%. In other words, REvolution's R consumes double the CPU with slightly less speed. [...]
The fact that the times are nearly identical, with only the 50%/100% difference, makes me think this is an artifact of the CPU indicator. Try assigning only one CPU to the REvolution task. Wild guess: also 2 minutes, plus or minus. Dieter
1 day later
We've taken a look at this in a bit more detail; it's a very interesting example. Although the code uses several functions that exploit the parallel processing in REvolution R (notably %*% and chol), this was one of those situations where the overheads of threading pretty much balanced any performance gains: the individual matrices for the operations were too small. For some examples where the performance gains do show, see: http://www.revolution-computing.com/products/r-performance.php A more promising avenue for speeding up this code lies in parallelizing the outer for(i in 1:200) loop... # David Smith
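David's suggestion above — parallelize the outer for(i in 1:200) loop rather than rely on threaded BLAS — can be sketched as follows. This sketch uses the parallel package that ships with modern R (at the time of this thread, the snow package's parLapply() had the same shape); the stand-in loop body is an illustrative assumption, not the real simulation code:

```r
library(parallel)

# Stand-in for one iteration of the for(i in 1:200) loop in main();
# the real body would call glmm.sample.y(), mle() and REML.b.D.u().
one_rep <- function(i) {
  set.seed(i)
  sum(rnorm(1000))  # illustrative workload only
}

cl <- makeCluster(2)                    # two workers for a dual-core box
results <- parLapply(cl, 1:200, one_rep)
stopCluster(cl)
length(results)                         # one result per replicate
```

Because the 200 replicates are independent, this kind of coarse-grained parallelism avoids the per-operation threading overhead that dominated on the small matrices here.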
David, thanks for following this through. I do not know how big a matrix needs to be before multi-core multi-threading starts to save time, but it seems useful to build this protection into your distribution so that it does not go multi-core when multi-threading is more likely to do harm. Jason
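The "protection" Jason asks for would amount to a size-based dispatch: use one thread for small matrices, several for large ones. A hedged sketch of the idea — the MIN_DIM threshold and the set_threads() helper are invented for illustration and are not part of any product mentioned in this thread:

```r
# Sketch of size-gated threading: request multiple BLAS threads only
# when the matrix is large enough for threading overhead to pay off.
# MIN_DIM and set_threads() are hypothetical placeholders.
MIN_DIM <- 100

set_threads <- function(k) {
  # placeholder: in REvolution R this would call its MKL thread control
  invisible(k)
}

threaded_multiply <- function(a, b) {
  if (min(dim(a), dim(b)) < MIN_DIM) set_threads(1) else set_threads(2)
  a %*% b
}

m1 <- matrix(1, 10, 10)
threaded_multiply(m1, m1)  # small case: stays single-threaded
```

Where the crossover threshold actually lies depends on hardware and BLAS implementation, which is presumably why it is hard to hard-code a safe default.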
You can always turn the multi-threading off with: setMKLthreads(1) We tested your code with this setting, and it ran in exactly the same time as CRAN R. # David Smith
On Fri, Apr 24, 2009 at 7:34 AM, Jason Liao <JLIAO at hes.hmc.psu.edu> wrote:
David, thanks for following this through. [...]
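David's setMKLthreads(1) tip suggests a simple experiment: time the same operation at one thread and at two, and compare. A hedged sketch — setMKLthreads() exists only in REvolution R's MKL build, so this version guards the call; on a stock CRAN build the guard is skipped and both timings use the default BLAS:

```r
# Sketch: compare a BLAS-bound operation at 1 vs. 2 threads.
# setMKLthreads() is REvolution-specific; elsewhere the guarded call
# is a no-op and the two timings should come out roughly equal.
time_mm <- function(n) {
  a <- matrix(rnorm(n * n), n, n)
  system.time(a %*% a)["elapsed"]
}

set_mkl <- function(k) {
  if (exists("setMKLthreads")) setMKLthreads(k)
}

set_mkl(1); t1 <- time_mm(300)
set_mkl(2); t2 <- time_mm(300)
c(one_thread = t1, two_threads = t2)
```

Repeating this over a range of matrix sizes would locate the crossover point where the threaded build starts to win on a given machine.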