[Rcpp-devel] iterator for sparse Matrix - Rcpp-devel

Wed, Jan 22, 2014 3:04 PM #

Hi there,

I am new to RcppArmadillo and very excited about the sparse matrix
functionality it offers. I have a large optim that I wish to break down
into stochastic descent in Rcpp to speed things up.

To start small, I used a simple logisitic regression function that I have
working in R -

Norm <- function(w1, w2){
  sqrt(sum((w2 - w1)^2))
}

xentropy <- function(x, y, w) {
  - ((y * x) / (1 + exp(y * t(w) %*% x)) )
}

stochDesc <- function(param, eta = 0.01, maxit = 10, reltol = .01, x, y) {
  # Initialize
  w = param
  iter = 0
  rel_err = reltol + 1
  x_dash = cbind(rep(1, nrow(x)), x) # Add bias

  while(iter < maxit & rel_err > reltol ) {
    iter = iter + 1

    prev_w <- w
    for(curidx in sample(nrow(x))) { # Random shuffle
      grad <- xentropy(x_dash[curidx, ], y[curidx], w)
      w <- w - eta *  grad # Update weights
    }
    rel_err = Norm(w, prev_w)
    # could also use:
    # rel_err = base::norm(matrix(prev_w - w, nrow = 1), type = "F")
    cat(paste("Epoch ", iter, " Relative cost:", rel_err, "\n"))
  }
  list(w = w, iter = iter)
}

This does every operation that my more complex gradient descent will do,
but it is pretty straightforward so I started converting it to Rcpp -

#include <RcppArmadillo.h>
#include <iostream>
using namespace std;
using namespace Rcpp ;
// [[Rcpp::depends(RcppArmadillo)]]

double Norm(NumericVector w1, NumericVector w2) {
  // Compute the Forbius Norm
  double op = pow(sum(pow(w2 - w1, 2)), .5);
  return(op);
}


NumericVector xentropy(NumericVector x, NumericVector y, NumericVector w) {
  // Return the cross entropy
  // NumericVector op = - ((y * x) / (1 + exp(y * trans(w) %*% x)) ); //
How to transpose?
  NumericVector op = w; // Not correct
  return (op);
}


// [[Rcpp::export]]
NumericVector stochDescCpp(NumericVector param, NumericMatrix x,
NumericVector y,
                          double eta = 0.01, int maxit = 10, double reltol
= .01) {
  // Run stochastic Descent to compute logistic regression
  // Initialize
  NumericVector w (param);
  NumericVector prev_w (w);
  int epoch = 0;
  NumericVector grad (param.length());

  double rel_err = reltol + 1;
  // Don't know how to add bias, will do that in R before passing it here.
  //NumericMatrix x_dash(x.nrow(), x.ncol() + 1, rep(1, x.nrow()),
x.begin());

  // Loop through for each epoch
  while(epoch < maxit && rel_err > reltol) {
    epoch++;

    prev_w = w;
    for(int curidx = 0; curidx < x.nrow(); ++curidx) { // How to random
shuffle?
      grad = xentropy(x(curidx, _), y[curidx], w);
      w = w - eta * grad; // Update weights
    }
    rel_err = Norm(w, prev_w);
    cout << "Epoch " << epoch << " Relative cost:" << rel_err << endl;
  }
  return(w);
}

I am running into many issues. Here they are ranked in order of importance)
-
1) When I run this (incorrect) program -
stochDescCpp(param = rep(0,3), x = X, y = Y, maxit = 100)

I get -
Error in .Primitive(".Call")(<pointer: 0x0000000069d42560>, param, x,  :
  negative length vectors are not allowed

All the inputs are valid and work fine with the R version. So I don't know
where this is coming from? I put a bunch of couts and I know that it is
happening inside the for loop but don't know why. In general, what is the
best way to debug code written this way? I am using RStudio.

2) I don't know how to transpose the matrix in rCpp. I read that I could
that using rCppArmadillo, but then I don't know how to convert from
NumericMatrix to arma matrix. I don't want to copy as this operation will
happen 1000x of times, so this has to be fast.

3) I don't know how to randomly shuffle the rows. I tried RcppArmadillo::sample
but haven't gotten the output to work. E.g. do I access elements as
x(rowOrd[curIdx],_)?

4) (minor issue) How do I add the bias unit? In other words, add a new
column to the input matrix filled with all 1s. Do I have to create a new
matrix, copy it column by column? this is not a dealbreaker since I can do
this in R before I send in the inputs.

Sorry if my questions are too basic and don't seem well researched. I am
trying to go for a practical example.

Any help will be greatly appreciated.

Sincerely,
Saurabh
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20140122/4335a454/attachment.html>

Dirk Eddelbuettel

Wed, Jan 22, 2014 5:13 PM #

Saurabh,

On 22 January 2014 at 18:04, Saurabh B wrote:

| I am running into many issues. Here they are ranked in order of importance) -
| 1) When I run this (incorrect) program -
| stochDescCpp(param = rep(0,3), x = X, y = Y, maxit = 100)
| 
| I get -
| Error in .Primitive(".Call")(<pointer: 0x0000000069d42560>, param, x, ?:?
| ? negative length vectors are not allowed
| 
| All the inputs are valid and work fine with the R version. So I don't know
| where this is coming from? I put a bunch of couts and I know that it is
| happening inside the for loop but don't know why. In general, what is the best
| way to debug code written this way? I am using RStudio.
| 
| 2) I don't know how to transpose the matrix in rCpp. I read that I could that
| using rCppArmadillo, but then I don't know how to convert from NumericMatrix to
| arma matrix. I don't want to copy as this operation will happen 1000x of times,
| so this has to be fast.
| 
| 3) I don't know how to randomly shuffle the rows. I tried?RcppArmadillo::
| sample but haven't gotten the output to work. E.g. do I access elements as x
| (rowOrd[curIdx],_)?
| 
| 4) (minor issue) How do I add the bias unit? In other words, add a new column
| to the input matrix filled with all 1s. Do I have to create a new matrix, copy
| it column by column? this is not a dealbreaker since I can do this in R before
| I send in the inputs.
| 
| Sorry if my questions are too basic and don't seem well researched. I am trying
| to go for a practical example.?

You are trying too much at once. Attack each issue separately. Some of what
you are after is basic, and there are plenty of examples for RcppArmadillo
out there -- not to mention over 50 CRAN packages using it.  It will help you
to study some of those.

Dirk

Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com

Saurabh B

Wed, Jan 22, 2014 5:21 PM #

Apologies, I will break it down and tackle one thing at a time...

On Wed, Jan 22, 2014 at 8:13 PM, Dirk Eddelbuettel <edd at debian.org> wrote:

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20140122/5680e77b/attachment.html>