Skip to content

[Rcpp-devel] Organization of C++ classes for glm families

8 messages · Douglas Bates, Dirk Eddelbuettel, Romain Francois

#
In profiling some of the code for generalized linear mixed models I
discovered that a considerable amount of the time was being taken up
in evaluation of some of the functions in the glm family.  Many of
these functions in the family are candidates for a std::transform
application
List of 12
 $ family    : chr "poisson"
 $ link      : chr "log"
 $ linkfun   :function (mu)
 $ linkinv   :function (eta)
 $ variance  :function (mu)
 $ dev.resids:function (y, mu, wt)
 $ aic       :function (y, n, mu, wt, dev)
 $ mu.eta    :function (eta)
 $ initialize:  expression({   ...
 $ validmu   :function (mu)
 $ valideta  :function (eta)
 $ simulate  :function (object, nsim)
 - attr(*, "class")= chr "family"

In most cases the time is taken in linkinv, variance and mu.eta.

I do have one implementation of glm families in C++ in the lme4a
package from the lme4 project on R-forge but that implementation is
based on a complex set of concrete classes using virtual functions and
doesn't take advantage of containers like std::vector or
Rcpp::NumericVector.

I think a more idiomatic C++ implementation would use containers and
the std::transform algorithm.  The characteristics like the linkfun,
linkinv and mu.eta functions are associated with the link name ("log"
in the example above).

Would it be feasible to associate the scalar transformation function,
in this case it would be the log function from the math library but in
other cases it could be like

double inverseLink(double x) {return 1/x;}

with the name of the link.  I would think of using a std::map for that
but I don't know what the class or type of the "value" slot would be.
Some exploration indicates that it may be a struct that is derived
from the std::unary_function<double, double> but right now my head is
beginning to hurt reading documentation.  Could someone give me an
example of how I would use, say, the exp or the log function in a
std::transform?

The second level would be to associate the scalar transformation
function with the name, say in a std::map but first I need to make
std::transform work with a scalar function.
#
On Tue, Mar 16, 2010 at 2:00 PM, Douglas Bates <bates at stat.wisc.edu> wrote:
To follow up on my own posting, the things I have tried are various versions on

cpp <- '
  Rcpp::NumericVector xv(x);
  Rcpp::NumericVector y(xv.size());
  std::transform(xv.begin(), y.begin(), ll);
  return y;
'
ff <- cfunction(signature(x = "numeric"), cpp, Rcpp = TRUE, verbose = FALSE,
                otherdefs = "double ll(const double xx){return log(xx);}",
                includes = "#include <cmath>")

which always produces an error of the form

:14: error: no matching function for call to ?transform(double*,
double*, double (&)(double))?

I'm at the point one often gets to with a new language that my code
will not compile and I have no idea of how to fix it.
#
On Tue, Mar 16, 2010 at 3:01 PM, Douglas Bates <bates at stat.wisc.edu> wrote:
OK, it was a dumb mistake.  I misread the documentation.  If I change
the code to

cpp <- '
  Rcpp::NumericVector xv(x);
  Rcpp::NumericVector y(xv.size());
  std::transform(xv.begin(), xv.end(), y.begin(), ll);
  return y;
'
ff <- cfunction(signature(x = "numeric"), cpp, Rcpp = TRUE, verbose = FALSE,
                otherdefs = "static inline double ll(double xx){return
log(xx);}",
                includes = "#include <cmath>")

it's happy.  Now on to that question about std::map<std::string,
std::unary_function<double, double>>
#
That's an interesting problem.  AFAICT folks prefer iterators and transform()
and its ilk because it allows you to swap vector() for, say, list() with minimal
fuss.  It is less clear that you pick up speed that way.  Wouldn't the cost
of log() dominate the looping, whether it's old school C style or via iterators?

Dirk
#
Le 16/03/10 21:01, Douglas Bates a ?crit :
I'll reply to the ful posting later, but for now the problem is that you 
are one argument short. transform needs 4 arguments.

I often visit cplusplus.com for a reference on STL containers and 
algorithms, for example: http://cplusplus.com/reference/algorithm/transform/

Also, Dirk recommended the books from Scott Meyers : Effective C++, More 
Effective C++ and Effective STL which are very good (although not 
introductory).
#
On Tue, Mar 16, 2010 at 3:24 PM, Dirk Eddelbuettel <edd at debian.org> wrote:

            
I was not sufficiently explicit.  The large amounts of time spent in
evaluating the inverse link and things like that are in the R
versions.   They use R constructions like
function (eta)
pmax(exp(eta), .Machine$double.eps)
<environment: 0x1c435e0>

and it is (or, at least, was) the calls to pmax, pmin, etc. that
gobbled up the time.

Several years ago I wrote C code using .Call for the usual link
functions in the binomial family, just to speed up such cases.  For
example, the default link for the binomial is
function (eta)
.Call("logit_linkinv", eta, PACKAGE = "stats")
<environment: 0x1c4b9e0>

Using Rcpp I could certainly extract the R function from the family
and set up the call but for the heavily used families I would prefer
to set up a way to recognize the link name and use compiled code.

In any case I now have code that seems to work

otherd <- '
'
bod <- '
  Rcpp::NumericVector xv(x);
  Rcpp::NumericVector y(xv.size());

  std::map<std::string, double(*)(double)> lst;
  lst["log"] = &log;
  std::transform(xv.begin(), xv.end(), y.begin(), *(lst["log"]));
  return y;
'
ff <- cfunction(signature(x = "numeric"), bod, Rcpp = TRUE, verbose = FALSE,
                otherdefs = otherd, includes = "#include <cmath>")
#
Le 16/03/10 22:09, Douglas Bates a ?crit :
It seems you answered your own questions ?

Maybe it is worth investigating external pointers, which would offer a 
way to have some handle on the function pointer &log from the R side.

We have Rcpp::XPtr for that, but I'm not sure right now what the right 
incantation would be to hold function pointers

(... time passes ...) Maybe it would be easier to wrap the function into 
an std::unary_function<double,double> ..

(I need to think about this in the morning, with fresh coffee).

Romain
#
Le 16/03/10 23:19, Romain Francois a ?crit :
Last try, maybe something like this :

fx <- cfunction( signature(),
'
   typedef pointer_to_unary_function <double,double> link_fun ;
   link_fun* log_link = new link_fun(log);
   Rcpp::XPtr<link_fun> xp( log_link, true );
   return xp ;

', Rcpp = TRUE, includes =
   c("using namespace Rcpp; ", "using namespace std;" ) )


 > fx()
<pointer: 0x100584930>

... maybe when I wake up tomorrow, none of this will make sense

Romain