idea for GSoC: an R package for fitting Bayesian Hierarchical Models

7 messages · Antonio, Fabio Di Narzo, Ben Bolker, Hadley Wickham

#
Dear R developers,
these days I'm working on some R code for fitting completely generic
Bayesian hierarchical models, à la OpenBUGS and JAGS.

A key feature of OpenBUGS and JAGS is that they automatically build an
appropriate MCMC sampler from a generic model, specified as a directed
acyclic graph (DAG).
The spirit of my (would-be) implementation is instead more focused on
experimentation and prototyping, i.e. it is the user who explicitly
assigns a sampler to each model variable after specifying the model.
The sampler can be chosen from a set of predefined samplers, or
custom-specified by the user as an R or C function in a very
flexible way.
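To make the idea concrete, here is a minimal sketch (in Python, purely for illustration — the actual prototype is R and C, and all names here are hypothetical) of a framework where the user explicitly assigns one full-conditional sampler per model variable and then runs the Gibbs loop:

```python
import math
import random

class GibbsSampler:
    """Toy framework: the user assigns one sampler per model variable."""

    def __init__(self, init_state):
        self.state = dict(init_state)
        self.samplers = {}  # variable name -> full-conditional sampler

    def assign_sampler(self, var, sampler):
        # sampler: a function(state) -> new value for `var`
        self.samplers[var] = sampler

    def run(self, n_iter):
        trace = []
        for _ in range(n_iter):
            # Sweep over all variables, updating the state in place
            for var, sampler in self.samplers.items():
                self.state[var] = sampler(self.state)
            trace.append(dict(self.state))
        return trace

# Example: Gibbs sampling for a bivariate normal with correlation rho,
# where each full conditional is N(rho * other, 1 - rho^2).
random.seed(1)
rho = 0.5
sd = math.sqrt(1 - rho ** 2)
s = GibbsSampler({"x": 0.0, "y": 0.0})
s.assign_sampler("x", lambda st: random.gauss(rho * st["y"], sd))
s.assign_sampler("y", lambda st: random.gauss(rho * st["x"], sd))
trace = s.run(1000)
```

In the real package the per-variable sampler would be an R closure or a compiled C routine, but the shape of the API — specify the model first, then attach samplers node by node — is the same.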

Now I have a prototype skeleton implementation (a bunch of R and C
files, together with some basic testing scripts) which works at decent
speed (w.r.t. JAGS) on some example models, and I'm writing a
proof-of-concept, reproducible Sweave file about it, to be published
online shortly.

What do you think about it in general?
What do you think about developing an R package of it as a GSoC project?

Best regards,
Antonio, Fabio Di Narzo.
#
I've put online a temporary web page with some more info (and sources):

http://antonio.fabio.googlepages.com/rgs%3Athergibbssampler

Best,
Antonio.

2008/3/21, Antonio, Fabio Di Narzo <antonio.fabio at gmail.com>:
1 day later
#
Antonio, Fabio Di Narzo <antonio.fabio <at> gmail.com> writes:
Have you seen Jouni Kerman's Umacs package?  It sounds similar
in spirit.

  Something I would love to see done (not that I have the time
and energy to supervise someone to do it right now) would be
an R (or Python/etc.: R wouldn't necessarily be the best tool)
program to translate lmer/nlme syntax (Wilkinson-Rogers with extensions
for specifying random factors, correlation structures, etc.)
into a BUGS file.  It strikes me that it would be a really nice
way to bridge the gap between what mixed-model code can do
and what requires BUGS/MCMC.  Such models could also serve as
(1) a way to cross-check the results of mixed model code;
(2) a way to get started in relaxing the assumptions of mixed
models (e.g. allowing for non-normal random effects distributions).
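As a rough illustration of what such a translator might do, here is a minimal Python sketch (names and scope are hypothetical; it handles only the single random-intercept case `y ~ x + (1|g)`, nothing like the full Wilkinson-Rogers grammar):

```python
def lmer_to_bugs(formula):
    """Translate a tiny subset of lmer syntax -- 'y ~ x + (1|g)' only --
    into BUGS model code. A sketch, not a general parser."""
    lhs, rhs = [s.strip() for s in formula.split("~")]
    fixed, random_part = [s.strip() for s in rhs.split("+")]
    group = random_part.strip("()").split("|")[1].strip()
    return f"""model {{
  for (i in 1:N) {{
    {lhs}[i] ~ dnorm(mu[i], tau)
    mu[i] <- beta0 + beta1 * {fixed}[i] + u[{group}[i]]
  }}
  for (j in 1:J) {{ u[j] ~ dnorm(0, tau.u) }}
  beta0 ~ dnorm(0, 1.0E-6); beta1 ~ dnorm(0, 1.0E-6)
  tau ~ dgamma(0.001, 0.001); tau.u ~ dgamma(0.001, 0.001)
}}"""

print(lmer_to_bugs("y ~ x + (1|site)"))
```

Even this toy case shows the appeal: the BUGS version makes every prior explicit, so relaxing an assumption (say, replacing the normal random-effects distribution with a t) is a one-line change to the generated model text.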

  Ben Bolker
#
Above all,
thanks Ben for taking the time to read about my proposal!

2008/3/24, Ben Bolker <bolker at ufl.edu>:
Ya, but the speeds are rather different.
I admittedly missed a comparison with Umacs in my short demo.
However, from some early experiments (which I'm running as I write), as
I suspected, my approach turns out to be many times faster than Umacs,
even if one doesn't specify samplers as C code. Things go even
better for my demo implementation if one plugs in samplers
specified as pure C code, which would further eliminate a lot of
memory allocations/deallocations behind those "rnorm()" calls.

My aim is to obtain something which achieves decent speed compared
with JAGS. I mean, I can easily experiment with new samplers by using an
interpreted language, but if in the end I obtain something which is
*many* times slower than JAGS (which is moreover much more robust and
easier to work with), the whole thing is of little practical
interest.

More: how can one really experiment with a new custom sampler if doing a
few thousand iterations takes forever, so that checking your sampler's
practical behaviour is a pain (I speak from personal experience)?
That's why I always want to keep an eye on speed, and give the user
the possibility of using either R or C code, with
the ability to modify model node values in place, without unneeded
'malloc's. Ya, I would abandon pure functional style...
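The in-place idea can be illustrated outside R too; a minimal Python/NumPy sketch (illustrative only — the actual prototype uses R and C) contrasting per-iteration allocation with refilling a single preallocated buffer:

```python
import numpy as np

rng = np.random.default_rng(42)
n, iters = 1000, 500

def sample_allocating():
    # Allocating style: each iteration creates a brand-new vector,
    # like a pure-functional sampler returning fresh values.
    draws = None
    for _ in range(iters):
        draws = rng.standard_normal(n)  # fresh allocation every pass
    return draws

def sample_in_place():
    # In-place style: one preallocated buffer, refilled each iteration,
    # analogous to modifying model node values without extra mallocs.
    buf = np.empty(n)
    for _ in range(iters):
        rng.standard_normal(out=buf)    # no new allocation
    return buf

a = sample_allocating()
b = sample_in_place()
```

The second loop touches the allocator once instead of once per iteration, which is exactly the kind of saving that matters when a sampler runs for many thousands of sweeps.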

I will try to add a reproducible benchmark comparing my demo
implementation with JAGS and Umacs. However, I see that the problem
here is still finding someone interested in it at all.
That sounds interesting. However, I currently don't have enough
know-how to work on something like that right now.

Antonio.
#
Antonio, Fabio Di Narzo <antonio.fabio <at> gmail.com> writes:
I agree that quantitative differences in speed can make
a qualitative difference in the way one works.

  Well, I'm somewhat interested, but don't feel that I'm
necessarily appropriate as a mentor. I don't think it would
be terrible if you contacted particular people (Gelman, Plummer,
O'Hara?) to see if they were interested ... they could always
say no ...
Do you think so?  I don't think it would be too hard,
if you were interested ...

  Ben
#
There is some interesting work being done on this topic in computer
science - e.g.

@inproceedings{keller:2008,
  Author = {Keller, Gabriele and Chaffey-Millar, Hugh and Chakravarty,
    Manuel M. T. and Stewart, Don and Barner-Kowollik, Christopher},
  Booktitle = {Proceedings of the Tenth International Symposium on
    Practical Aspects of Declarative Languages},
  Title = {Specialising Simulator Generators for High-Performance
    Monte-Carlo Methods},
  Url = {http://www.cse.unsw.edu.au/~chak/project/polysim/},
  Year = {2008}
}

which explores a way to define a simulation at a high-level and then
compile it down to fast low-level primitives.  This seems like an
interesting approach, but I suspect you would struggle to find
students with the requisite statistical and computational backgrounds.

Hadley
#
2008/3/24, hadley wickham <h.wickham at gmail.com>:
Thanks for the reference: that's certainly an interesting read.

Instead of inventing a specialised meta-language for this kind of task
(I don't even have the knowledge to do something like that) I've
explored in the recent past the direct use of higher-level languages
that can be compiled into native code. I've got the most interesting
results with Steel Bank Common Lisp and OCaml. They really seem to do
what they claim :-)

However, writing something like a general-purpose Gibbs sampler
framework in these languages seems to be a waste of time, as one
misses a lot of things which are already available in R. Not least:
random number generators for a lot of common distributions! OK, one
can write wrapper code for the R standalone math library, but this all
looks like extra work.

So, while waiting for an R-to-native-code compiler, I think a feasible
approach can be to write R functions with pass-by-reference semantics
plus a bunch of small C routines. In that respect, I was inspired by
the Lush project:
http://lush.sourceforge.net
where mixing high-level and C code is encouraged (note that Lush
ultimately has a native-code compiler too).

Antonio.