MCMCglmm with exponential family

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20130909/854229d8/attachment.pl>
Hi Iain,

I wrote a package to get predictions from MCMCglmm---importantly the
predictions are based on the full posterior, which aside from making
them slow and computationally intensive, means that you can
marginalize the predictions (or transform, or get credible intervals,
or, ...) without having to work out the analytical solution.

The downside is that I have not implemented it for the exponential
family, but the code base may be a place for you to start.  And if you
want, I may have some time to work with you and could incorporate
support for exponential if you can help provide some simple examples,
and help check the accuracy of results.

The package is here on github: https://github.com/jwiley/postMCMCglmm

Cheers,

Josh
An update: I did work out the issue shortly after posting the email, and
confirmed my mistake with Jarrod Hadfield, who was very helpful! A scan of
his J. Stat. Soft. paper showed me I was using the wrong link: for
exponential family, the latent variables (l) are mapped onto the data (y)
using y=exp(-l).

David Atkins' point (below: thanks for your help!) is also helpful: for
non-identity links there is a difference between conditional and
marginalised posteriors, and to get the expected marginal value rather than
the conditional mode it's necessary to average over the random effects.
This is something I wasn't aware of! The predict method for MCMCglmm
doesn't support the exponential family yet, but by hacking the code and
augmenting with info from [
http://depts.washington.edu/cshrb/wordpress/wp-content/uploads/2013/04/Tutorials-Tutorial-on-Count-Regression-R-code.txt]
it should be possible... except that I'm not sure on the analytical
solutions for marginalising random effects in exponential models. If anyone
has a solution or suggestion, it'd be very welcome. Failing that I'll
continue to google, or acquiesce to a less-satisfactory polynomial Gaussian
model or gamma lmer.

Thanks all

Iain

On 6 September 2013 21:00, David Atkins <datkins at u.washington.edu> wrote:

Iain--

Just writing back to you as a) I'm not sure about this, and b) it isn't
necessarily R specific.

One of the things that has bitten me a couple times with mixed models
with a non-identity link function is the distinction between conditional
and marginal effects.  With normal errors and identity link function,
predictions from a mixed model essentially average over the random effects.
 With non-identity link functions this is no longer true, and the
predictions based on fixed-effects only are conditional on specific levels
of the random effects.  It is possible to derive marginal predictions,
which do in effect average over the random effects, but this is somewhat
more involved and the predictions include random-effects in them.

Not sure whether this is what is happening with your data -- since, I'm a
psychologist who mostly analyzes psychiatric / drug abuse data, but I would
not necessarily assume that your predictions based only on the
fixed-effects would give 'good' predictions of your raw data.

You might check out an article that colleagues and I wrote on mixed
models; in particular, the appendix covers this issue specifically (for
Poisson models):

http://depts.washington.edu/**cshrb/statistics-resources-**
tutorials-tutorial-on-count-**regression/<http://depts.washington.edu/cshrb/statistics-resources-tutorials-tutorial-on-count-regression/>

Hope that might help.

cheers, Dave

Hi R-users,

I'm having some problems interpreting mixed effects models using the
exponential family. Struggling to find any info on such models on the
interwebs.

Some background: I'm working with soil carbon data and need robust
estimates of concave-up, decreasing depth curves, and confidence intervals
on them, which MCMCglmm should be able to provide. The usual way to fit
such curves is with a (negative) exponential model, which again is
supported by MCMCglmm (my understanding is that the exponential and
negative exponential distibutions are the same in everything but name?).
My
data is pretty much exponentially-distributed (more precisely
gamma-distributed with a very strong right skew), therefore is real with
range 0<x<Inf.

The models: I'm fitting models with one continuous variable (depth), and
two categorical variables (soil type / land cover). I've included a single
random variable (site), for which I'm fitting either just an intercept or
a
fully-(co)varying intercept and slope combination using us(). The fixed
effects take default priors of N(0, 10e8) and random effects and residuals
take uninformative proper priors with V=1, nu=0.002. Below, I've pasted
code which will replicate my data and model structure: the range of
solutions (intercepts, slopes) I'm getting out of these dummy models are
in
accordance with what's being computed for my data.

The results: Models converge well (albeit after a large number of
iterations), but the estimates I'm getting out of them make no sense to me
at all. I must be missing something somewhere! Transformed curves do not
fit the raw data, and transformed data do not fit the fitted lines. I'm
using the canonical inverse link for the exponential distribution,
Xbeta=-mu^-1 (and various variations on that theme, to no avail...)

I can think of three possible reasons I can't make sense of this:

1. My understanding of the exponential distribution is woefully poor: that
it is not equivalent to the negative exponential distribution (in which
case Wikipedia would be lying to me) or that I am missing a crucial piece
of theory somewhere along the line.

2. Default fixed priors of N(0, 10e8) are completely inadequate for
exponential data

3. I've misinterpreted the link function in some way: it's not inverse,
the
inverse of inverse is for some reason not inverse (!) or I'm using the
wrong link altogether.

I can't find anything to contradict my thoughts on these points. So, if
someone with a better understanding of such things could offer me some
words of wisdom, point out to me where I'm making stupid mistakes, it
would
be hugely appreciated!

Thanks

Iain

CODE:

MODELS:

resp<-rgamma(406,scale=0.5,**shape=1.4)*75
hist(resp,breaks=100)

fix1<-rep(c("A","B","C","D","**E","F","G"),58);
fix1<-fix1[order(rnorm(406))]; fix1<-as.factor(fix1)
fix2<-rep(c("ONE","TWO"),203); fix2<-fix2[order(rnorm(406))];
fix2<-as.factor(fix2)
fix3<-runif(406)*100

ran1<-rep(1:58,7); ran1<-as.factor(ran1)

data<-data.frame(resp,fix1,**fix2,fix3,ran1)

nitt<-1000000
burn<-nitt*0.1
thin<-nitt*0.0005

prior1.0<-list(R = list(V=1, nu=0.002),
               G = list(G1=list(V=1, nu=0.002)))

prior1.1<-list(R = list(V=1, nu=0.002),
               G = list(G1=list(V=diag(1,2), nu=0.002)))

m1.0<-MCMCglmm(resp~fix1+fix2+**fix3+fix1:fix3+fix2:fix3,
               random=~ran1, prior=prior1.0, data=data,
family="exponential", verbose=F,
               nitt=nitt, burnin=burn, thin=thin)

m1.1<-MCMCglmm(resp~fix1+fix2+**fix3+fix1:fix3+fix2:fix3,
               random=~us(1+fix3):ran1, prior=prior1.1, data=data,
family="exponential", verbose=F,
               nitt=nitt, burnin=burn, thin=thin)

EXAMPLE FIT:

sols<-summary(m1.0)$solutions

x<-1:100
y<-sols["(Intercept)",1]+x***sols["fix3",1]
plot(resp~fix3); lines(x,-y^-1)
cbind(x,y,-y^-1)

- - - - - - - - - - - -
Dr. Iain Stott
Environment and Sustainability Institute
University of Exeter, Penryn Campus
Treliever Road, Penryn,
Cornwall, TR10 9EZ, UK.
http://www.exeter.ac.uk/esi/**people/stott/<http://www.exeter.ac.uk/esi/people/stott/>
- - - - - - - - - - - -
http://www.exeter.ac.uk/esi/
http://biosciences.exeter.ac.**uk/cec/<http://biosciences.exeter.ac.uk/cec/>

        [[alternative HTML version deleted]]

--
Dave Atkins, PhD

Research Professor
Department of Psychiatry and Behavioral Science
University of Washington
datkins at u.washington.edu
http://depts.washington.edu/**cshrb/david-atkins/#more-48<http://depts.washington.edu/cshrb/david-atkins/#more-48>

"The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data." -- John Tukey

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com