An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20130909/854229d8/attachment.pl>
MCMCglmm with exponential family
2 messages · Iain Stott, Joshua Wiley
Hi Iain, I wrote a package to get predictions from MCMCglmm---importantly the predictions are based on the full posterior, which aside from making them slow and computationally intensive, means that you can marginalize the predictions (or transform, or get credible intervals, or, ...) without having to work out the analytical solution. The downside is that I have not implemented it for the exponential family, but the code base may be a place for you to start. And if you want, I may have some time to work with you and could incorporate support for exponential if you can help provide some simple examples, and help check the accuracy of results. The package is here on github: https://github.com/jwiley/postMCMCglmm Cheers, Josh
On Mon, Sep 9, 2013 at 12:35 PM, Iain Stott <i.stott at exeter.ac.uk> wrote:
An update: I did work out the issue shortly after posting the email, and confirmed my mistake with Jarrod Hadfield, who was very helpful! A scan of his J. Stat. Soft. paper showed me I was using the wrong link: for exponential family, the latent variables (l) are mapped onto the data (y) using y=exp(-l). David Atkins' point (below: thanks for your help!) is also helpful: for non-identity links there is a difference between conditional and marginalised posteriors, and to get the expected marginal value rather than the conditional mode it's necessary to average over the random effects. This is something I wasn't aware of! The predict method for MCMCglmm doesn't support the exponential family yet, but by hacking the code and augmenting with info from [ http://depts.washington.edu/cshrb/wordpress/wp-content/uploads/2013/04/Tutorials-Tutorial-on-Count-Regression-R-code.txt] it should be possible... except that I'm not sure on the analytical solutions for marginalising random effects in exponential models. If anyone has a solution or suggestion, it'd be very welcome. Failing that I'll continue to google, or acquiesce to a less-satisfactory polynomial Gaussian model or gamma lmer. Thanks all Iain On 6 September 2013 21:00, David Atkins <datkins at u.washington.edu> wrote:
Iain-- Just writing back to you as a) I'm not sure about this, and b) it isn't necessarily R specific. One of the things that has bitten me a couple times with mixed models with a non-identity link function is the distinction between conditional and marginal effects. With normal errors and identity link function, predictions from a mixed model essentially average over the random effects. With non-identity link functions this is no longer true, and the predictions based on fixed-effects only are conditional on specific levels of the random effects. It is possible to derive marginal predictions, which do in effect average over the random effects, but this is somewhat more involved and the predictions include random-effects in them. Not sure whether this is what is happening with your data -- since, I'm a psychologist who mostly analyzes psychiatric / drug abuse data, but I would not necessarily assume that your predictions based only on the fixed-effects would give 'good' predictions of your raw data. You might check out an article that colleagues and I wrote on mixed models; in particular, the appendix covers this issue specifically (for Poisson models): http://depts.washington.edu/**cshrb/statistics-resources-** tutorials-tutorial-on-count-**regression/<http://depts.washington.edu/cshrb/statistics-resources-tutorials-tutorial-on-count-regression/> Hope that might help. cheers, Dave Hi R-users, I'm having some problems interpreting mixed effects models using the exponential family. Struggling to find any info on such models on the interwebs. Some background: I'm working with soil carbon data and need robust estimates of concave-up, decreasing depth curves, and confidence intervals on them, which MCMCglmm should be able to provide. The usual way to fit such curves is with a (negative) exponential model, which again is supported by MCMCglmm (my understanding is that the exponential and negative exponential distibutions are the same in everything but name?). My data is pretty much exponentially-distributed (more precisely gamma-distributed with a very strong right skew), therefore is real with range 0<x<Inf. The models: I'm fitting models with one continuous variable (depth), and two categorical variables (soil type / land cover). I've included a single random variable (site), for which I'm fitting either just an intercept or a fully-(co)varying intercept and slope combination using us(). The fixed effects take default priors of N(0, 10e8) and random effects and residuals take uninformative proper priors with V=1, nu=0.002. Below, I've pasted code which will replicate my data and model structure: the range of solutions (intercepts, slopes) I'm getting out of these dummy models are in accordance with what's being computed for my data. The results: Models converge well (albeit after a large number of iterations), but the estimates I'm getting out of them make no sense to me at all. I must be missing something somewhere! Transformed curves do not fit the raw data, and transformed data do not fit the fitted lines. I'm using the canonical inverse link for the exponential distribution, Xbeta=-mu^-1 (and various variations on that theme, to no avail...) I can think of three possible reasons I can't make sense of this: 1. My understanding of the exponential distribution is woefully poor: that it is not equivalent to the negative exponential distribution (in which case Wikipedia would be lying to me) or that I am missing a crucial piece of theory somewhere along the line. 2. Default fixed priors of N(0, 10e8) are completely inadequate for exponential data 3. I've misinterpreted the link function in some way: it's not inverse, the inverse of inverse is for some reason not inverse (!) or I'm using the wrong link altogether. I can't find anything to contradict my thoughts on these points. So, if someone with a better understanding of such things could offer me some words of wisdom, point out to me where I'm making stupid mistakes, it would be hugely appreciated! Thanks Iain CODE: MODELS: resp<-rgamma(406,scale=0.5,**shape=1.4)*75 hist(resp,breaks=100) fix1<-rep(c("A","B","C","D","**E","F","G"),58); fix1<-fix1[order(rnorm(406))]; fix1<-as.factor(fix1) fix2<-rep(c("ONE","TWO"),203); fix2<-fix2[order(rnorm(406))]; fix2<-as.factor(fix2) fix3<-runif(406)*100 ran1<-rep(1:58,7); ran1<-as.factor(ran1) data<-data.frame(resp,fix1,**fix2,fix3,ran1) nitt<-1000000 burn<-nitt*0.1 thin<-nitt*0.0005 prior1.0<-list(R = list(V=1, nu=0.002), G = list(G1=list(V=1, nu=0.002))) prior1.1<-list(R = list(V=1, nu=0.002), G = list(G1=list(V=diag(1,2), nu=0.002))) m1.0<-MCMCglmm(resp~fix1+fix2+**fix3+fix1:fix3+fix2:fix3, random=~ran1, prior=prior1.0, data=data, family="exponential", verbose=F, nitt=nitt, burnin=burn, thin=thin) m1.1<-MCMCglmm(resp~fix1+fix2+**fix3+fix1:fix3+fix2:fix3, random=~us(1+fix3):ran1, prior=prior1.1, data=data, family="exponential", verbose=F, nitt=nitt, burnin=burn, thin=thin) EXAMPLE FIT: sols<-summary(m1.0)$solutions x<-1:100 y<-sols["(Intercept)",1]+x***sols["fix3",1] plot(resp~fix3); lines(x,-y^-1) cbind(x,y,-y^-1) - - - - - - - - - - - - Dr. Iain Stott Environment and Sustainability Institute University of Exeter, Penryn Campus Treliever Road, Penryn, Cornwall, TR10 9EZ, UK. http://www.exeter.ac.uk/esi/**people/stott/<http://www.exeter.ac.uk/esi/people/stott/> - - - - - - - - - - - - http://www.exeter.ac.uk/esi/ http://biosciences.exeter.ac.**uk/cec/<http://biosciences.exeter.ac.uk/cec/> [[alternative HTML version deleted]] -- Dave Atkins, PhD Research Professor Department of Psychiatry and Behavioral Science University of Washington datkins at u.washington.edu http://depts.washington.edu/**cshrb/david-atkins/#more-48<http://depts.washington.edu/cshrb/david-atkins/#more-48> "The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data." -- John Tukey
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com