examples of combining chains from MCMCglmm
Hi Jarrod,

Thanks so much for the reply. I think the problem was actually faulty parallelization: I had been testing on a single core and then scaling up to a cluster. The random effects have quite a few (1000s of) levels, which I was storing, and I had not taken into account that when running in parallel I would need to request 2-4GB of RAM per core, so I think the system was thrashing. Clearing that up brought the run time back down to more or less linear (a rough sketch of the setup is in the P.S. below).

I did still wonder about combining the chains. I've worked out a bunch of code now and am thinking of putting it up on GitHub, but if you have some other centralized (public) place for content related to MCMCglmm, I'd be happy to put it there with examples. Do you have, or would you consider, something like https://github.com/hadley/ggplot2 ? In particular, if you click the Wiki tab there is a bunch of material that the community can contribute to, so you could keep working on the code base while others would have a central place to put examples, citations to your package, FAQs, and the like.

Cheers,
Josh
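P.S. The parallel setup, in rough outline (a sketch only; the model terms, data, and the fit_chain wrapper are placeholders for the real call):

library(parallel)
library(MCMCglmm)

## placeholder wrapper around the real model call; with pr = TRUE and
## 1000s of random-effect levels stored, each chain needs its own
## 2-4GB, so cores x per-chain memory has to fit in RAM
fit_chain <- function(seed) {
  set.seed(seed)
  MCMCglmm(outcome ~ x1, random = ~ var1 + var2, family = "ordinal",
           data = dat, pr = TRUE, nitt = 4e5, thin = 1000, burnin = 1e4)
}

## one chain per core (mclapply forks, so this is not for Windows)
chains <- mclapply(1:4, fit_chain, mc.cores = 4)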
On Wed, Oct 3, 2012 at 3:47 AM, Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote:
Hi,

I would expect the run time to be linear if the thinning interval is set so that the number of iterations stored is equivalent in the two runs. It may be non-linear otherwise, particularly if the number of latent variables/random effects to be stored is very large. If this is not the case, it suggests a memory leak. If you could provide sessionInfo() and a reproducible example, that would be great.

Cheers,
Jarrod

Quoting Joshua Wiley <jwiley.psych at gmail.com> on Sun, 16 Sep 2012 13:01:09 -0700:
Hi All,
Does anyone have examples lying around of combining chains from
different runs of MCMCglmm on the same model? If so, I'd love to look
at some. Ideally they would be generalized (i.e., able to combine an
arbitrary number of chains). If not, once I am done I will probably
make a little example and post it somewhere.
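For concreteness, the general shape I have in mind is below (a rough, untested sketch; chains stands for a list of MCMCglmm fits of the same model, e.g. from separate parallel runs):

library(coda)

## pool the fixed-effect (Sol) and variance (VCV) samples from an
## arbitrary number of chains into multi-chain mcmc.list objects
sols <- mcmc.list(lapply(chains, function(m) m$Sol))
vcvs <- mcmc.list(lapply(chains, function(m) m$VCV))

## check between-chain convergence before pooling
## (with pr = TRUE, Sol also holds the random effects, so subset first)
gelman.diag(vcvs, multivariate = FALSE)

## stack the chains into one matrix for pooled posterior summaries
pooled <- as.mcmc(do.call(rbind, lapply(sols, as.matrix)))
summary(pooled)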
Also, the time to complete does not seem to be a linear function of
the number of iterations. Does anyone have comments on that? I am
saving a lot of information (pr = TRUE, pl = TRUE, saveX = TRUE,
saveZ = TRUE, saveXL = TRUE), so perhaps it has to do with that. A
run of 2e4 iterations took about 6.5 minutes, while 6e4 iterations
took 39.5 minutes, nearly twice as long as a linear increase would
predict (a quick back-of-the-envelope on stored samples follows the
model call below). I cannot share the actual data, but the general
structure of the model is:
MCMCglmm(outcome ~ [22 fixed predictors],  ## placeholder; actual terms omitted
         random = ~ var1 + var2,
         family = "ordinal", data = dat,
         prior = list(
           ## fixed effects: zero-mean normals with variance (1 + 1) = 2
           ## on the intercept plus 22 predictors
           B = list(mu = rep(0, 23), V = diag(23) * (1 + 1)),
           ## residual variance fixed at 1 (not identifiable for ordinal data)
           R = list(V = 1, fix = 1),
           ## weakly informative priors on the two random-effect variances
           G = list(G1 = list(V = 1, nu = .002),
                    G2 = list(V = 1, nu = .002))),
         pr = TRUE, pl = TRUE, saveX = TRUE, saveZ = TRUE, saveXL = TRUE,
         nitt = 4e5, thin = 1000, burnin = 1e4)
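As a back-of-the-envelope check (assuming the two timing runs above used the same thin and burnin as this call):

## samples stored is roughly (nitt - burnin) / thin
stored <- function(nitt, thin = 1000, burnin = 1e4) (nitt - burnin) %/% thin
stored(2e4)  ## 10 samples stored for the 6.5 minute run
stored(6e4)  ## 50 samples stored for the 39.5 minute run

So the longer run stores five times as much pr/pl/saveX output for three times the iterations, which may account for part of the apparent non-linearity.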
The thinning is high because I had problems with autocorrelation on
some parameters, possibly mixing issues related to the relatively
unbalanced distribution of the outcome (approximately 80%, 10%, and
10% across the three ordered levels).
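In case it is useful, I have been judging the autocorrelation with coda along these lines (m stands for the fitted model above):

library(coda)

## lag-k autocorrelations; with pr = TRUE, Sol also appends the random
## effects, hence looking at the 23 fixed-effect columns only
autocorr.diag(m$Sol[, 1:23])
autocorr.diag(m$VCV)

## low effective sample sizes flag the poorly mixing parameters
effectiveSize(m$VCV)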
Thanks for any thoughts or tips,
Josh
--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/