With apologies for delayed response. The answer to the "why doesn't roll-your-own match simulate()" is on Stack Overflow now: https://stackoverflow.com/a/73576691/190277. The short answer is that the differences are harmless and caused by differences in the procedures used to pick multivariate Normal deviates (which give the same *distributions*, just not the same specific values for any given realization/random number seed). The answer to "why doesn't simulate() + newdata do what I want" is simple **IF YOU KNOW THE ANSWER ALREADY**, which is that for binomial responses, the number of trials per observation needs to be specified somehow. Your experience reflects arguably a bug in the code, the documentation, or the user interface design (or some combination), but the solution is to specify something like weights = rep(50, nrow(X)) if (for example) you want all of the binomial samples to be out of 50. Or whatever makes sense for your application. (And good for you for paying attention to the "ominous warning"!) Hope that helps. cheers Ben Bolker
On 2022-08-05 2:10 a.m., Rolf Turner wrote:
I wish to assess, via simulation, the impact of certain aspects of the experimental design on the precision of the estimates of a rather intricate parameter, the details of which I won't go into. To this end I need to fit a model (generalised linear mixed model, binomial family) to a data set, and then simulate other data sets, reflecting a number of different experimental designs, from the given fit. I thought that the "newdata" argument of the simulate.merMod() method would provide me with the means to accomplish my goal, but I cannot get it to work. I cannot properly comprehend the documentation of the "newdata" argument in the help file. To start with, to make sure (???) that I understood what was going on I just simulated a data set from a model fitted to the cbpp data, using a "roll-your-own" procedure, and then re-did the simulation using simulate.merMod(). I set seeds appropriately so that I would get the same results, given that my roll-your-own procedure was correct. After many false starts and excursions down blind alleys, I got the two procedures to agree. The details of what I did are to be found in the attached sourceable script "demo.txt". I then attempted to effect simulations using a different data set "X" (just including predictors, no responses) constructed according to a different experimental design. I tried two different approaches. The first approach: s.mer1 <- simulate(fit,newdata=X,allow.new.levels=TRUE) This produced an ominous warning:
In wts - Y : longer object length is not a multiple of shorter object length
The second approach: newpar <- list(theta=getME(fit,"theta"),beta=getME(fit,"beta")) set.seed(101) s.mer2 <- simulate(~0+period +(1|herd),newdata=X,newparams=newpar) This produced warnings:
beta parameter vector not named: assuming same order as internal vector Warning message: In setParams(object, newparams) : some parameters not specified in setParams()
The first bit ("beta parameter vector not named ...") also happens if I
try to reproduce exactly an example given in the help, so perhaps I
should not be too worried by it. The second bit is ominous (and to
me mysterious).
The result of the first approach is incomprehensible to me, and bears
no relationship that I can discern to the results of a roll-your-own
approach that I also tried. The result of the second approach is all
NAs, so something is clearly wrong.
The details of what I did are given in the attached file
"try.newdata.txt".
I would be ever-so-humbly grateful if someone could explain to me what
I am doing wrong in my attempts to use the "newdata" argument of
simulate.merMod().
cheers,
Rolf Turner
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Dr. Benjamin Bolker Professor, Mathematics & Statistics and Biology, McMaster University Director, School of Computational Science and Engineering (Acting) Graduate chair, Mathematics & Statistics > E-mail is sent at my convenience; I don't expect replies outside of working hours.