
Should I use full models when using powerSim?

4 messages · Chi Zhang, Ben Bolker, Alday, Phillip

Dear all,

I tried using powerSim from the R package simr to estimate the number of participants I need for an experiment. I performed the simulation based on the data from my pilot study. The model I used is sketched below:

fit <- glmer(B ~ a + b + a:b +
               (1 + a + b + a:b | Subject) +
               (1 + a + b + a:b | Item),
             family = binomial(link = "logit"),
             data = data,
             control = glmerControl(optimizer = "bobyqa",
                                    optCtrl = list(maxfun = 50000),
                                    tol = .0001))

in which Subject and Item are the distinct IDs of subjects and items from the pilot study. I want to test how the power of the interaction term (a:b) changes as the number of participants grows. The code I am using is:

fit2 <- extend(fit, along = "Subject", n = 84)
sim <- powerCurve(fit2, test = fcompare(~ a + b), along = "Subject",
                  breaks = c(48, 60, 72, 84), nsim = 5000)
print(sim)

But the results of the simulation were rather bizarre. To begin with, the power of the interaction got smaller when the number of participants increased from 72 to 84, which is incompatible with the usual observation that power increases with the number of participants. Second, I tried using the full random-effects model for the simulation, but it is really slow (it took me weeks to get just one result). I was wondering whether I could use a simpler random-effects structure for the simulation.

To reiterate my questions: first, why did my simulated power decrease as the number of participants increased? Is there something wrong with my code? Second, can I use a simpler random-effects model for the simulation in order to save time? Thanks in advance!

Best,

Chi Zhang

------------------------
Chi Zhang

PhD Student
Department of Experimental Psychology
Ghent University
Henri Dunantlaan 2, B-9000 Gent, Belgium
Tel: +32 465386530
E-mail: chi.zhang at ugent.be
Dear Chi,

Does your maximal model converge for your observed data? Are you able to
detect all assumed "true" effects (i.e. are all of your effects of
interest significant)?

If the answer is no: then your data are too noisy to give you a reliable
estimate of the effect and this can present itself as weird results in
simulation-based power analyses. Basically, if your estimates are noisy,
then your simulation will be noisy and your power estimates won't be
reliable. This also holds for power analysis without simulation, but
won't be as obvious: if you don't have a good estimate of your effect
size, then your power analysis will be misleading.

As far as computational time: use the model in your power analysis that
you want to use in your final analysis. After all, you want to estimate
how much power your final analysis has! There is a large amount of
literature debating tradeoffs in Type I & II error when using different
random-effects structures, and you haven't revealed anything else about
your data and experimental design, so there's little specific advice I
can give. However, in my experience, removing the interaction terms from
the random effects generally speeds up the computation a lot while not
really changing model fit. Since the interaction term is the effect you
care about, I would reparameterize the model so that the interaction is
a main effect, e.g.

data$ab <- interaction(data$a, data$b)

fit <- glmer(B ~ a * ab + (1 + a + ab | Subject) ....
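
For concreteness, a completed version of that sketch might look like the following (a sketch only: it reuses the variable names, family, and control settings from the original post, and assumes the same data frame; since a is determined by ab, lme4 will drop the redundant fixed-effect columns, so the fixed part is equivalent to B ~ ab):

```r
# Sketch: interaction recoded as a single factor, so it can be kept as
# a "main effect" in the random-effects structure.
data$ab <- interaction(data$a, data$b)

fit <- glmer(B ~ a * ab +
               (1 + a + ab | Subject) +
               (1 + a + ab | Item),
             family = binomial(link = "logit"),
             data = data,
             control = glmerControl(optimizer = "bobyqa",
                                    optCtrl = list(maxfun = 50000)))
```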

If your original model takes an hour or more to fit, then it's no
surprise that 5000 simulations x 4 points on the power curve = 20,000
model fits take weeks!

Best,

Phillip


PS: Don't name your data frame "data"! There is a built-in function with
that name in R, and if you're not careful you'll get all sorts of weird
errors.
On 16/2/19 2:52 pm, Chi Zhang wrote:
Agree with everything said here. A few super-low-tech partial
solutions ...

  Others may disagree, but 5000 sims seems pretty extreme for estimating
power (which is after all never better than a crude estimate); if you
reduced that 10-fold you'd probably still get an adequate estimate of power.
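
To put a number on that: the power estimate from nsim simulations is a binomial proportion, so its Monte Carlo standard error is sqrt(p(1-p)/nsim). A quick base-R check, assuming for illustration a true power around 0.8:

```r
# Monte Carlo uncertainty of a simulation-based power estimate.
# The estimate is a binomial proportion, so SE = sqrt(p * (1 - p) / nsim).
p <- 0.8  # assumed true power, for illustration only
for (nsim in c(500, 5000)) {
  se <- sqrt(p * (1 - p) / nsim)
  ci <- p + c(-1.96, 1.96) * se
  cat(sprintf("nsim = %4d: SE = %.3f, 95%% CI ~ [%.3f, %.3f]\n",
              nsim, se, ci[1], ci[2]))
}
```

Even at nsim = 500 the 95% interval is only about +/- 0.035 wide, which is plenty of precision for a power analysis.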

  It doesn't look like simr does parallel computation yet - if it did,
and if you have access to a reasonably powerful machine, you could
reduce the computation by another factor of <number of cores>. (You
could do this by brute force by manually running a bunch of jobs with
different random-number seeds, or using the parallel package's
parLapply(), then averaging the power curves you get ...)
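
That brute-force idea could be sketched as follows (a sketch only: it assumes the extended model fit2 from the first message, and that summary() on a powerCurve object exposes a per-break 'mean' column, which you should check against your simr version):

```r
# Brute-force parallel power curves: run several smaller powerCurve()
# jobs with different RNG seeds, then average the estimates.
library(parallel)
library(simr)

n_cores <- 4
cl <- makeCluster(n_cores)
clusterEvalQ(cl, library(simr))
clusterExport(cl, "fit2")  # 'fit2' is the extended model from the first message

curves <- parLapply(cl, seq_len(n_cores), function(i) {
  set.seed(1000 + i)  # a distinct seed per worker
  powerCurve(fit2, test = fcompare(~ a + b), along = "Subject",
             breaks = c(48, 60, 72, 84),
             nsim = 125)  # 4 workers x 125 = 500 simulations per break
})
stopCluster(cl)

# Average the per-break power estimates across workers
# (assumes summary() returns a data frame with a 'mean' column)
power_mat <- sapply(curves, function(pc) summary(pc)$mean)
rowMeans(power_mat)
```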
On 2019-02-18 4:02 p.m., Alday, Phillip wrote:
On 18/2/19 10:19 pm, Ben Bolker wrote:
No, simr doesn't really support parallel simulation yet, but see the
discussion on GitHub for some workarounds (e.g. using my experimental
fork, or the future package plus some custom code to combine things):
https://github.com/pitakakariki/simr/issues/39

Phillip