
[R-meta] Sample size and continuity correction

13 messages · ne gic, Wolfgang Viechtbauer, Nelson Ndegwa, and 3 others

#
Dear List,

I have general meta-analysis questions that are not
platform/software related.

*=======================*
*1. Issue of few included studies*
*=======================*
It seems common to see published meta-analyses with few studies, e.g.:

(A). An analysis of only 2 studies.
(B). In another, subgroup analyses ending up with only one study in one of
the subgroups.

Nevertheless, they still end up providing a pooled estimate in their
respective forest plots.

So my question is: is there a minimum number of studies, whether agreed
upon, a rule of thumb, or in your own view, below which meta-analysis
becomes unacceptable?

What interpretations/conclusions can one really draw from such analyses?

*===================*
*2. Continuity correction*
*===================*

In studies of rare events, zero events tend to occur and it seems common to
add a small value so that the zero is taken care of somehow.

If, for instance, the inclusion of this small value via the continuity
correction leads to differing results, e.g. from non-significant results
when not using the correction to significant results when using it, what
does one make of that? Can we trust such results?

If one instead opts to calculate a risk difference and tests that for
significance, would this be a better (more reliable) solution to the
continuity correction problem above?
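
To make the issue concrete, here is a toy example in R (the counts are
invented) showing how much the choice of the added value can matter:

    a <- 0; b <- 20; c <- 5; d <- 15            # 2x2 table with zero events in one arm
    (a * d) / (b * c)                           # odds ratio is 0; the log-OR is undefined
    k <- 0.5                                    # the usual continuity correction
    ((a + k) * (d + k)) / ((b + k) * (c + k))   # OR ~ 0.069
    k <- 0.1
    ((a + k) * (d + k)) / ((b + k) * (c + k))   # OR ~ 0.015; the estimate depends on k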


Looking forward to hearing your views, as diverse as they may be in cases
where there is no consensus.

Sincerely,
nelly
1 day later
#
Dear nelly,

See my responses below.

On the minimum number of studies: Agreed upon? Not that I am aware of. Some may want at least 5 studies (per group or overall), some 10, others may be fine even if one group only contains 1 or 2 studies.

On what interpretations/conclusions one can draw: That's a vague question, so I can't really answer this in general. Of course, estimates will be imprecise when k is small (overall or within groups).

On the continuity correction changing significance: If this happens, then the p-value is probably fluctuating around 0.05 (or whatever cutoff is used for declaring results significant). The difference between p=.06 and p=.04 is itself (very likely) not significant (Gelman & Stern, 2006). Or, to use the words of Rosnow and Rosenthal (1989): "[...] surely, God loves the .06 nearly as much as the .05".

Gelman, A., & Stern, H. (2006). The difference between "significant" and "not significant" is not itself statistically significant. American Statistician, 60(4), 328-331.

Rosnow, R.L. & Rosenthal, R. (1989). Statistical procedures and the justification of knowledge in psychological science. American Psychologist, 44, 1276-1284.
On switching to risk differences: If one is worried about the use of 'continuity corrections', then I think the more appropriate reaction is to use 'exact likelihood' methods (such as (mixed-effects) logistic regression models or beta-binomial models) instead of switching to risk differences. There is nothing wrong with the latter, but risk differences are a fundamentally different effect size measure compared to risk/odds ratios.
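
For example, a minimal sketch with metafor (the event counts are invented)
contrasting the continuity-corrected normal-approximation analysis with a
mixed-effects logistic regression model:

    library(metafor)
    # invented counts: ai/n1i = events/size in the treatment arm, ci/n2i = control
    dat <- data.frame(ai = c(0, 2, 1), n1i = c(120, 150, 100),
                      ci = c(3, 5, 4), n2i = c(115, 145, 110))
    # (a) inverse-variance model for log-ORs; zero cells get the default 0.5 correction
    rma(measure="OR", ai=ai, n1i=n1i, ci=ci, n2i=n2i, data=dat)
    # (b) exact (binomial) likelihood via a mixed-effects logistic model (needs lme4)
    rma.glmm(measure="OR", ai=ai, n1i=n1i, ci=ci, n2i=n2i, data=dat, model="UM.FS")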
#
Many thanks for the insights Wolfgang.

Apologies for my imprecise questions. By "agreed upon" and "what
conclusions/interpretations", I was wondering whether there is a minimum
number of studies above which the pooled estimate can be considered
reliable enough to support robust inferences; e.g., inferences drawn from
just 2 studies could be drastically changed by the publication of a third
study. But it seems there isn't. I guess readers then have to check this
for themselves to assess how much weight they can place on the conclusions
of specific meta-analyses.

Again, I appreciate it!

Sincerely,
nelly

On Thu, Aug 27, 2020 at 3:43 PM Viechtbauer, Wolfgang (SP) <
wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:

#
Dear Nelly and all,

With respect to (only) the first question (sample size):

I think nothing is wrong, at least in principle, with a meta-analysis of
two studies. We analyze single studies, so why not combine two of them?
They may even include hundreds of patients.

Of course, it is impossible to obtain a decent estimate of the 
between-study variance/heterogeneity from two or three studies. But if 
the confidence intervals are overlapping, I don't see any reason to 
mistrust the pooled effect estimate.
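
For illustration, such an analysis is straightforward to fit, e.g. with
metafor (the numbers are invented):

    library(metafor)
    # invented log odds ratios and variances from two large trials
    dat <- data.frame(yi = c(-0.35, -0.28), vi = c(0.005, 0.008))
    res <- rma(yi, vi, data=dat)   # tau^2 cannot be estimated well from k=2
    res
    forest(res)                    # the two CIs overlap here

The pooled estimate itself is then fine to report; it is the heterogeneity
estimate that one should not over-interpret.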

Best,

Gerta



On 27.08.2020 at 16:07, ne gic wrote:

#
Dear Gerta,

I agree with you. In the interest of playing devil's advocate - and of my
(and some list members') learning more - what would your opinion be if the
CIs of the 2 studies did not overlap?

Appreciate your response.

Sincerely,
nelly

On Thu, 27 Aug 2020 at 18:21, Gerta Ruecker <ruecker at imbi.uni-freiburg.de>
wrote:

#
Wait, are you also Nelly @Nelson?

On Thu, Aug 27, 2020 at 6:44 PM Nelson Ndegwa <nelson.ndegwa at gmail.com>
wrote:

#
Haha, sorry, I was editing a response that included your signature and
forgot to remove it :-)

nelson
On Thu, 27 Aug 2020 at 18:47, ne gic <negic4 at gmail.com> wrote:

#
Thank you @Gerta!

Sincerely,
nelly

On Thu, Aug 27, 2020 at 6:21 PM Gerta Ruecker <ruecker at imbi.uni-freiburg.de>
wrote:

#
To answer your question, Nelson:

If I have only two studies and the confidence intervals don't overlap, I
would usually present a forest plot without a pooled estimate and discuss
this in the text as an indication of large heterogeneity.

However, this also depends on whether the difference is relevant on the
outcome scale, which is a subject-matter consideration. For example, if I
am estimating incidence rate ratios or something similar based on very big
populations, the CIs may be very short and thus non-overlapping, but this
may not be important with respect to heterogeneity. For an example, see
Figure 2b in the attached paper (antibiotics density): the first two CIs
do not overlap, but this does not seem to be a big difference; it is only
due to the enormous size of the studies.

Best,

Gerta

On 27.08.2020 at 18:49, Nelson Ndegwa wrote:
#
Hi Gerta,

That's a nice approach actually.

Kind Regards,
Nelson

On Thu, 27 Aug 2020 at 19:31, Gerta Ruecker <ruecker at imbi.uni-freiburg.de>
wrote:

#
Dear Nelly,

you may need to distinguish between frequentist and Bayesian methods
here.

Firstly, you may wonder how "representative" a small sample can possibly
be of some general population. However, when you think about it, this is
not necessarily an issue tied to small samples -- you could also think of
large samples that are not representative, e.g., due to selection biases.

Secondly, small sample sizes (small numbers of studies or few events
within a study) may lead to "technical" difficulties for the meta-analysis
methods. Consider, for example, the normal approximation that is often
utilized in a normal model; this tends to break down, e.g., if you are
looking at a log-OR endpoint and have only one, two, or no events in one
of the study arms. Continuity corrections then may help, but only to a
certain degree. Such issues are discussed e.g. by Jackson and White
(2018; https://doi.org/10.1002/bimj.201800071).
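
To see the breakdown concretely, here is a sketch with metafor's escalc()
(the counts are invented):

    library(metafor)
    # a zero cell makes the log-OR and its sampling variance undefined
    escalc(measure="OR", ai=0, bi=20, ci=4, di=16, add=0)  # yi and vi come out NA
    # the default adds 1/2 to all cells of studies with a zero cell
    escalc(measure="OR", ai=0, bi=20, ci=4, di=16)         # add=1/2, to="only0"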

You can then replace the normal approximation with a more accurate model
(e.g., a binomial likelihood; see e.g. the proposals discussed by Seide
et al. (2018; https://doi.org/10.1186/s12874-018-0618-3)). However, many
methods may still perform unsatisfactorily for few studies or few events,
essentially because they often rely on many-study and/or many-event
asymptotics.

This is where frequentist and Bayesian methods may behave somewhat
differently. Bayesian methods generally behave reasonably for any study
number or size; however, the asymptotics issue does not completely go
away. For many studies and many events, the prior information that is
formally included in the model tends to make little difference, but the
fewer studies or events you have, the more important the prior
assumptions become. It is hence crucial to convincingly motivate the
prior assumptions you make. A fully Bayesian approach for few studies
and events (based on a binomial model) is described e.g. by Günhan et
al. (2020; https://doi.org/10.1002/jrsm.1370). Within the common normal
model, you usually first of all have to worry about the prior
specification for the heterogeneity parameter; we have recently
summarized some guidance here: https://arxiv.org/abs/2007.08352
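
As a minimal sketch of the latter (all numbers invented; the half-normal
scale below is just one of the weakly informative choices discussed in
that guidance), using the bayesmeta package:

    library(bayesmeta)
    # invented log-OR estimates and standard errors from three small studies
    y     <- c(-0.6, -0.2, -1.1)
    sigma <- c(0.45, 0.50, 0.65)
    res <- bayesmeta(y = y, sigma = sigma,
                     tau.prior = function(t) dhalfnormal(t, scale = 0.5),
                     mu.prior.mean = 0, mu.prior.sd = 4)
    res$summary    # posterior summaries for tau (heterogeneity) and mu (pooled effect)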

Cheers,

Christian
On Wed, 2020-08-26 at 10:15 +0200, ne gic wrote:
#
Gerta, in the case of two studies there is a caveat w.r.t. the
overlapping-CI heuristic (probably also in the three-study case, but I do
not know a number for that):

If, say, the assumptions of the two-sample t-test hold, then the CIs might
overlap, but the t-test might still be significant. The significance of
the t-test might be seen as an indicator of heterogeneity. Goldstein and
Healy (1995) argue in favour of 83% CIs because of this (I am not sure I
buy into that), and there is also a note by Cumming and Finch (2005). Even
if the assumptions of the two-sample t-test do not hold, but appropriate
CIs are available, the "overlap but significant difference" phenomenon
might still occur.
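
For what it's worth, the 83% figure is quick to reproduce in R (assuming
equal standard errors and normality): two level-L CIs just fail to overlap
when |m1 - m2| > 2 * z_L * se, while the two-sided 5% z-test is significant
when |m1 - m2| > 1.96 * sqrt(2) * se, so equating the two gives

    z <- qnorm(0.975) / sqrt(2)   # half-width multiplier where touching CIs match p = .05
    2 * pnorm(z) - 1              # ~0.834, i.e., the ~83% intervals of Goldstein & Healy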

Goldstein, H., & Healy, M. J. R. (1995). The graphical presentation of a
collection of means. Journal of the Royal Statistical Society, Series A,
158(1), 175-177.

Cumming, G., & Finch, S. (2005). Inference by eye: Confidence intervals
and how to read pictures of data. American Psychologist, 60(2), 170-180.

-Philipp



On Thu, Aug 27, 2020 at 9:24 PM Gerta Ruecker <ruecker at imbi.uni-freiburg.de>
wrote:

#
Hi Philipp,

Yes, of course. I never said that overlapping CIs and non-significance of
differences are equivalent; I didn't even define "overlapping" properly.
My focus was the problem of few studies in a meta-analysis that Nelly
brought up, and my main point is that two studies in a meta-analysis is
not the same problem as two individuals in a clinical trial: two studies
can still mean we have thousands of individuals and much information about
effect sizes. What we don't have is information about the between-study
variance, so I still think the "overlapping CI heuristic" is helpful, with
this caveat in mind.

Best,

Gerta

On 28.08.2020 at 10:43, Philipp Doebler wrote: