
[R-meta] When is it OK to pool between and within-subjects effect sizes in meta-analyses?

Michael Truong

Hi Everyone,

I am trying to run a Bayesian meta-analysis with effect sizes from both between-subjects and within-subjects study designs. I tried to do my due diligence, but I am confused. I have written up my question as Quarto, .bib, and .html files that you may access here: https://gist.github.com/emstruong/3d19cb8861befe2161fa4ee2ceab3bde.

Here is my long write-up:

# Problem Statement

I'm concerned about when it is appropriate to pool effect sizes from between-subjects and within-subjects studies together, and I would appreciate any guidance. I am finding contradictory recommendations and don't know what to make of them. Part of the problem may be that the recommendations never seem to directly discuss how meta-regression might affect the analyst's decision to pool the effect sizes in the same model. And examples of meta-regression that model the differences between between-subjects and within-subjects studies (referred to as 'study type' hereafter) don't seem to touch on construct validity, or on what to do when you want to include further predictors in the meta-regression. I have summarized key citations in the section below this blurb.

In particular, given the arguments presented in Hamaker [-@hamaker2012], I think that effects at the between-subjects versus within-subjects level may generally correspond to different phenomena. Hence, I will refer to this general issue as a problem of 'construct validity': specifically, whether the effects of between-subjects studies validly measure the *same* phenomena as within-subjects studies.

For those not familiar, the canonical example in Hamaker [-@hamaker2012] is that the between-subjects correlation between typing speed and number of typos is negative: a higher score on the latent variable of typing skill should mean both a higher typing speed and fewer typos. However, the within-subjects correlation between typing speed and number of typos is positive: asking someone to type faster than they're comfortable with should increase the number of typos they make. Thus, effect sizes from between- versus within-subjects designs correspond to completely different phenomena. (Concrete code for simulating this example is provided at the bottom of this post.)

**This problem of construct validity leads to my series of questions. Generalizing to the case of meta-analyses, when is it appropriate to include effect sizes from both between-subjects and within-subjects studies together within the same meta-analysis?**

-   Does meta-regression using the study type (between-subjects/within-subjects) as a moderator completely address the problems with construct validity?
-   If the effects are truly fundamentally different from one another, does this mean that further meta-regressions (e.g., country of study) should include study type in an interaction (i.e., country × study type)?
-   Would we be better off running two separate meta-analyses, one for all the within-subjects studies and another for all the between-subjects studies?
-   Won't it be problematic to use meta-regression to test whether study type is a significant moderator and hence should be modelled? If you fail to find a statistically significant difference---despite real underlying differences in construct validity---and proceed to pool the studies together, won't you end up under-estimating the uncertainty in your meta-analysis? You would be inflating the number of studies used in your estimate of the mean effect.
    -   A non-significant main effect of study type does not guarantee that study type won't be a highly significant and important moderator when put in an interaction with some other moderator variable.
    -   The above strategy that I am questioning is basically put forth here: <https://stats.stackexchange.com/a/393582>
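For concreteness, the moderator strategy questioned above might look like the following `metafor` sketch. The data are simulated and the variable names (e.g., `study_type`) are hypothetical placeholders, not from any real meta-analysis:

```{r}
# A minimal sketch of meta-regression with study type as a moderator,
# using simulated effect sizes (hypothetical data, not real studies)
library(metafor)

set.seed(1)
k <- 20
dat <- data.frame(
  yi = c(rnorm(10, mean = -0.3, sd = 0.1),  # between-subjects effects
         rnorm(10, mean =  0.3, sd = 0.1)), # within-subjects effects
  vi = runif(k, min = 0.01, max = 0.05),    # sampling variances
  study_type = rep(c("between", "within"), each = 10)
)

# Random-effects meta-regression: does study type moderate the effect?
res <- rma(yi, vi, mods = ~ study_type, data = dat)
summary(res)
```

Even when the `study_type` coefficient comes out non-significant, that is absence of evidence for a difference rather than evidence of absence, which is the concern raised in the last bullet above.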

# Citations and Snippets from the Literature and Online Blogs

Cooper [-@cooperHandbookResearchSynthesis2019] emphasizes the importance of whether the different studies are asking the same question or not, but does not mention whether meta-regression can be used to model the statistical differences between the two types of effect sizes (between-subjects and within-subjects).

Another classic resource is Morris and DeShon [-@morrisCombiningEffectSize2002], who don't explicitly frame their argument in terms of validity.

In a blog post on Bayesian meta-analysis using `brms`, Kurz [-@kurzBayesianMetaanalysisBrms2022] models the difference between between-subjects and within-subjects designed studies using meta-regression, but there is no discussion of construct validity or of what to do if we wanted to add further predictors to the meta-regression.
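For reference, that kind of moderated meta-analytic model can be sketched in `brms` formula syntax. The names `yi`, `sei`, `study_type`, and `study_id` are hypothetical placeholders; this only builds the formula and does not fit the model:

```{r}
library(brms)

# Measurement-error meta-regression formula: each observed effect yi
# has a known standard error sei; study_type enters as a moderator
meta_formula <- bf(yi | se(sei) ~ 0 + study_type + (1 | study_id))

# fit <- brm(meta_formula, data = dat, family = gaussian())  # not run here
```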

In the brms book, Bürkner [-@burknerBrmsBookApplied2024, pp. 269-271] performs a 'multivariate meta-analysis' of the effects of intranasal oxytocin sprays, where three different outcomes (on the same scale) are used within the same meta-regression and the type of outcome is included as a predictor. I think this is analogous to the problem of pooling between-subjects and within-subjects designs because, clearly, each of the three outcomes corresponds to a completely different phenomenon, although any expected change would be attributed to the intervention of interest.

# Regarding Hamaker (2012)

```{r}
# With some coding help from Claude to make things faster
library(ggplot2)
library(patchwork)
set.seed(42)
```

Following the example in Figure 3.1 of Hamaker [-@hamaker2012]:

Let's say that we have two types of data:

-   Between-subjects correlations of number of words per minute and percentage of typos

```{r}
n_cross <- 100

# Generate typing speed (words per minute)
wpm_cross <- rnorm(n_cross, mean = 60, sd = 15)

# Generate typo percentage with a negative correlation:
# faster typists (higher skill) make fewer typos; pmax() keeps values non-negative
typo_pct_cross <- (15 -
  0.15 * wpm_cross +
  rnorm(n_cross, mean = 0, sd = 2)) |>
  pmax(0)

cross_sectional_data <- data.frame(
  person_id = 1:n_cross,
  wpm = wpm_cross,
  typo_percentage = typo_pct_cross
)
```

-   Within-subjects correlations of number of words per minute and percentage of typos

```{r}
n_persons <- 15
n_observations_per_person <- 20

within_person_data <- data.frame()

for (i in 1:n_persons) {
  # Each person has their own baseline skill level
  person_baseline_wpm <- rnorm(1, mean = 60, sd = 15)
  person_baseline_typo <- rnorm(1, mean = 8, sd = 2)

  # Within-person variations around their baseline
  wpm_variation <- rnorm(n_observations_per_person, mean = 0, sd = 8)
  wpm_person <- person_baseline_wpm + wpm_variation

  # Positive within-person correlation: faster typing = more typos
  typo_person <- (person_baseline_typo +
    0.12 * wpm_variation +
    rnorm(n_observations_per_person, mean = 0, sd = 1)) |>
    pmax(0)

  person_data <- data.frame(
    person_id = i,
    wpm = wpm_person,
    typo_percentage = typo_person
  )

  within_person_data <- rbind(within_person_data, person_data)
}
```
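Before plotting, the sign reversal can be checked numerically. This chunk reuses the `cross_sectional_data` and `within_person_data` objects simulated above:

```{r}
# Between-subjects correlation: negative (skill drives both variables)
cor(cross_sectional_data$wpm, cross_sectional_data$typo_percentage)

# Per-person within-subjects correlations: positive on average
within_cors <- sapply(
  split(within_person_data, within_person_data$person_id),
  function(d) cor(d$wpm, d$typo_percentage)
)
mean(within_cors)
```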

Then this implies the following set of graphs:

```{r}
# Plot 1: Cross-sectional
p1 <- ggplot(cross_sectional_data, aes(x = wpm, y = typo_percentage)) +
  geom_point(alpha = 0.5, size = 2) +
  stat_ellipse(level = 0.95, color = "black", linewidth = 1) +
  labs(title = "Between-Subjects",
       x = "number of words\nper minute",
       y = "percentage\nof typos") +
  theme_minimal(base_size = 14) +
  theme(axis.title.y = element_text(angle = 90, vjust = 0.5),
        plot.title = element_text(hjust = 0.5))

# Plot 2: Within-person
p2 <- ggplot(within_person_data, aes(x = wpm, y = typo_percentage, group = person_id)) +
  stat_ellipse(level = 0.75, alpha = 0.3, color = "gray60", linewidth = 0.8) +
  labs(title = "Within-Subjects",
       x = "number of words\nper minute",
       y = "percentage\nof typos") +
  theme_minimal(base_size = 14) +
  theme(axis.title.y = element_text(angle = 90, vjust = 0.5),
        plot.title = element_text(hjust = 0.5))

p1 | p2
```

Cheers,
Michael
 
 
--------------------------------------------------------
Michael S. Truong

Flora Quantitative Lab, PhD4
York University
mtruong at yorku.ca | emstruonger at gmail.com
https://emstruong.github.io/
--------------------------------------------------------

I try to answer emails within a week, once a week.