
[R-meta] Performance of metafor::vcalc() vs clubSandwich::impute_covariance_matrix()

Thanks so much, James! Unfortunately, vcalc(sparse = TRUE) didn't improve
performance enough: in the example below, vcalc with its default arguments
takes ~100x longer than impute_covariance_matrix, while vcalc(sparse = TRUE)
takes ~60x longer.

I couldn't reproduce the 2x values using non-proprietary data, so there
might just be something weird going on with my dataset!

Reproducible example (adapted from metafor's examples in the vcalc function
documentation):
```
library(tidyverse)
library(metafor)
library(clubSandwich)
library(microbenchmark)
set.seed(42)

# example data from metafor
dat <- dat.assink2016

# augment data so it has >1500 rows
new_rows <-
  tibble(
    study = 18:167,
    n_esid = sample(x = 1:max(dat$esid), size = 150, replace = TRUE)
  ) %>%
  uncount(n_esid) %>%
  group_by(study) %>%
  mutate(esid = row_number()) %>%
  ungroup() %>%
  mutate(
    id = row_number() + 100,
    yi = rnorm(nrow(.), mean(dat$yi), sd(dat$yi)),
    vi = rnorm(nrow(.), mean(dat$vi), sd(dat$vi)),
    vi = if_else(vi < 0, -1*vi, vi), # make sure vi is always positive
    pubstatus = sample(x = dat$pubstatus, size = nrow(.), replace = TRUE),
    year = sample(x = dat$year, size = nrow(.), replace = TRUE),
    deltype = sample(x = dat$deltype, size = nrow(.), replace = TRUE)
  )
dat_big <- bind_rows(dat, new_rows)

# benchmark performance with full matrix (this takes a minute to run)
res <- microbenchmark(
  "metafor" = vcalc(vi, cluster = study, obs = esid, data = dat_big,
                    rho = 0.6),
  "clubSandwich" = impute_covariance_matrix(vi = dat_big$vi,
                                            cluster = dat_big$study,
                                            r = 0.6, return_list = FALSE),
  times = 10
)
summary(res)

# benchmark performance with sparse matrix (also takes a minute to run)
res_sparse <- microbenchmark(
  "metafor" = vcalc(vi, cluster = study, obs = esid, data = dat_big,
                    rho = 0.6, sparse = TRUE),
  "clubSandwich" = impute_covariance_matrix(vi = dat_big$vi,
                                            cluster = dat_big$study,
                                            r = 0.6, return_list = FALSE),
  times = 10
)
summary(res_sparse)
```
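
For anyone rerunning this, the slowdown factors quoted above can be read off as ratios of median timings. Here is a minimal sketch of that calculation; `slow_fn` and `fast_fn` are hypothetical stand-ins (not part of either package) so the snippet runs on its own, and the same steps should apply to the `res` and `res_sparse` objects from the example.

```r
library(microbenchmark)

# Hypothetical stand-ins for the two functions being compared; swap in the
# vcalc() and impute_covariance_matrix() calls to get the real ratios.
slow_fn <- function() Sys.sleep(0.02)
fast_fn <- function() Sys.sleep(0.002)

res_toy <- microbenchmark(slow = slow_fn(), fast = fast_fn(), times = 5)

# microbenchmark stores raw timings (in nanoseconds) in the `time` column
# and the expression label in the `expr` column.
med <- tapply(res_toy$time, res_toy$expr, median)

# Express each median as a multiple of the fastest median.
ratio <- med / min(med)
print(round(ratio, 1))

# The summary method can also report timings on a relative scale.
print(summary(res_toy, unit = "relative"))
```

The exact ratios will vary from run to run and machine to machine, so they are best treated as rough orders of magnitude.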

Thanks again,


*Tamar Novetsky* *(she/her)*
Data Scientist I
Eastern Time Zone
On Tue, Aug 6, 2024 at 10:20 AM James Pustejovsky <jepusto at gmail.com> wrote: