[R-meta] Performance of metafor::vcalc() vs clubSandwich::impute_covariance_matrix()
Thanks so much, James! Unfortunately, vcalc(sparse = TRUE) didn't improve
performance enough: in the example below, vcalc() with its default
arguments takes ~100x longer than impute_covariance_matrix(), while
vcalc(sparse = TRUE) still takes ~60x longer.
I couldn't reproduce the doubled values (vcalc() results exactly 2x those
of impute_covariance_matrix()) using non-proprietary data, so there might
just be something weird going on with my dataset!
Reproducible example (adapted from metafor's examples in the vcalc function
documentation):
```
library(tidyverse)
library(metafor)
library(clubSandwich)
library(microbenchmark)

set.seed(42)

# example data from metafor
dat <- dat.assink2016

# augment data so it has >1500 rows
new_rows <-
  tibble(
    study = 18:167,
    n_esid = sample(x = 1:max(dat$esid), size = 150, replace = TRUE)
  ) %>%
  uncount(n_esid) %>%
  group_by(study) %>%
  mutate(esid = row_number()) %>%
  ungroup() %>%
  mutate(
    id = row_number() + 100,
    yi = rnorm(nrow(.), mean(dat$yi), sd(dat$yi)),
    vi = rnorm(nrow(.), mean(dat$vi), sd(dat$vi)),
    vi = if_else(vi < 0, -1*vi, vi), # make sure vi is always positive
    pubstatus = sample(x = dat$pubstatus, size = nrow(.), replace = TRUE),
    year = sample(x = dat$year, size = nrow(.), replace = TRUE),
    deltype = sample(x = dat$deltype, size = nrow(.), replace = TRUE)
  )

dat_big <- bind_rows(dat, new_rows)

# benchmark performance with full matrix (this takes a minute to run)
res <- microbenchmark(
  "metafor" = vcalc(vi, cluster = study, obs = esid, data = dat_big,
                    rho = 0.6),
  "clubSandwich" = impute_covariance_matrix(vi = dat_big$vi,
                                            cluster = dat_big$study,
                                            r = 0.6, return_list = FALSE),
  times = 10
)
summary(res)

# benchmark performance with sparse matrix (also takes a minute to run)
res_sparse <- microbenchmark(
  "metafor" = vcalc(vi, cluster = study, obs = esid, data = dat_big,
                    rho = 0.6, sparse = TRUE),
  "clubSandwich" = impute_covariance_matrix(vi = dat_big$vi,
                                            cluster = dat_big$study,
                                            r = 0.6, return_list = FALSE),
  times = 10
)
summary(res_sparse)
```
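For a rough sense of scale on the object-size argument (sum(kj)^2 entries
for the full matrix versus sum(kj^2) for the block-diagonal one), here is a
minimal base-R sketch. The helper count_entries() and the cluster sizes are
hypothetical and chosen only for illustration; they are not taken from
dat_big or from either package.

```r
# Entry counts for a full (dense) versus a block-diagonal covariance matrix.
# `k` holds the number of effect sizes per study (hypothetical values).
count_entries <- function(k) {
  c(dense = sum(k)^2, block = sum(k^2))
}

# Three hypothetical studies with 2, 3, and 1 effect sizes
k_small <- c(2, 3, 1)
counts_small <- count_entries(k_small)

# Cross-check by building the block-diagonal pattern explicitly
V <- matrix(0, sum(k_small), sum(k_small))
end <- cumsum(k_small)
start <- end - k_small + 1
for (j in seq_along(k_small)) {
  idx <- start[j]:end[j]
  V[idx, idx] <- 1  # mark the entries inside study j's block
}
stopifnot(
  length(V) == counts_small[["dense"]],  # every cell of the full matrix
  sum(V) == counts_small[["block"]]      # only the within-study cells
)

# 150 hypothetical studies with 10 effect sizes each
counts_large <- count_entries(rep(10, 150))
size_ratio <- counts_large[["dense"]] / counts_large[["block"]]
size_ratio
```

With equal cluster sizes the ratio equals the number of studies, so it grows
with J. This is a ratio of stored entries only, not a prediction of run
time, so it does not by itself explain the timings above.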
Thanks again,
*Tamar Novetsky* *(she/her)*
Data Scientist I
Eastern Time Zone
On Tue, Aug 6, 2024 at 10:20 AM James Pustejovsky <jepusto at gmail.com> wrote:
Hi Tamar,
The difference in compute time is because of a difference in how the default output of these functions is structured. clubSandwich::impute_covariance_matrix() returns a block-diagonal by default. metafor::vcalc() returns a full (dense) matrix by default. Say that you have J studies and study j has kj effect sizes. The block-diagonal matrix has sum(kj^2) entries, whereas the full matrix has sum(kj)^2 entries. If J is large and the kjs are mostly small, this can make for a really big difference in object size. However, setting the option vcalc(sparse = TRUE) will return a block-diagonal matrix and should lead to performance comparable to impute_covariance_matrix().
Regarding your second question, I'm not sure what might be going on. Could you provide a reproducible example?
James
On Tue, Aug 6, 2024 at 8:20 AM Tamar Novetsky via R-sig-meta-analysis <r-sig-meta-analysis at r-project.org> wrote:
Hello,
I am working on a script to run multiple meta-regressions on different
subsets of the same dataset, and have been
using clubSandwich::impute_covariance_matrix() to generate the
variance-covariance matrix necessary as an input to metafor::rma.mv().
However, I recently learned that impute_covariance_matrix() has been
superseded by metafor::vcalc(), so I have been working to replace my usage
of the former function with the latter. In that process, I discovered that
vcalc() seems to be much slower than impute_covariance_matrix() - about
150x slower in one use case that I benchmarked using the microbenchmark
package. Since I will be running this many times in a loop, performance
matters quite a lot to me in this context.
Can anyone help me understand why vcalc() would be so much slower? Is it
possible that I'm using it incorrectly?
Secondly, and possibly relatedly, I found that the results from vcalc()
are always either exactly the same as or exactly double the results from
impute_covariance_matrix(). Does anyone have a sense of why that would be?
Could that be related to the performance differences?
Thanks so much for your help,
*Tamar Novetsky* *(she/her)*
Data Scientist I
Eastern Time Zone