
[R-meta] Inverse weighting after estimation of VCOV

4 messages · David Pedrosa, James Pustejovsky

#
Dear all,

I have a basic question about the output of my (gu)estimation of the 
variance-covariance matrix. I have extracted results from very 
heterogeneous studies with OR as effect size (sample sizes between 20 
and 300,000). Since some of the results come from the same study, I 
decided to try to use the VCOV as an input and estimated values 
according to the following formula

V_mat <- vcalc(vi=vi, cluster=shared_variance, data=df_complete, rho=.7)
res_meta <- rma.mv(yi, vi, V=V_mat,
                   random = ~ 1 | number, mods = ~ hospitalbeds + ltcbeds,
                   verbose=TRUE, data=df_complete)


Interestingly, in this case the weighting is reversed, so that most of 
the weight is given to studies with the smallest sample size; something 
that does not happen when using this formula:

res_meta <- rma(yi, vi,
                random = ~ 1 | number, mods = ~ hospitalbeds + ltcbeds,
                verbose=TRUE, data=df_complete)
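
A possible contributor to the difference between the two calls, offered as an assumption rather than something established in this thread: R matches named arguments first and then fills the remaining formals by position. If the first three formals of rma.mv() are yi, V, W (worth confirming with args(rma.mv)), then in rma.mv(yi, vi, V=V_mat, ...) the unnamed vi would land in W, the user-defined weights, and each estimate would be weighted by its variance rather than by its inverse. The toy function below is hypothetical and uses base R only; it mimics that formal order to show the matching rule, with made-up values.

```r
# Hypothetical stand-in with the same leading formals assumed for rma.mv().
toy_fit <- function(yi, V, W = NULL) {
  list(V = V, W = W)
}

vi    <- c(0.0008, 0.64, 0.047)   # illustrative sampling variances
V_mat <- diag(vi)                 # placeholder for a vcalc() result

# Same argument pattern as in the rma.mv() call above.
m <- toy_fit(c(0.06, 0.36, 0.83), vi, V = V_mat)

identical(m$V, V_mat)  # the named V is matched first
identical(m$W, vi)     # the unnamed vi falls through to W
```

If this is what happens in the real call, dropping the positional vi, as in rma.mv(yi, V = V_mat, ...), would be the first thing to try.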

I have tried to understand what is going on, but I am kind of lost. 
Could someone please give me some advice?

Thanks in advance,

David
#
Hi David,

I don't entirely understand the models that you're looking at, so
clarifying the following would help in getting good feedback:
* What is the variable `shared_variance` used in the vcalc call?
* What is the variable `number` used in the random effects argument of
rma.mv?
* How are these variables related?

Additionally, it would be good to check that the vcov matrix created by
vcalc() is as you intend it to be. Could you pull out the blocks of this
matrix for a few studies and just verify that they give you covariance
matrices with a correlation of 0.7? I mean something like:
vcov_study_k <- V_mat[i:j, i:j]
cov2cor(vcov_study_k)
where the indices i:j are the rows in your data corresponding to a given
study k.
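
As a concrete version of this check, here is a minimal base-R sketch. It does not call vcalc(); it builds one block by hand under the assumption of a constant sampling correlation of 0.7, so that the result expected from cov2cor() is visible. The names vi_k and block and the variances are illustrative.

```r
# Illustrative sampling variances for three estimates in one cluster.
vi_k <- c(0.047, 0.0047, 0.0054)
rho  <- 0.7

# Covariance block: rho * sqrt(v_i * v_j) off the diagonal, v_i on it.
block <- rho * sqrt(outer(vi_k, vi_k))
diag(block) <- vi_k

# Every off-diagonal entry of the implied correlation matrix should equal rho.
cov2cor(block)
```

With the real objects, the analogous check would be cov2cor(V_mat[idx, idx]), where idx holds the row numbers of one cluster.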

James

On Fri, May 24, 2024 at 10:00 AM David Pedrosa via R-sig-meta-analysis <
r-sig-meta-analysis at r-project.org> wrote:

            

  
  
2 days later
#
Hi James,

apologies, my question was not seasoned enough.

I have a dataframe with 16 studies, all of which provide some odds 
ratios for hospitalisation. 8 studies are from the same publication but 
cover different countries. To me there is still reason to believe they 
"share more variance" than the rest. Besides, I want to weight each 
study by its total number of subjects. To make it a bit more 
complex, we have dug out the number of hospital beds and long-term care 
beds for every country, both of which we consider potential moderators. 
I ran the random effects model

res_metaRE <- rma(yi, vi,
                  random = ~ 1 | number, mods = ~ hospitalbeds + ltcbeds,
                  verbose=TRUE, data=df_complete)

for which weights(res_metaRE) gives plausible results. If I try to 
estimate the VCOV matrix, the diagonal values are correct, that 
is, identical to df_complete$vi. But passing the resulting V_mat

V_mat <- vcalc(vi=vi, cluster=shared_variance, data=df_complete, rho=.7)

to rma.mv gives results that are too high, and in particular the studies 
with fewer subjects receive the higher weights. I am assuming that 
it's just somehow inverted, but I cannot tell whether I'm missing 
something or whether there is some other mistake in the way I'm 
estimating the VCOV. The variable number is just the study ID.

I'm not entirely sure I understand your point with the subsection of the 
matrix.

Thanks for your help!
Best,
David

P.S.: Here are the relevant parts of df_complete

structure(list(number = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 15), author = c("Aamodt", "Ceylan", "Krause", "Kumar",
"Moens, Belgium", "Moens, France", "Moens, Italy", "Moens, Canada",
"Moens, Mexiko", "Moens, New Zeeland", "Moens, Spain", "Moens, South Corea",
"Moens, Czech Rep.", "Moens, Hungary", "Moens, USA"), year = c(2023,
2022, 2021, 2021, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015,
2015, 2015, 2015), n_ges = c(53279, 27, 40, 346141, 837, 4599,
4034, 1381, 1062, 202, 352, 1565, 92, 241, 20065), OR = c(1.06,
1.43, 8.25, 1.454, 2.3, 1.5, 1.4, 1.7, 0.95, 1.97, 1.09, 0.95,
0.97, 1.44, 1.4), hospitalbeds = c(2.77, 3.02, 7.76, 2.77, 5.47,
5.65, 3.12, 2.58, 1, 2.57, 2.96, 12.77, 6.66, 6.79, 2.77), ltcbeds = 
c(32.3,
9.5, 54.2, 53.9, 66.8, 47.4, 21.3, 46.7, 0, 50.4, 43.4, 25, 34.9,
42.6, 28.9), p_values = c(0.106809128205467, 0.706331045003814,
0.0281267337718951, 0, 2.43772276381116e-05, 2.76746355676653e-22,
1.01260208850919e-05, 1.19251123951374e-10, 0.772759462747246,
0.0741077696800058, 0.74088983860122, 0.68164335922065, 1, 
0.183303852299051,
3.20176730771634e-26), shared_variance = c(0, 0, 0, 0, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1), yi = structure(c(0.0582689081239758,
0.357674444271816, 2.11021320034659, 0.374318379111328, 0.832909122935104,
0.405465108108164, 0.336472236621213, 0.53062825106217, 
-0.0512932943875506,
0.678033542749897, 0.0861776962410524, -0.0512932943875506, 
-0.0304592074847086,
0.364643113587909, 0.336472236621213), ni = c(53279, 27, 40,
346141, 837, 4599, 4034, 1381, 1062, 202, 352, 1565, 92, 241,
20065), measure = "GEN"), vi = c(0.000835840725678602, 0.638632983584221,
0.604067037193667, 0.000435509388232691, 0.0467214213223696,
0.00468347897652763, 0.00538603813506437, 0.0132951153208062,
0.0214123920152818, 0.142112789690683, 0.0489441998392354, 
0.0138688993962097,
0.186242249276727, 0.0702159732616764, 0.00133268716433697)), row.names 
= c(NA,
-15L), class = c("escalc", "data.frame"), yi.names = "yi", vi.names = 
"vi", digits = c(est = 4,
se = 4, test = 4, pval = 4, ci = 4, var = 4, sevar = 4, fit = 4,
het = 4))

On 24.05.2024 at 19:06, James Pustejovsky wrote:
3 days later
#
Hi David,

Thanks for clarifying your data structure. Based on what you've described,
I don't think it makes sense to use vcalc(). The point of vcalc() is to
build in covariance between the sampling errors of the effect size
estimates. For your one publication that reports 8 studies, each effect
size estimate is based on a separate sample of participants (because each
estimate comes from a different country). So there's no reason to expect
that there would be covariance in the sampling errors.

Instead, one might suspect that there would be covariance between the
country-specific effect size parameters (i.e., the "true" effect sizes)
from this publication. This would be plausible if the same operational
procedures (e.g., same recruitment approach, same measurement
instrumentation, same follow-up window) were used across the samples in
this publication. The conventional way to model this would be to 1) specify
effect size estimates as independent but 2) include publication-level
random effects in the model to capture shared operational variance within
publications. The syntax would be something like:
res_metaRE <- rma.mv(
  yi, V = vi,
  random = ~ 1 | publicationID / number,
  mods = ~ hospitalbeds + ltcbeds,
  verbose=TRUE,
  data=df_complete, sparse = TRUE
)
You'll need to create a publicationID variable if you don't already have
that on the data.
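
One way to derive such a variable, sketched under the assumption that entries in the author column follow the "Surname, Country" pattern seen in the posted data. The short author vector is an illustrative subset, and pub_name is a hypothetical helper name.

```r
# Illustrative subset of the author column from the posted data.
author <- c("Aamodt", "Ceylan", "Moens, Belgium", "Moens, France", "Moens, USA")

# Drop everything from the first comma onward, leaving the surname.
pub_name <- sub(",.*$", "", author)

# Integer ID per publication, numbered in order of first appearance.
publicationID <- match(pub_name, unique(pub_name))
publicationID
```

This would merge two different publications that share a first author, so combining the surname with the year may be safer on a larger dataset.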

The difficulty with this approach in your case is that there's only one
publication that has multiple samples nested within it, so there's not a
lot of information available to parse out the variance at the publication
level from the variance at the sample level (across countries). You could
try using the model fit statistics to compare the model above versus a
model that only has random effects at the sample level.

James

On Mon, May 27, 2024 at 8:54 AM David Pedrosa via R-sig-meta-analysis <
r-sig-meta-analysis at r-project.org> wrote: