[R-meta] Confusion about how to use UN structure
Thank you Reza and Wolfgang for the very detailed explanation. One quick follow-up question. Say two of my three correlation estimates are based on too few pairs, and I get 1 for them. Instead of reverting to a simpler structure like HCS, is it acceptable if I still use the UN structure but don't estimate the rhos that are based on few estimates? Many thanks, Simon On Wed, Sep 1, 2021, 2:04 AM Viechtbauer, Wolfgang (SP) <
wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
To add to this:
The correlation coefficient(s) can also drift towards 1 when particular
variance components are very small. For example, taking a simpler
structure, we have the standard multilevel model:
dat <- dat.konstantopoulos2011
res <- rma.mv(yi, vi, random = ~ 1 | district/school, data=dat)
res
Now imagine the sampling variances had actually been 4 times larger:
res <- rma.mv(yi, vi*4, random = ~ 1 | district/school, data=dat)
res
We find that the estimate of sigma^2.2 is now equal to 0, so no
within-district heterogeneity. And as discussed on numerous occasions on
this mailing list, the model above can also be parameterized as:
res <- rma.mv(yi, vi*4, random = ~ factor(school) | district, data=dat)
res
And we find that the estimate of rho is 1. If there is no within-district
heterogeneity, then the true effects within districts are all the same,
that is, they correlate perfectly.
Now let's take an example with an UN structure:
dat <- dat.berkey1998
V <- lapply(split(dat[c("v1i", "v2i")], dat$trial), as.matrix)
V <- bldiag(V)
res1 <- rma.mv(yi, V, mods = ~ outcome - 1, random = ~ outcome | trial,
struct="UN", data=dat)
res1
Now suppose the variances (and covariances) had again been larger:
res2 <- rma.mv(yi, V*4, mods = ~ outcome - 1, random = ~ outcome | trial,
struct="UN", data=dat)
res2
Here, the estimate of tau^2.2 is starting to drift towards 0 and the
estimate of rho is now again 1. In essence, if the true effects for the
second outcome have little to no heterogeneity, then it becomes very
difficult to estimate the correlation. Let's plot the two likelihood
profiles for rho:
profile(res1, rho=1, xlim=c(0,1))
sav <- profile(res2, rho=1, xlim=c(0,1), plot=FALSE)
lines(sav$rho, sav$ll, type="o")
Note that the one for res2 is quite flat, so estimating rho is very
difficult (that's also true in res1 with just 5 studies, but not as badly
as in res2).
In an extreme scenario, if tau^2.2 was estimated to be zero, then in
essence the estimate of rho becomes arbitrary.
I have also seen cases where the estimate of rho drifts towards -1. See
this example:
https://wviechtb.github.io/metadat/reference/dat.kalaian1996.html
Sidenote: It's a bit sad that this happens in this example, since this did
not occur in Kalaian and Raudenbush (1996) (from which these data were
extracted) and in principle I am fitting the exact model described in the
paper. But I am also not able to exactly reproduce the dataset - a
comparison between the data printed in the article with some of the
information / figures reveals some discrepancies, so there must be some
printing errors in Table 1 :(
Best,
Wolfgang
-----Original Message----- From: R-sig-meta-analysis [mailto:
r-sig-meta-analysis-bounces at r-project.org] On
Behalf Of Reza Norouzian Sent: Wednesday, 01 September, 2021 7:54 To: Simon Harmel Cc: R meta Subject: Re: [R-meta] Confusion about how to use UN structure Dear Simon, Without the data, it may be difficult to exactly know what the problem with your moderator or model is. But here is one issue to look for (I would say in a good number of cases, "the issue" to look for). In terms of random-effects' correlational structure, UN is the most demanding structure. Specifically, this structure aims to estimate the correlations among all unique pairs of the levels of a moderator across the levels of an ID variable (hereafter "study"). But to do so reliably, each pair of a moderator's levels should co-occur in a good number of studies. Otherwise, the UN-model will be estimating correlations among the pairs of levels of a moderator that have rarely or almost never occurred in the studies. To better see this, consider two categorical moderators each believed to have three correlated levels; "outcome" (with levels: A,B,C) and "UN_MOD" (with levels: type1,type2,type3) coded across 30 studies (data is shown below). Because in rma.mv(), one would place the target moderator on the "left" hand-side and the ID variable on the "right" hand-side of the vertical bar, let's input these variables to a little helper function to see which one of these moderators is technically more suitable for use with the "UN" structure: mod_for_un <- function(data,left,right)
crossprod(table(data[c(right,left)])>0)
For your model (random = ~ UN_MOD | study), we can do:
mod_for_un(dat, left ="UN_MOD", right="study")
UN_MOD
UN_MOD type1 type2 type3
type1 1 1 1
type2 1 18 18
type3 1 18 30
For the other model (random = ~ outcome | study), we can do:
mod_for_un(dat, left ="outcome", right="study")
outcome
outcome A B C
A 30 30 30
B 30 30 30
C 30 30 30
In both tables, the diagonal elements show the number of studies in
which a particular level of the moderator has occurred. The
off-diagonal elements show the number of studies in which a particular
pair of levels of the moderator has co-occurred. Comparing the two
tables, you see the problem with the UN_MOD moderator.
With UN_MOD, "type1" has barely co-occurred with any other levels
(type2 and type3) in any study. Thus, estimating correlations between
pairs in UN_MOD that have to do with "type1" is essentially not
possible.
Given that, if you use UN_MOD for the UN structure, you may get an
overparameterized model, and perhaps 2 rho estimates that are drifted
toward unity. You can check this in the profile likelihood function
shown as a flat curve for the rho estimates based on so few
co-occurrences.
So, some takeaways. First, a good UN_MOD is one whose levels vary
within each study for increased co-occurrences of the levels. Second
and related to the first point, the UN structure requires a relatively
large number of studies. Third, if your categories are kind of fluid
by nature, you may be able to slightly modify them to increase the
chances of co-occurrences. Fourth, there is so much that the
multilevel model can do to compensate for the lack of co-occurrences.
Thus, if the UN structure was not really possible, you can adopt a
simpler correlational structure like "HCS".
Best regards,
Reza
#--- The data:
set.seed(33)
dat <- expand.grid(study = 1:30, outcome = rep(LETTERS[1:3],2))
dat$UN_MOD <- sample(c("type1","type2","type3"),nrow(dat),replace =
TRUE, prob = c(.01,.1,.8))
with(dat, dat[order(study),])
ps. note that rma.mv() does output co-occurrences, if you fit a UN
model. But it may be helpful to check your moderators before fitting
the model, in which case mod_for_un() may be useful.
On Tue, Aug 31, 2021 at 7:17 PM Simon Harmel <sim.harmel at gmail.com>
wrote:
Hello All,
I want to use the UN structure offered by metafor (rma.mv) and the
literature also supports correlating my moderator's categories in the
studies. But I get rho estimates of 1.00 which makes me think if there
is a problem with (a) my moderator itself, or (b) the way I fit the
model?
For example, a typical model I use is as follows:
rma.mv(eff_size ~ UN_MOD + x1 + x2 -1, V = V, struct = "UN",
random = ~ UN_MOD | study, data = data)
where UN_MOD has three categories (ex. type 1, type 2 and type 3).
I have had similar issues with the UN structure in the past, so any
thoughts are highly appreciated.
Simon