[R-meta] Confusion about how to use UN structure

Wed, Sep 1, 2021 12:19 PM

Dear Wolfgang,

To correct my question, is it acceptable to not estimate the
correlations (rhos) that are based on too few pairs (i.e., fixing them
to 0) in a "UN" structure but at least estimate the correlations that
are based on a large number of pairs?

Or in such a situation, one has to use a simpler structure like "CS"
or "HCS" and obtain a single rho estimate representing the correlation
among all the pairs of the levels? (Maybe in this case, such a rho
will represent the average correlation among all pairs of the levels,
right?).

This is exactly the situation I'm in now.

Thank you very much,
Simon

On Wed, Sep 1, 2021 at 2:31 AM Simon Harmel <sim.harmel at gmail.com> wrote:

Thank you Reza and Wolfgang for the very detailed explanation.

One quick follow-up question. Say two of my three correlation estimates are based on too few pairs, and I get 1 for them.

Instead of reverting to a simpler structure like HCS, is it acceptable if I still use the UN structure but don't estimate the rhos that are based on few estimates?

Many thanks,
Simon



On Wed, Sep 1, 2021, 2:04 AM Viechtbauer, Wolfgang (SP) <wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:

To add to this:

The correlation coefficient(s) can also drift towards 1 when particular variance components are very small. For example, taking a simpler structure, we have the standard multilevel model:

dat <- dat.konstantopoulos2011
res <- rma.mv(yi, vi, random = ~ 1 | district/school, data=dat)
res

Now imagine the sampling variances had actually been 4 times larger:

res <- rma.mv(yi, vi*4, random = ~ 1 | district/school, data=dat)
res

We find that the estimate of sigma^2.2 is now equal to 0, so no within-district heterogeneity. And as discussed on numerous occasions on this mailing list, the model above can also be parameterized as:

res <- rma.mv(yi, vi*4, random = ~ factor(school) | district, data=dat)
res

And we find that the estimate of rho is 1. If there is no within-district heterogeneity, then the true effects within districts are all the same, that is, they correlate perfectly.

Now let's take an example with an UN structure:

dat <- dat.berkey1998
V <- lapply(split(dat[c("v1i", "v2i")], dat$trial), as.matrix)
V <- bldiag(V)
res1 <- rma.mv(yi, V, mods = ~ outcome - 1, random = ~ outcome | trial, struct="UN", data=dat)
res1

Now suppose the variances (and covariances) had again been larger:

res2 <- rma.mv(yi, V*4, mods = ~ outcome - 1, random = ~ outcome | trial, struct="UN", data=dat)
res2

Here, the estimate of tau^2.2 is starting to drift towards 0 and the estimate of rho is now again 1. In essence, if the true effects for the second outcome have little to no heterogeneity, then it becomes very difficult to estimate the correlation. Let's plot the two likelihood profiles for rho:

profile(res1, rho=1, xlim=c(0,1))
sav <- profile(res2, rho=1, xlim=c(0,1), plot=FALSE)
lines(sav$rho, sav$ll, type="o")

Note that the one for res2 is quite flat, so estimating rho is very difficult (that's also true in res1 with just 5 studies, but not as badly as in res2).

In an extreme scenario, if tau^2.2 was estimated to be zero, then in essence the estimate of rho becomes arbitrary.

I have also seen cases where the estimate of rho drifts towards -1. See this example:

https://wviechtb.github.io/metadat/reference/dat.kalaian1996.html

Sidenote: It's a bit sad that this happens in this example, since this did not occur in Kalaian and Raudenbush (1996) (from which these data were extracted) and in principle I am fitting the exact model described in the paper. But I am also not able to exactly reproduce the dataset - a comparison between the data printed in the article with some of the information / figures reveals some discrepancies, so there must be some printing errors in Table 1 :(

Best,
Wolfgang

-----Original Message-----
From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces at r-project.org] On
Behalf Of Reza Norouzian
Sent: Wednesday, 01 September, 2021 7:54
To: Simon Harmel
Cc: R meta
Subject: Re: [R-meta] Confusion about how to use UN structure

Dear Simon,

Without the data, it may be difficult to exactly know what the problem
with your moderator or model is. But here is one issue to look for (I
would say in a good number of cases, "the issue" to look for).

In terms of random-effects' correlational structure, UN is the most
demanding structure. Specifically, this structure aims to estimate the
correlations among all unique pairs of the levels of a moderator
across the levels of an ID variable (hereafter "study"). But to do so
reliably, each pair of a moderator's levels should co-occur in a good
number of studies. Otherwise, the UN-model will be estimating
correlations among the pairs of levels of a moderator that have rarely
or almost never occurred in the studies.

To better see this, consider two categorical moderators each believed
to have three correlated levels; "outcome" (with levels: A,B,C) and
"UN_MOD" (with levels: type1,type2,type3) coded across 30 studies
(data is shown below).

Because in rma.mv(), one would place the target moderator on the
"left" hand-side and the ID variable on the "right" hand-side of the
vertical bar, let's input these variables to a little helper function
to see which one of these moderators is technically more suitable for
use with the "UN" structure:

mod_for_un <- function(data,left,right) crossprod(table(data[c(right,left)])>0)

For your model (random = ~ UN_MOD | study), we can do:

mod_for_un(dat, left ="UN_MOD", right="study")

      UN_MOD
UN_MOD  type1 type2 type3
 type1      1        1        1
 type2      1        18      18
 type3      1        18      30

For the other model (random = ~ outcome | study), we can do:

mod_for_un(dat, left ="outcome", right="study")

      outcome
outcome  A  B  C
     A     30 30 30
     B     30 30 30
     C     30 30 30

In both tables, the diagonal elements show the number of studies in
which a particular level of the moderator has occurred. The
off-diagonal elements show the number of studies in which a particular
pair of levels of the moderator has co-occurred. Comparing the two
tables, you see the problem with the UN_MOD moderator.

With UN_MOD, "type1" has barely co-occurred with any other levels
(type2 and type3) in any study. Thus, estimating correlations between
pairs in UN_MOD that have to do with "type1" is essentially not
possible.

Given that, if you use UN_MOD for the UN structure, you may get an
overparameterized model, and perhaps 2 rho estimates that are drifted
toward unity. You can check this in the profile likelihood function
shown as a flat curve for the rho estimates based on so few
co-occurrences.

So, some takeaways. First, a good UN_MOD is one whose levels vary
within each study for increased co-occurrences of the levels. Second
and related to the first point, the UN structure requires a relatively
large number of studies. Third, if your categories are kind of fluid
by nature, you may be able to slightly modify them to increase the
chances of co-occurrences. Fourth, there is so much that the
multilevel model can do to compensate for the lack of co-occurrences.
Thus, if the UN structure was not really possible, you can adopt a
simpler correlational structure like "HCS".

Best regards,
Reza

#--- The data:
set.seed(33)
dat <- expand.grid(study = 1:30, outcome = rep(LETTERS[1:3],2))
dat$UN_MOD <- sample(c("type1","type2","type3"),nrow(dat),replace =
TRUE, prob = c(.01,.1,.8))
with(dat, dat[order(study),])

ps. note that rma.mv() does output co-occurrences, if you fit a UN
model. But it may be helpful to check your moderators before fitting
the model, in which case mod_for_un() may be useful.

On Tue, Aug 31, 2021 at 7:17 PM Simon Harmel <sim.harmel at gmail.com> wrote:

Hello All,

I want to use the UN structure offered by metafor (rma.mv) and the
literature also supports correlating my moderator's categories in the
studies. But I get rho estimates of 1.00 which makes me think if there
is a problem with (a) my moderator itself, or (b) the way I fit the
model?

For example, a typical model I use is as follows:

rma.mv(eff_size ~ UN_MOD + x1 + x2 -1, V = V, struct = "UN",
                    random = ~ UN_MOD | study, data = data)

where UN_MOD has three categories (ex. type 1, type 2 and type 3).

I have had similar issues with the UN structure in the past, so any
thoughts are highly appreciated.

Simon

[R-meta] Confusion about how to use UN structure

Thread (12 messages)