[R-meta] "Categorical" moderator varying within and between studies - R-SIG-meta-analysis

Sat, Dec 5, 2020 9:23 AM

Hi Simon,

For a binary variable (as you've operationalized gender), the contextual
effect is a comparison of one category versus the reference level. If you
repeated the analysis using males as the reference level, I think you
would find coefficients that are identical except for having the opposite sign.
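
A minimal sketch of this sign flip, assuming simulated data and base R's
lm() in place of the hsb data and lmer() used elsewhere in this thread (all
variable names below are illustrative, not from the original analysis).
Because the female dummy is just 1 minus the male dummy, the same algebra
should carry over to the mixed model.

```r
# Simulated two-level data (hypothetical; not the hsb data from the thread)
set.seed(1)
n_clusters <- 30
n_per      <- 20
cluster <- rep(seq_len(n_clusters), each = n_per)
p_male  <- rep(runif(n_clusters, 0.2, 0.8), each = n_per)
male    <- as.numeric(rbinom(length(cluster), 1, p_male))
female  <- 1 - male
y       <- 10 + 1.5 * male + 3 * p_male + rnorm(length(cluster))

# Cluster means (between) and cluster-mean-centered dummies (within)
male_btw    <- ave(male, cluster)
male_wthn   <- male - male_btw
female_btw  <- ave(female, cluster)
female_wthn <- female - female_btw

fit_M <- lm(y ~ male_wthn + male_btw)       # females as the reference level
fit_F <- lm(y ~ female_wthn + female_btw)   # males as the reference level

# Contextual effect = between coefficient minus within coefficient
ctx_M <- unname(coef(fit_M)["male_btw"]   - coef(fit_M)["male_wthn"])
ctx_F <- unname(coef(fit_F)["female_btw"] - coef(fit_F)["female_wthn"])

# Same magnitude, opposite sign
all.equal(ctx_M, -ctx_F)
```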

James

On Wed, Dec 2, 2020 at 8:35 PM Simon Harmel <sim.harmel at gmail.com> wrote:

Hi James,

I keep coming back to our informative discussion in this thread, so here is a
quick follow-up. Last time, we did the following to obtain the contextual
effect for males. Given this model (i.e., mg_b_w), can we obtain the
contextual effect for females, or do we need to fit a new model, this time with
males as the reference group?

Thank you, Simon

library(dplyr)
library(fastDummies)
library(lme4)

hsb <- read.csv("https://raw.githubusercontent.com/rnorouzian/e/master/hsb.csv")

hsb2 <- hsb %>%
  mutate(gender = ifelse(female == 0, "M", "F")) %>%   # create 'gender' from variable 'female'
  dummy_columns(select_columns = "gender") %>%         # create dummies for 'gender' (creates 2 but we need 1)
  group_by(sch.id) %>%                                 # group by cluster id 'sch.id'
  mutate(across(starts_with("gender_"),
                list(wthn = ~ . - mean(.), btw = ~ mean(.))))

mg_b_w <- lmer(math ~ gender_M_wthn + gender_M_btw + (1 | sch.id), data = hsb2)

# gives 1.92 as the contextual effect for males
fixef(mg_b_w)[["gender_M_btw"]] - fixef(mg_b_w)[["gender_M_wthn"]]

On Thu, Oct 29, 2020 at 2:09 PM James Pustejovsky <jepusto at gmail.com>
wrote:

My apologies! I had this backwards in my head. Revised explanation below:

With gender, if you include the group-mean-centered dummy variables and
the cluster-level means, then the contextual effect will be as you
described (gender_M_btw - gender_M_wthn). However, another approach would
be to leave the dummy variables uncentered. If you do it this way, then the
coefficient on gender_M_btw corresponds exactly to the contextual effect,
with no need to subtract out the within-cluster coefficient (here, the
coefficient on the uncentered gender_M).

R code verifying the equivalence of these approaches:

library(dplyr)
library(fastDummies)
library(lme4)

hsb <- read.csv("https://raw.githubusercontent.com/rnorouzian/e/master/hsb.csv")

hsb2 <- hsb %>%
  mutate(gender = ifelse(female == 0, "M", "F")) %>%   # create 'gender' from variable 'female'
  dummy_columns(select_columns = "gender") %>%         # create dummies for 'gender' (creates 2 but we need 1)
  group_by(sch.id) %>%                                 # group by cluster id 'sch.id'
  mutate(across(starts_with("gender_"),
                list(wthn = ~ . - mean(.), btw = ~ mean(.))))

# group-mean-centered dummy plus cluster mean
mg_b_w <- lmer(math ~ gender_M_wthn + gender_M_btw + (1 | sch.id), data = hsb2)

# uncentered dummy plus cluster mean
mg_b_d <- lmer(math ~ gender_M + gender_M_btw + (1 | sch.id), data = hsb2)

fixef(mg_b_w)[["gender_M_btw"]] - fixef(mg_b_w)[["gender_M_wthn"]]
fixef(mg_b_d)[["gender_M_btw"]]
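
The algebra behind this equivalence can be sketched as follows. The
notation is an assumption introduced here for illustration: x_ij is the
uncentered dummy (gender_M), \bar{x}_j is its cluster mean (gender_M_btw),
\beta_w and \beta_b are the coefficients from the centered model, and u_j
is the cluster random intercept.

```latex
\begin{align*}
y_{ij} &= \beta_0 + \beta_w\,(x_{ij} - \bar{x}_j) + \beta_b\,\bar{x}_j + u_j + e_{ij} \\
       &= \beta_0 + \beta_w\,x_{ij} + (\beta_b - \beta_w)\,\bar{x}_j + u_j + e_{ij}
\end{align*}
```

So in the uncentered parameterization, the coefficient on x_ij is still the
within-cluster effect \beta_w, while the coefficient on \bar{x}_j is
\beta_b - \beta_w, which is the contextual effect.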

On Thu, Oct 29, 2020 at 1:57 PM Simon Harmel <sim.harmel at gmail.com>
wrote:

Thank you, James. For uniformity, I always (i.e., for both categorical &
numeric predictors) use the following method (using a dataset I found on
Stack Overflow).

So, in the case below, you're saying  gender_M_btw is the contextual
effect itself?

Simon

library(dplyr)
library(fastDummies)
library(lme4)

hsb <- read.csv("https://raw.githubusercontent.com/rnorouzian/e/master/hsb.csv")

hsb2 <- hsb %>%
  mutate(gender = ifelse(female == 0, "M", "F")) %>%   # create 'gender' from variable 'female'
  dummy_columns(select_columns = "gender") %>%         # create dummies for 'gender' (creates 2 but we need 1)
  group_by(sch.id) %>%                                 # group by cluster id 'sch.id'
  mutate(across(starts_with("gender_"),
                list(wthn = ~ . - mean(.), btw = ~ mean(.))))

mg_b_w <- lmer(math ~ gender_M_wthn + gender_M_btw + (1 | sch.id), data = hsb2)

On Thu, Oct 29, 2020 at 1:31 PM James Pustejovsky <jepusto at gmail.com>
wrote:

Hi Simon,

There are different ways to parameterize contextual effects. With
gender, if you include the regular dummy variables (without
group-mean-centering) plus the cluster-level means, then the contextual
effect will be as you described (gender_M_btw - gender_M_wthn). However,
another approach would be to first group-mean-center the dummy variables.
In this approach, for a male student, gender_M_wthn would be equal to 1
minus the proportion of male students in the cluster, and for a female
student, gender_M_wthn would be equal to the negative of the proportion of
male students in the cluster. If you do it this way, then the coefficient
on gender_M_btw corresponds exactly to the contextual effect, with no need
to subtract out the coefficient on gender_M_wthn.

All that said, if you have more than two categories you will have more
than one contextual effect. In your example, you have a contextual effect
for M, which would be the average difference in the DV between two units
who are both male, but belong to clusters that differ by 1 percentage point
in the composition of males *and have the same proportion of
other-gender students* (i.e., clusters that have a 1 percentage point
difference in males, and a -1 percentage point difference in females). And
then you have a contextual effect for other, corresponding to the average
difference in the DV between two units who are both other-gender, but
belong to clusters that differ by 1 percentage point in the composition of
other *and have the same proportion of male-gender students* (i.e.,
clusters that have a 1 percentage point difference in other, and a -1
percentage point difference in females).
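
A hedged sketch of the three-category case, using simulated data and base
R's lm() instead of lmer() (the data, the category "OTHER", and all variable
names are assumptions for illustration). It yields one contextual effect per
non-reference category rather than a single summed quantity, and each one
matches the between coefficient from the uncentered parameterization.

```r
set.seed(2)
n_clusters <- 40
n_per      <- 25
cluster <- rep(seq_len(n_clusters), each = n_per)

# Hypothetical three-level 'gender' with cluster-specific composition
gender <- unlist(lapply(seq_len(n_clusters), function(j) {
  p <- runif(3, 0.2, 1)
  p <- p / sum(p)                       # normalize to probabilities
  sample(c("F", "M", "OTHER"), n_per, replace = TRUE, prob = p)
}))

# Dummies for the two non-reference categories (F is the reference)
M     <- as.numeric(gender == "M")
OTHER <- as.numeric(gender == "OTHER")

# Between (cluster mean) and within (cluster-mean-centered) versions
M_btw      <- ave(M, cluster)
OTHER_btw  <- ave(OTHER, cluster)
M_wthn     <- M - M_btw
OTHER_wthn <- OTHER - OTHER_btw

# Simulated outcome; by construction the contextual effects are 4 and -3
y <- 50 + 2 * M + 1 * OTHER + 4 * M_btw - 3 * OTHER_btw + rnorm(length(cluster))

# Centered parameterization: one contextual effect per category
fit_c <- lm(y ~ M_wthn + OTHER_wthn + M_btw + OTHER_btw)
b     <- coef(fit_c)
ctx   <- c(M     = unname(b["M_btw"]     - b["M_wthn"]),
           OTHER = unname(b["OTHER_btw"] - b["OTHER_wthn"]))

# Uncentered parameterization: the between coefficients are the contextual effects
fit_u <- lm(y ~ M + OTHER + M_btw + OTHER_btw)
all.equal(ctx, coef(fit_u)[c("M_btw", "OTHER_btw")], check.attributes = FALSE)
ctx
```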

James

On Thu, Oct 29, 2020 at 12:24 PM Simon Harmel <sim.harmel at gmail.com>
wrote:

Dear James,

This makes perfect sense, many thanks. However, one thing remains. I
know the contextual effect coefficient is "b_btw - b_wthn". If we have two
categories (as in the case of "gender") and take females as the
reference category, then the contextual effect coefficient will be:

gender_M_btw  - gender_M_wthn

But if we have more than two categories (say we add a third "gender"
category called OTHER), then will the contextual effect coefficient be (sum
of the betweens) - (sum of the withins)?

  (gender_M_btw + gender_OTHER_btw) - (gender_M_wthn + gender_OTHER_wthn)



On Thu, Oct 29, 2020 at 9:44 AM James Pustejovsky <jepusto at gmail.com>
wrote:

Hi Simon,

With a binary or categorical predictor, one could operationalize the
contextual effect in terms of proportions (0-1 scale) or percentages (0-100
scale). If proportions, like say proportion of vegetarians, then the
contextual effect would be the average difference in the DV between two
units who are both vegetarian (i.e., have the same value of the predictor),
but belong to clusters that are all vegetarian versus all omnivorous (i.e.,
that differ by one unit in the proportion for that predictor). That will
make the contextual effects look quite large because it's an extreme
comparison--absurdly so, in this case, since there can't be a vegetarian in
a cluster of all omnivores.

If you operationalize the contextual effect in terms of percentages
(e.g., % vegetarians) then you get the average difference in the DV
between two units who are both vegetarian, but belong to clusters that
differ by 1 percentage point in the proportion of vegetarians.
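
As a rough illustration of the two scalings, here is a sketch with
simulated data and base R's lm() (the vegetarian data and variable names are
made up for this example). Moving the cluster-level predictor from the 0-1
scale to the 0-100 scale divides the contextual-effect coefficient by 100
and changes nothing else.

```r
set.seed(3)
cluster <- rep(1:30, each = 20)
p_veg   <- rep(runif(30, 0.1, 0.9), each = 20)
veg     <- as.numeric(rbinom(length(cluster), 1, p_veg))

veg_btw_prop <- ave(veg, cluster)     # cluster composition on the 0-1 scale
veg_btw_pct  <- 100 * veg_btw_prop    # same composition on the 0-100 scale

y <- 5 + 1 * veg + 2 * veg_btw_prop + rnorm(length(cluster))

# Uncentered dummy, so the between coefficient is the contextual effect
fit_prop <- lm(y ~ veg + veg_btw_prop)
fit_pct  <- lm(y ~ veg + veg_btw_pct)

ctx_prop <- unname(coef(fit_prop)["veg_btw_prop"])  # all-vegetarian vs. all-omnivore cluster
ctx_pct  <- unname(coef(fit_pct)["veg_btw_pct"])    # per 1 percentage point

all.equal(ctx_prop, 100 * ctx_pct)
```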

All of this works for multi-category predictors also. Say that you
had vegetarians, pescatarians, and omnivores, with omnivores as the
reference category, then the model would include group-mean-centered dummy
variables for vegetarians and pescatarians, plus group-mean predictors
representing the proportion/percentage of vegetarians and
proportion/percentage of pescatarians. You have to omit one category at
each level to avoid collinearity with the intercept.
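
To see why a category has to be dropped, here is a small sketch on
simulated data with base R's lm() (the diet data and object names are
assumptions for illustration). The between-cluster proportions sum to 1 in
every row, which duplicates the intercept, and the within-cluster centered
dummies sum to 0, so they are linearly dependent among themselves.

```r
set.seed(4)
cluster <- rep(1:30, each = 20)
diet    <- sample(c("omni", "pesc", "veg"), length(cluster), replace = TRUE)

# One dummy column per category, then between and within versions
d    <- sapply(c("omni", "pesc", "veg"), function(k) as.numeric(diet == k))
btw  <- apply(d, 2, function(col) ave(col, cluster))
wthn <- d - btw
colnames(btw)  <- paste0(colnames(d), "_btw")
colnames(wthn) <- paste0(colnames(d), "_wthn")

range(rowSums(btw))    # proportions sum to 1 in every row
range(rowSums(wthn))   # centered dummies sum to 0 (up to rounding)

y <- rnorm(length(cluster))

# Keeping all three categories at both levels: lm() cannot estimate
# the redundant columns and reports NA for them
fit_all <- lm(y ~ wthn + btw)
coef(fit_all)

# Omitting the reference category (omnivores) at each level fixes this
fit_ok <- lm(y ~ wthn[, -1] + btw[, -1])
coef(fit_ok)
```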

James

On Thu, Oct 29, 2020 at 1:32 AM Simon Harmel <sim.harmel at gmail.com>
wrote:

Dear James,

I'm returning to this after a while with a quick question. In your
gender example, you used the term "%female" in your interpretation of the
contextual effect. If the categorical predictor had more than 2 categories,
then would you still use the term % in your interpretation?

My understanding of contextual effect is below:

Contextual effect is the average difference in the DV between two
units (e.g., subjects) which have the same value on an IV (e.g., same
gender), but belong to clusters (e.g., schools) whose mean/percentage on
that IV differs by one unit (is the unit a percentage if the IV is categorical?).

Thank you, Simon



On Sun, Jun 7, 2020 at 7:30 AM James Pustejovsky <jepusto at gmail.com>
wrote:

Yes, it's general and also applies outside the context of
meta-analysis. See for example Raudenbush & Bryk (2002) for a good
discussion on centering and contextual effects in hierarchical linear
models.

On Jun 6, 2020, at 11:07 PM, Simon Harmel <sim.harmel at gmail.com>
wrote:

Many thanks, James. A quick follow-up. The strategy that you
described is a general regression modeling strategy, right? I mean, even if
we were fitting a multi-level model, the fixed-effects part of the formula
would have to include the same construction (i.e., *b1 (%
female-within)_ij + b2 (% female-between)_j*)?

Thanks,
Simon

On Thu, Jun 4, 2020 at 9:42 AM James Pustejovsky <jepusto at gmail.com>
wrote:

Hi Simon,

Please keep the listserv cc'd so that others can benefit from
these discussions.

Unfortunately, I don't think there is any single answer to your
question---analytic strategies just depend too much on what your research
questions are and the substantive context that you're working in.

But speaking generally, the advantages of splitting predictors
into within- and between-study versions are two-fold. First is that doing
this provides an understanding of the structure of the data you're working
with, in that it forces one to consider *which* predictors have
within-study variation and *how much* variation there is (e.g.,
perhaps many studies have looked at internalizing symptoms, many studies
have looked at externalizing symptoms, but only a few have looked at both
types of outcomes in the same sample). The second advantage is that
within-study predictors have a distinct interpretation from between-study
predictors, and the within-study version is often theoretically more
interesting/salient. That's because comparisons of effect sizes based on
within-study variation hold constant other aspects of the studies that
could influence effect size (and that could muddy the interpretation of the
moderator).

Here is an example that comes up often in research synthesis
projects. Suppose that you're interested in whether participant sex
moderates the effect of some intervention. Most of the studies in the
sample are of type A, such that only aggregated effect sizes can be
calculated. For these type A studies, we are able to determine a) the
average effect size across the full sample (pooling across sex) and b) the
sex composition of the sample (e.g., % female). For a smaller number of
studies of type B, we are able to obtain dis-aggregated results for
subgroups of male and female participants. For these studies, we are able
to determine a) the average effect size for males and b) the average effect
size for females, plus c) the sex composition of each of the sub-samples
(respectively 0% and 100% female).

Without considering within/between variation in the predictor, a
meta-regression testing for whether sex is a moderator is:

Y_ij = b0 + b1 (% female)_ij + e_ij

The coefficient b1 describes how effect size magnitude varies
across samples that differ by 1% in the percent of females. But the
estimate of this coefficient pools information across studies of type A and
studies of type B, essentially assuming that the contextual effects
(variance explained by sample composition) are the same as the
individual-level moderator effects (how the intervention effect varies
between males and females).

Now, if we use the within/between decomposition, the
meta-regression becomes:

Y_ij = b0 + b1 (% female-within)_ij + b2 (% female-between)_j + e_ij

In this model, b1 will be estimated *using only the studies of
type B*, as an average of the moderator effects for the studies
that provide dis-aggregated data. And b2 will be estimated using studies of
type A and the study-level average % female in studies of type B. Thus b2
can be interpreted as a pure contextual effect (variance explained by
sample composition). Why does this matter? It's because contextual effects
usually have a much murkier interpretation than individual-level moderator
effects. Maybe this particular intervention has been tested for several
different professions (e.g., education, nursing, dentistry, construction),
and professions that tend to have higher proportions of females are also
those that tend to be lower-status. If there is a positive contextual
effect for % female, then it might be that a) the intervention really is
more effective for females than for males or b) the intervention is equally
effective for males and females but tends to work better when used with
lower-status professions. Looking at between/within study variance in the
predictor lets us disentangle those possibilities, at least partially.
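
A small sketch of the decomposition for this type A / type B example,
using a made-up data frame (the study counts, the % female values, and the
column names are all assumptions for illustration, and the study-level mean
for type B is taken as the simple average of the two subgroups). It shows
that the within-study predictor is exactly zero for every type A study, so
only type B studies can inform b1.

```r
set.seed(5)

# Type A: one aggregated effect size per study, mixed-sex samples
type_A <- data.frame(study      = 1:12,
                     pct_female = round(runif(12, 20, 80)))

# Type B: two effect sizes per study, one for males (0%) and one for females (100%)
type_B <- data.frame(study      = rep(13:16, each = 2),
                     pct_female = rep(c(0, 100), times = 4))

dat <- rbind(type_A, type_B)

# Between-study (study mean) and within-study (study-mean-centered) versions
dat$pct_female_btw  <- ave(dat$pct_female, dat$study)
dat$pct_female_wthn <- dat$pct_female - dat$pct_female_btw

# Type A studies contribute no within-study variation
all(dat$pct_female_wthn[dat$study <= 12] == 0)

# Type B studies: within values are -50 and 50, between value is 50
unique(dat$pct_female_wthn[dat$study > 12])
unique(dat$pct_female_btw[dat$study > 12])
```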

James

On Wed, Jun 3, 2020 at 9:27 AM Simon Harmel <sim.harmel at gmail.com>
wrote:

Indeed that was the problem, Gerta. Thanks.

But James, in meta-analysis having multiple categorical variables
each with several levels is very pervasive and they often vary both
within and between studies.

So, if for each level of each of such categorical variables we
need to do this, this would certainly become a daunting task in addition to
making the model extremely big.

My follow-up question is what is your strategy after you create
within and between dummies for each of such categorical variables? What are
the next steps?

Thank you very much, Simon

p.s. After your `robu()` call I get: `Warning message: In
sqrt(eigenval) : NaNs produced`

On Wed, Jun 3, 2020 at 8:45 AM Gerta Ruecker <ruecker at imbi.uni-freiburg.de> wrote:

Simon

Maybe there should not be a line break between "Relative and Rating"?

For characters, for example if they are used as legends, line breaks
sometimes matter.

Best,

Gerta

On 03.06.2020 at 15:32, James Pustejovsky wrote:

I'm not sure what produced that error and I cannot reproduce it. It may
have something to do with the version of dplyr. Here's an alternative way
to recode the Scoring variable, which might be less prone to versioning
differences:

library(dplyr)
library(fastDummies)
library(robumeta)

data("oswald2013")

oswald_centered <-
  oswald2013 %>%

  # make dummy variables
  mutate(
    Scoring = factor(Scoring,
                     levels = c("Absolute", "Difference Score", "Relative Rating"),
                     labels = c("Absolute", "Difference", "Relative"))
  ) %>%
  dummy_columns(select_columns = "Scoring") %>%

  # centering by study
  group_by(Study) %>%
  mutate_at(vars(starts_with("Scoring_")),
            list(wthn = ~ . - mean(.), btw = ~ mean(.))) %>%

  # calculate Fisher Z and variance
  mutate(
    Z = atanh(R),
    V = 1 / (N - 3)
  )


# Use the predictors in a meta-regression model
# with Scoring = Absolute as the omitted category

robu(Z ~ Scoring_Difference_wthn + Scoring_Relative_wthn +
       Scoring_Difference_btw + Scoring_Relative_btw,
     data = oswald_centered, studynum = Study, var.eff.size = V)

On Tue, Jun 2, 2020 at 10:20 PM Simon Harmel <sim.harmel at gmail.com> wrote:

Many thanks, James! I keep getting the following error when I run your
code:

Error: unexpected symbol in:
"Rating" = "Relative")
oswald_centered"

On Tue, Jun 2, 2020 at 10:00 PM James Pustejovsky <jepusto at gmail.com> wrote:

Hi Simon,

The same strategy can be followed by using dummy variables for each
unique level of a categorical moderator. The idea would be to 1) create
dummy variables for each category, 2) calculate the study-level means of
the dummy variables (between-cluster predictors), and 3) calculate the
group-mean-centered dummy variables (within-cluster predictors). Just as
when you're working with regular categorical predictors, you'll have to pick
one reference level to omit when using these sets of predictors.

Here is an example of how to carry out such calculations in R, using the
fastDummies package along with a bit of dplyr:

library(dplyr)
library(fastDummies)
library(robumeta)

data("oswald2013")

oswald_centered <-
  oswald2013 %>%

  # make dummy variables
  mutate(
    Scoring = recode(Scoring,
                     "Difference Score" = "Difference",
                     "Relative Rating" = "Relative")
  ) %>%
  dummy_columns(select_columns = "Scoring") %>%

  # centering by study
  group_by(Study) %>%
  mutate_at(vars(starts_with("Scoring_")),
            list(wthn = ~ . - mean(.), btw = ~ mean(.))) %>%

  # calculate Fisher Z and variance
  mutate(
    Z = atanh(R),
    V = 1 / (N - 3)
  )


# Use the predictors in a meta-regression model
# with Scoring = Absolute as the omitted category

robu(Z ~ Scoring_Difference_wthn + Scoring_Relative_wthn +
       Scoring_Difference_btw + Scoring_Relative_btw,
     data = oswald_centered, studynum = Study, var.eff.size = V)


Kind Regards,
James

On Tue, Jun 2, 2020 at 6:49 PM Simon Harmel <sim.harmel at gmail.com> wrote:

Hi All,

Page 13 of *THIS ARTICLE*
<https://cran.r-project.org/web/packages/robumeta/vignettes/robumetaVignette.pdf>
(*top of the page*) recommends that if a *continuous moderator* varies
both within and across studies in a meta-analysis, a strategy is to break
that moderator down into two moderators by:

*(a)* taking the mean of each study (between-cluster effect),

*(b)* centering the predictor within each study (within-cluster effect).

BUT what if my original moderator that varies both within and across
studies is a *"categorical"* moderator?

I would appreciate an R demonstration of the recommended strategy.
Thanks,
Simon

_______________________________________________
R-sig-meta-analysis mailing list
R-sig-meta-analysis at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis
