[R-meta] metafor package in R - Risk ratios using rma.mv()

Fri, Oct 4, 2019 3:27 PM

Dear Wolfgang and All,

Thanks so much for your prior help.
We have calculated incidence and prevalence rates from a mixed-effects
model using the rma.mv command. We are attaching the results below.

*Variable*

*Cohorts (n)*

*Incidence Rate/100k*

*Lower 95% CI*

*Upper 95% CI*

*World Health Organization Region*

    Americas

225

324.5

166.7

482.2

    African

30

1906.4

1134.1

2678.7

    Eastern Mediterranean

24

249.1

32.3

466.0

    European

33

767.6

407.8

1127.3

    South-East Asia

48

1148.7

628.6

1668.9

    Western Pacific

30

560.0

-131.7

1251.7



*Cohorts (n)*

*Prevalence Rate*

*Lower 95% CI*

*Upper 95% CI*

*World Health Organization Region*

    Americas

33

1.7

0.9

2.5

    African

31

2.8

1.4

4.1

    Eastern Mediterranean

7

1.9

0.5

3.4

    European

21

1.9

0.8

3.0

    South-East Asian

10

2.4

-1.1

6.0

    Western Pacific

54

1.2

-0.1

2.4


Unfortunately we have some negative confidence intervals for some of our
incidence and prevalence estimates. We would like to not have any negative
confidence intervals and therefore would like to switch the models that we
are using.

Is there a way to keep our code (which we have put below for both incidence
and prevalence) and run a poisson model for incidence and a binary or beta
model for prevalence so that we no longer have a negative confidence
interval for some of our variables? We noticed that when we run the model
using log transformed incident and prevalence rates, the confidence
intervals are positive. We are also wondering what the difference is
between using PR/IR versus PLN/IRLN for fitting the model, and why the
latter would result in all positive confidence intervals.

Thank you again for all your help!

Best
Olivia and Leo


*Code for incidence rates: *
#data subsetted by WHO region

pd_ec <- escalc(measure = 'IR', xi = data_sub$inc_positive,ti =
data_sub$inc_person_years, append = TRUE,
            data = data_sub)

m0 <- rma.mv(yi, vi, method='REML', mods = formula,
                        random= ~ 1 | study_id/cohort_id,
                        tdist=TRUE,
                        data=pd_ec)

*Code for prevalence rates: *
#data subsetted by WHO region

pd_ec <- escalc(
            measure = 'PR', xi = data_sub$prev_positive,ni =
data_sub$prev_total_n, append = TRUE,
            data = data_sub)

 m0 <- rma.mv(yi, vi, method='REML', mods = formula,
                        random= ~ 1 | study_id/cohort_id,
                        tdist=TRUE,
                        data=pd_ec)

Thank you for any help you can provide!

Best
Leo and Olivia



Leonardo Martinez, PhD, MPH
Stanford University School of Medicine
Division of Infectious Diseases and Geographic Medicine
300 Pasteur Drive, Lane Building, Stanford, CA 94305
Phone: +1.202.769.8090
Email: leomarti at stanford.edu; chopotin at gmail.com
Website <https://profiles.stanford.edu/leonardo-martinez-pantoja>


On Wed, Sep 4, 2019 at 10:57 AM Viechtbauer, Wolfgang (SP) <

wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:

Forgot to cc the mailing list, so resending this.

-----Original Message-----
From: Viechtbauer, Wolfgang (SP)
Sent: Wednesday, 04 September, 2019 19:36
To: 'Leo Martinez'; 'Olivia Cords'
Subject: RE: [R-meta] metafor package in R - Risk ratios using rma.mv()

Dear Leo, Dear Olivia,

Late response (to Olivia), but I was out of the office the entire August.

Q2) Yes, one can estimate rate ratios this way.

They are different because the log transformation is non-linear. Also, the
normal approximation of the sampling distributions doesn't work in the same
way on the raw and on the log scale. To illustrate:

Let's say we observe x=5 cases in t=100 person years, so IR = 5/100. For
IR values, the normal approximation is IR ~ N(theta, theta/t), where theta
is the true rate per person year (this follows from assuming that x is
Poisson distributed with rate t*theta), so we estimate the sampling
variance with v = IR/t. Hence, a 95% CI for theta is given by:

x <- 5
t <- 100
IR <- x/t
IR + c(-1,1) * qnorm(.975) * sqrt(IR/t)

which yields

0.006173873 0.093826127

For log(IR), the normal approximation (after using the delta method) is
log(IR) ~ N(log(theta), 1/(t*theta)), so we estimate the sampling variance
with v = 1/x. Hence, a 95% CI for theta is given by:

exp(log(IR) + c(-1,1) * qnorm(.975) * sqrt(1/x))

which yields

0.02081139 0.12012652

As you can see, these results are not the same. And this doesn't yet get
into the additional complexities involved when fitting the model you are
fitting (where we estimate additional variance components, which in turn
also has implications for how the estimates are weighted and combined).

Q3) Just exponentiate the CIs for the model coefficients for the model
fitted with measure = "IRLN". So, exp(3.1798) is the first rate ratio with
(approximate) 95% CI exp(1.9348) and exp(4.4248).

There is also a technical issue here that is relevant whenever we analyze
outcomes on some transformed scale where the transformation is non-linear.
exp(3.1798) is actually not the estimated *average* incidence rate for the
African region. To be precise, the correct interpretation is that
exp(3.1798) is the estimated *median* incidence rate for the African
region. The problem is that f(E(X)) != E(f(X)) whenever f() is non-linear
(Jensen's inequality). However, f(M(X)) = M(f(X)) when M() is the median.

So, if we have the estimated average log incidence rate (which, under the
normality assumptions of the model, is equal to the estimated median log
incidence rate), the back-transformation gives us the estimated median
incidence rate (and not the estimated average incidence rate). So, this is
another reason why results are different when you analyze raw or log
transformed incidence rates.

Essentially everybody ignores this issue when analyzing transformed
outcomes. This also applies to correlations, where there was a lot of
debate in the literature around the question whether we should analyze raw
or r-to-z transformed correlations (Adam Hafdahl eventually pointed out
this issue in this context).

Best,
Wolfgang

-----Original Message-----
From: R-sig-meta-analysis [mailto:
r-sig-meta-analysis-bounces at r-project.org] On Behalf Of Leo Martinez
Sent: Wednesday, 04 September, 2019 18:10
To: r-sig-meta-analysis at r-project.org
Subject: Re: [R-meta] metafor package in R - Risk ratios using rma.mv()

Dear All,

Thanks for your previous help on this thread. I just wanted to follow up on
this topic with a few additional questions regarding  incident rate ratios
and confidence intervals using the metafor package and the rma.mv()
command.

*Incidence Rates*I calculated the incidence rates (measure = "IR", ti =
data$person_years/1000) for tuberculosis on the data subsetted by World
Health Organization Region and got the following results from the model.

m0 <- rma.mv(yi, vi, method='REML', mods = formula,
                        random= ~ 1 | study_id/cohort_id,
                        tdist=TRUE,
                        data=pd_ec)

Americas Region 3.244
African Region 19.06
Eastern Mediterranean Region 2.491
European Region 7.675
South-East Asian Region 11.48
Western Pacific Region 5.600

Rate Ratios:

Following your suggestion above for calculating rate ratios for each WHO
Region, I used the measure = "IRLN" and exponentiated the coefficients of
the model. I got the following model output:

Multivariate Meta-Analysis Model (k = 390; method: REML)

Variance Components:

            estim    sqrt  nlvls  fixed              factor
sigma^2.1  2.3898  1.5459     76     no            study_id
sigma^2.2  0.5833  0.7637    390     no  study_id/cohort_id

Test for Residual Heterogeneity:
QE(df = 384) = 118469.9261, p-val < .0001

Test of Moderators (coefficients 2:6):
F(df1 = 5, df2 = 384) = 7.3370, p-val < .0001

Model Results:

                                          estimate      se      tval
pval
intrcpt                                    -7.1326  0.2449  -29.1259
<.0001
who_region.1African Region                  3.1798  0.6332    5.0217
<.0001
who_region.1Eastern Mediterranean Region    0.9541  0.9634    0.9904
0.3226
who_region.1European Region                 1.8665  0.5612    3.3261
0.0010
who_region.1South-East Asian Region         2.6693  0.9360    2.8518
0.0046
who_region.1Western Pacific Region          1.4697  0.8381    1.7536
0.0803
                                            ci.lb    ci.ub
intrcpt                                   -7.6141  -6.6511  ***
who_region.1African Region                 1.9348   4.4248  ***
who_region.1Eastern Mediterranean Region  -0.9401   2.8483
who_region.1European Region                0.7631   2.9698  ***
who_region.1South-East Asian Region        0.8290   4.5096   **
who_region.1Western Pacific Region        -0.1781   3.1176    .

And exponentiating the coefficients, I got the following rate ratios:

Intrcpt (Americas) 0.000799
African Region 24.04225
Eastern Mediterranean Region 2.596419
European Region 6.465383
South-East Asian Region 14.42971
Western Pacific Region 4.348128

Based on the incidence rates produced by using the entire dataset in the
model and first subsetting by region, these Rate Ratios don't seem to be
correct. Simply dividing the incidence rates by a comparator (Region of the
Americas) to produce rate ratios would give the following:

Region of the Americas
African Region 5.875294245
Eastern Mediterranean Region 0.767788539
European Region 2.365564921
South-East Asian Region 3.540304225
Western Pacific Region 1.725952561

*Q2) Is exponentiating the coefficients (measure = IRLN) the way to
calculate rate ratios? Why are these results so different?*

*Confidence Intervals*To calculate the 95% confidence intervals for the
rate ratios, I first calculated the standard deviation (SD[ln(IR)] = (1/A1
+ 1/A2)^0.5, where A1 and A2 are the number of tuberculosis cases in each
region), and then used the following: 95% CI's = exp[ln(IR) ? 1.96(SD)]).
It seems that this does not take into account the nested structure of the
data.

*Q3) Is there a way to calculate the confidence intervals from the model
(either measure = "IRLN" or measure = "IR") output that takes into account
the nested structure of the data?*
Thank you again for your advice and the creation of this package.

Best
Leo

Leonardo Martinez, PhD, MPH
Stanford University School of Medicine
Division of Infectious Diseases and Geographic Medicine
300 Pasteur Drive, Lane Building, Stanford, CA 94305
Phone: +1.202.769.8090
Email: leomarti at stanford.edu; chopotin at gmail.com
Website <https://profiles.stanford.edu/leonardo-martinez-pantoja>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://stat.ethz.ch/pipermail/r-sig-meta-analysis/attachments/20191004/1c753ea8/attachment-0001.html>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 1003_PR_IR.xlsx
Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Size: 14856 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-meta-analysis/attachments/20191004/1c753ea8/attachment-0001.xlsx>

[R-meta] metafor package in R - Risk ratios using rma.mv()

Thread (2 messages)