[R-meta] Correction for sample overlap in a meta-analysis of prevalence - R-SIG-meta-analysis

Tue, Aug 4, 2020 6:25 AM #

Dear all,

I want to conduct a meta-analysis of around 30 studies (from a systematic
review).

Some background on the studies: the quantity of interest is the prevalence
of RSV infection. Different studies reported RSV prevalence for different
risk groups. People quite often suffer from multiple comorbidities (for
example, an individual might have both cardiac disease and lung disease),
and the reported data do not state clearly whether two such
sub-populations (cardiac disease patients and lung disease patients) are
mutually exclusive. In the end, I want an overall estimate across all risk
groups. Given the above, it is likely that the data from two or more risk
groups share a proportion of the population. For example, John's study
reported data on cardiac disease as well as lung disease, and both risk
groups were included in the meta-analysis. We therefore need to take into
account that the two sub-populations might share some of their
participants.

I searched online for methods that account for overlapping samples when
conducting a meta-analysis. Two papers address this problem:

   1. https://academic.oup.com/bioinformatics/article/33/24/3947/3980249
      The authors proposed FOLD, a method to optimize power in a
      meta-analysis of genetic association studies with overlapping
      subjects.
   2. http://www.stiftung.at/wp-content/uploads/2015/04/BomPaper_Oct_2014.pdf
      In this paper, the author compared generalized-weights and
      inverse-variance-weights meta-estimates to account for sample
      overlap.

My question is:

Are these approaches incorporated into the *metafor* package?
Thanks for your input.
Best,

Thao

*Trần Mai Phương Thảo*
Master Student - Master of Statistics
Hasselt University - Belgium.
Email: Thaobrawn at gmail.com / maiphuongthao.tran at student.uhasselt.be
Phone number: +84 979 397 410 / 0032 488 035843


Wolfgang Viechtbauer

Thu, Aug 6, 2020 5:22 AM #

Dear Thao,

I do not know these papers, so I cannot comment on what methods they describe and whether those could be implemented using metafor.

Obviously, the degree of dependence between overlapping estimates depends on the degree of overlap. Say there are two diseases (as in your example). Then if we had the raw data, we could count the number of individuals that:

x1:  have only disease 1
x2:  have only disease 2
x12: have both disease 1 and 2
x0:  have neither disease

Let n = x1 + x2 + x12 + x0. Then you have p1 = (x1+x12) / n and p2 = (x2+x12) / n as the two prevalences. One could easily work out the covariance (I am too lazy to do that right now), but in the end this won't help, because computing it requires knowing all the x's, not just p1, p2, and n. And I assume no information is reported on the degree of overlap. One could maybe make some reasonable 'guesstimates' of the overlap, compute the covariances from those, and then follow up with a sensitivity analysis.
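
As a rough illustration of the covariance and the 'guesstimate' idea above: if one assumes the four counts (x1, x2, x12, x0) follow a multinomial distribution, the covariance works out to Cov(p1, p2) = (p12 - p1*p2) / n, where p12 = x12 / n is the unreported joint proportion. The sketch below is only that, a sketch under this assumption; the function name cov_prev and all numbers are hypothetical and do not come from any of the studies.

```r
# Covariance of two prevalences estimated from the same n individuals,
# assuming multinomial sampling of (x1, x2, x12, x0).
# p12 = x12 / n is the (usually unreported) proportion with both diseases.
cov_prev <- function(p1, p2, p12, n) {
  # p12 is bounded by the marginal prevalences
  lower <- max(0, p1 + p2 - 1)
  upper <- min(p1, p2)
  stopifnot(p12 >= lower, p12 <= upper)
  (p12 - p1 * p2) / n
}

# Hypothetical values for one study
p1 <- 0.30
p2 <- 0.20
n  <- 200

# Sensitivity analysis over guessed degrees of overlap:
# from no overlap (0) up to the maximum possible, min(p1, p2)
p12_guess <- c(0.00, 0.06, 0.10, 0.20)
sens <- data.frame(
  p12 = p12_guess,
  cov = sapply(p12_guess, function(g) cov_prev(p1, p2, g, n))
)

# Implied correlation between the two estimates
sens$cor <- sens$cov / sqrt((p1 * (1 - p1) / n) * (p2 * (1 - p2) / n))
print(sens)
```

The covariance is zero when p12 = p1*p2 (the two diseases occur independently), negative for less overlap than that, and positive for more. Each guessed covariance could be placed in the off-diagonal of a variance-covariance matrix for the estimates, and the meta-analysis rerun for each guess to see how much the pooled estimate moves.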

Alternatively, you could use the 'sandwich' method (cluster-robust inference). This has been discussed on this mailing list extensively in the past (not in the context of overlap in such estimates, but the principle is all the same).
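
A minimal sketch of how the cluster-robust approach might look in metafor, assuming risk-group estimates are nested within studies and that the overlap only occurs within a study. The data frame, its column names (study, group, xi, ni, esid), and all counts are made up for illustration; the logit transformation (measure = "PLO") and the random-effects structure are one possible choice, not a recommendation from this thread.

```r
library(metafor)

# Hypothetical data: xi RSV cases out of ni patients per risk group,
# with several (possibly overlapping) risk groups reported per study
dat <- data.frame(
  study = c(1, 1, 2, 2, 3, 3, 4, 5, 5, 6),
  group = c("cardiac", "lung", "cardiac", "lung", "cardiac", "lung",
            "cardiac", "cardiac", "lung", "lung"),
  xi    = c(12, 15,  8, 11, 20, 26,  9, 14, 17,  6),
  ni    = c(100, 120, 90, 95, 150, 160, 80, 110, 130, 70)
)

# Logit-transformed proportions and their sampling variances
dat <- escalc(measure = "PLO", xi = xi, ni = ni, data = dat)

# One row per estimate, so estimates can be nested within studies
dat$esid <- seq_len(nrow(dat))

# Working model: random effects for studies and for estimates within
# studies; the sampling errors are (incorrectly) treated as independent
res <- rma.mv(yi, vi, random = ~ 1 | study / esid, data = dat)

# Cluster-robust (sandwich) inference with studies as clusters, so the
# standard errors do not rely on the independence assumption above
rob <- robust(res, cluster = dat$study)
print(rob)

# Back-transform the pooled logit to a prevalence
pred <- predict(rob, transf = transf.ilogit)
print(pred)
```

The sandwich estimator does not need the covariances between overlapping estimates, which is why it is attractive when the degree of overlap is unknown. It does rely on having a reasonable number of clusters, so results from a handful of studies should be read with caution.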

Best,
Wolfgang

Thao Tran

Thu, Aug 6, 2020 5:37 AM #

Hi Wolfgang,
Thanks a lot for your clear response.
I totally agree that the information on the degree of overlapping is not
commonly reported.
I will take a look at the cluster-robust inference you mentioned.

Best,
Thao

Gerta Ruecker

Thu, Aug 6, 2020 8:32 AM #

Dear Thao,

Another paper by Pedro Bom and Heiko Rachinger ("A Generalized-Weights
Solution to Sample Overlap in Meta-Analysis") will soon appear in
Research Synthesis Methods (early view). You may want to have a look at it
once it is published.

Best,

Gerta
