
[R-meta] IPD meta-analysis / complex survey design

5 messages · GOSLING Corentin, Wolfgang Viechtbauer

#
Dear all

I come back to you about the IPD meta-analysis we are conducting to explore
the effect of month of birth on the persistence of ADHD. I had already
asked for your help a few months ago when I was writing the protocol. We
have since completed our systematic review and started to include data from
different cohorts. As the month of birth is sensitive data, we do not ask
the authors to send us the raw data: we have constructed an R script that
we send to the authors, which runs the analyses automatically and returns
the anonymised results. We then carry out a classic two-stage meta-analysis
based on these summary results.
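For context, the second (pooling) stage of such a two-stage meta-analysis can be sketched in base R. The numbers below are purely illustrative, not from our studies; in practice a dedicated package such as metafor (via rma()) would be used to fit a random-effects model.

```r
# Hypothetical per-study summary results returned by the shared script:
# log odds ratio (b) and its standard error (se) for the month-of-birth effect.
b  <- c(0.039, 0.052, 0.021)
se <- c(0.017, 0.024, 0.031)

# Fixed-effect inverse-variance pooling (the second stage of a two-stage
# meta-analysis); metafor::rma(yi = b, sei = se) would give the
# random-effects analogue.
w       <- 1 / se^2
b_pool  <- sum(w * b) / sum(w)   # pooled log odds ratio
se_pool <- sqrt(1 / sum(w))      # its standard error
z       <- b_pool / se_pool
p       <- 2 * pnorm(-abs(z))    # two-sided p-value
```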

We are facing a new challenge that we did not anticipate. Several studies
involve complex survey designs. Some studies have clusters (e.g., twin
cohorts or assessments of several regular siblings per family), while
others have even more complex sampling (including, for example, sampling
weights, strata, or a finite population correction (fpc)). Some studies
include both (clusters plus strata/weights/fpc).

To analyse the data with clustering, we naturally thought of using mixed
models via the glmer function of lme4 (our DV is binary: ADHD persistence
yes/no). However, lme4 does not currently handle sampling weights or
stratification. Therefore, for all data with clustering and/or weights
and/or strata and/or fpc, our idea was to use only the svyglm function of
the survey package, so as to have a coherent set of analyses (we know that
glmer and svyglm do not estimate the same kind of coefficients: conditional
vs. marginal).
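As a side note on the marginal vs. conditional distinction, the two kinds of log odds ratios can be related approximately via the well-known attenuation formula for a logistic random-intercept model (Zeger, Liang & Albert, 1988); a small base-R sketch with illustrative numbers:

```r
# Approximate attenuation of a conditional (subject-specific) log odds ratio
# into a marginal (population-averaged) one, given the random-intercept
# variance sigma2:  b_marg ~= b_cond / sqrt(c^2 * sigma2 + 1),
# with c = 16 * sqrt(3) / (15 * pi).
attenuate <- function(b_cond, sigma2) {
  c2 <- (16 * sqrt(3) / (15 * pi))^2
  b_cond / sqrt(c2 * sigma2 + 1)
}

# With a small random-intercept variance the two coefficients nearly
# coincide, which is consistent with how close the glmer and svyglm
# estimates are in the tests reported below.
attenuate(0.0403, sigma2 = 0.1)
```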

Our question is the following: within the same meta-analysis, can we
combine coefficients that come from standard logistic regressions with
coefficients that come from generalised mixed models fitted using glmer, or
with coefficients from generalised linear models adapted to complex designs
fitted using svyglm?

To support our question, we performed some tests on a dataset including
clusters and sampling weights. Here are the results:

 ######################################################################

On the raw dataset (df_raw is a dataset containing clustering)

# regular logistic regression on the raw data (we ignore clustering and sampling weights):

summary(glm(DV ~ IV, family = "binomial", data = df_raw))

            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.07497    0.12907  -8.328   <2e-16 ***
IV (month)   0.03916    0.01732   2.261   0.0238 *


# generalized mixed model via lme4 (we account for clustering (ID variable); we ignore sampling weights):

summary(lme4::glmer(DV ~ IV + (1 | ID), family = "binomial", data = df_raw))

            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.10949    0.14571  -7.614 2.65e-14 ***
IV (month)   0.04034    0.01793   2.250   0.0245 *



# generalized linear model via the survey package (we account for clustering (ID variable); we ignore sampling weights):

dclus1 <- survey::svydesign(id = ~ID, data = df_raw)
summary(survey::svyglm(DV ~ IV, design = dclus1, family = quasibinomial()))

            Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.07497    0.12927  -8.316 2.31e-16 ***
IV (month)   0.03916    0.01729   2.265   0.0237 *



# generalized linear model via the survey package (we account for clustering (ID variable) and for sampling weights (WEIGHT variable)):

dclus2 <- survey::svydesign(id = ~ID, weights = ~WEIGHT, data = df_raw)
summary(survey::svyglm(DV ~ IV, design = dclus2, family = quasibinomial()))

            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.98952    0.15475  -6.394 2.25e-10 ***
IV (month)   0.02195    0.02069   1.061    0.289


######################################################################

On an aggregated dataset (df_agg is the same dataset as df_raw but without
any clustering: we randomly selected one child per cluster, so
length(unique(df_agg$ID)) equals nrow(df_agg)).


# regular logistic regression on the aggregated data (we ignore sampling weights):

summary(glm(DV ~ IV, family = "binomial", data = df_agg))

            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.07309    0.13328  -8.051  8.2e-16 ***
IV (month)   0.04327    0.01782   2.428   0.0152 *


# generalized mixed model via lme4 (we ignore sampling weights):

summary(lme4::glmer(DV ~ IV + (1 | ID), family = "binomial", data = df_agg))

            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.07309    0.13328  -8.051  8.2e-16 ***
IV (month)   0.04327    0.01782   2.428   0.0152 *


# generalized linear model adapted to complex designs via survey (we ignore sampling weights):

dclus4 <- survey::svydesign(id = ~ID, data = df_agg)
summary(survey::svyglm(DV ~ IV, design = dclus4, family = quasibinomial()))

            Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.07309    0.13351  -8.037 2.12e-15 ***
IV (month)   0.04327    0.01785   2.424   0.0155 *


# generalized linear model adapted to complex designs via survey (we account for sampling weights):

dclus5 <- survey::svydesign(id = ~ID, weights = ~WEIGHT, data = df_agg)
summary(survey::svyglm(DV ~ IV, design = dclus5, family = quasibinomial()))

            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.95961    0.15957  -6.014 2.38e-09 ***
IV (month)   0.02471    0.02133   1.159    0.247


As you can see, the results are almost identical across the models, except
when sampling weights are taken into account. I hope our problem is clearly
explained.

Thank you very much in advance for your help!

Corentin J Gosling
#
Dear Corentin,

I cannot answer your question directly, that is, to what extent those results are comparable to each other. However, if svyglm() gives 'marginal' (population-averaged) coefficients in the sense of what a GEE model would do, then one could argue that those should not be combined with the 'conditional' coefficients that glmer() provides (searching for combinations of terms like "GEE", "marginal", "population averaged", "logistic mixed-effects", "conditional", "subject-specific" should turn up relevant discussions and papers).

But leaving this aside, one could also just approach this issue entirely empirically, that is, simply code the type of analysis / type of coefficient for each study and examine in a moderator analysis whether there are systematic differences between the different types.
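For what it's worth, in metafor this moderator analysis would be along the lines of rma(yi = b, sei = se, mods = ~ type). A base-R fixed-effect sketch of the same idea, as a subgroup comparison with purely illustrative numbers:

```r
# Hypothetical per-study log odds ratios, standard errors, and the coded
# type of analysis ("glm" vs "svyglm"); all values are illustrative.
b    <- c(0.039, 0.052, 0.021, 0.044, 0.018)
se   <- c(0.017, 0.024, 0.031, 0.020, 0.028)
type <- c("glm", "glm", "svyglm", "svyglm", "svyglm")

# Fixed-effect inverse-variance pooling within each subgroup.
pool <- function(b, se) {
  w <- 1 / se^2
  c(est = sum(w * b) / sum(w), se = sqrt(1 / sum(w)))
}
p1 <- pool(b[type == "glm"],    se[type == "glm"])
p2 <- pool(b[type == "svyglm"], se[type == "svyglm"])

# z-test for a systematic difference between the two types of coefficients.
z <- unname((p1["est"] - p2["est"]) / sqrt(p1["se"]^2 + p2["se"]^2))
p <- 2 * pnorm(-abs(z))  # a large p suggests no systematic difference
```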

Best,
Wolfgang
#
Dear Prof Viechtbauer,

Thank you very much for your reply!

Sorry, my question was a bit misleading. In line with your suggestion, our
aim is to avoid merging "marginal" coefficients and "conditional"
coefficients by using only the svyglm function as soon as the data have a
complex structure (clustering and/or weighting, etc.).

You are entirely right that, in situations with clustering only, we could
compare 3 approaches: (i) select only one individual per cluster and use
the glm function, or keep the clustering and use (ii) the glmer function or
(iii) the svyglm function. However, we are a bit reluctant to make these
comparisons, for two reasons. First, as soon as the data have a more
complex structure (e.g., sampling weights), the only approach that can take
this into account is the svyglm function. This makes the comparisons
somewhat strange, as in our examples, since one analysis accounts for some
feature of the design while the others do not. Second, from a practical
point of view, the burden on authors would become even heavier, as the time
required for the analysis is already sometimes quite long (in particular
because of several multiple-imputation models). We are concerned that
multiplying the tests could make the analysis time so long that it
discourages some authors from participating.

Our question was whether, within the same meta-analysis, we could "safely"
include effect sizes estimated by a standard logistic regression (when the
data have a regular structure) together with effect sizes estimated by the
svyglm function (when the data have a complex structure). By "safely", I
mean without having to compare the results of the svyglm function to those
of other functions (such as glm or glmer) when the data have a complex
structure.

If this is not possible, a more anecdotal question is whether it is
possible to "safely" include effect sizes estimated by a standard logistic
regression (when the data have a regular structure) together with effect
sizes estimated by the glmer function (when the data have clustering).

Thank you so much for your help!

Best wishes

Corentin Gosling

On Fri, 5 Mar 2021 at 09:32, Viechtbauer, Wolfgang (SP) <
wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:

  
  
#
Hi Corentin,

I did not mean to suggest that one should run several different analyses on a single dataset. That would indeed place too much of a burden on the authors of the individual studies.

My suggestion is really about this part:
I cannot tell you if it is safe or not. But what you can always do is combine these different types in a single analysis and then check whether there are systematic differences between the two types of effect sizes. If there are no systematic differences, then this is (empirical) evidence that combining them is, in some sense, an acceptable thing to do.

This approach is similar to checking if effect sizes extracted from published articles are systematically different from those extracted from unpublished sources in a meta-analysis. If there are systematic differences, we need to think about what the reason for the difference may be. If not, then this is one less thing to worry about.

Best,
Wolfgang
#
Dear Prof Viechtbauer,

Thank you so much for your very clear answer.

We really like this solution. As soon as we have completed the
meta-analysis, I will keep you updated on the results.

 Again, thank you so much for your insights

Corentin Gosling


On Fri, 5 Mar 2021 at 12:07, Viechtbauer, Wolfgang (SP) <
wolfgang.viechtbauer at maastrichtuniversity.nl> wrote: