
[R-meta] Questions about the use of metaprop for the pooling of proportions

12 messages · Michael Dewey, Gerta Ruecker, Thiago Roza +1 more

#
Dear all,

I am conducting a meta-analysis about characteristics of suicide
deaths in post-mortem studies. My aim is to describe pooled
proportions of key characteristics (biological sex, suicide site,
race, marital status, suicide method, the proportion of substance use
near death, proportion of psychiatric diagnosis prior to death, etc)
across the included studies. Initially, I thought that "metaprop" from
the package "meta" would be enough to pool all these proportions
across included studies. Nevertheless, some of these variables have
more than one category (i.e. suicide method has more than 10
categories: such as hanging, firearm, poisoning, etc), and the pooling
of the proportion of each suicide method separately produces results
which when summed up give more than 100% for the summed proportion of
all suicide methods. Therefore, my first question is: is it possible
to pool all those proportions using "metaprop"? If yes, could anyone
give an example about the coding for the pooling of proportions in the
case of suicide methods? If not, is there any other package that would
allow me to pool the aggregate proportion of suicide methods?

Thank you,

Thiago Roza
1 day later
#
Dear Thiago

What you have is compositional data which might prove a useful search 
term. A common way to analyse such data is by taking the ratios of the 
components to a reference one and then taking logs. However that is 
about the sum total of my knowledge of compositional data analysis and 
as far as I know there is no extant R package which deals with it. 
Others on the list may have better ideas.
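
The log-ratio idea described above can be sketched in base R. This is only a minimal illustration, assuming a made-up matrix of counts (rows are studies, columns are suicide methods) and taking the last column as the reference category; the object names counts, props and alr are hypothetical.

```r
# Hypothetical counts: 3 studies (rows) by 3 methods (columns).
counts <- matrix(c(80, 10, 10,
                   60, 25, 15,
                   70, 20, 10),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("study", 1:3),
                                 c("hanging", "firearm", "other")))

# Within-study proportions; each row sums to 1.
props <- prop.table(counts, margin = 1)

# Additive log-ratio: log of each component relative to the reference ("other").
alr <- log(props[, c("hanging", "firearm")] / props[, "other"])
alr
```

A category with a zero count would give an infinite log-ratio, so real data may need some handling of zeros before this step.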

For future reference, if you also post on CrossValidated it is best to 
put a link in each post so people can check whether the question has 
already been answered in the other place.

Michael
On 06/03/2022 16:36, Thiago Roza wrote:
#
Dear Michael,

Thank you for your reply!

Do you think it would be possible to generate pooled proportions for
at least the most commonly reported suicide method in this case? (I
would organize my dataset in the following format: "suicide by
hanging" vs "other method of suicide", only two categories).

Thank you,

Thiago

On Mon, 7 Mar 2022 at 13:40, Michael Dewey
<lists at dewey.myzen.co.uk> wrote:
#
Dear Thiago, dear Michael,

I read this thread and I still am not clear about the nature of the 
data. Are these really compositional data, or simple proportions? The 
difference is:

  * Compositional data are characterized by lacking a denominator (no
    "n", no sample size). For each study, you have only percentages that
    add to 100%. Such data occur in microbiome research (percentages of
    species in the microbiome).
  * By contrast, proportions are given as r (number of events) and n
    (sample size, i.e., number of persons/patients/trials/whatever), or
    as percentages and n.

If you have proportions, you may use metaprop. If you have compositional 
data, as Michael supposed, you cannot.
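
When a study reports only a percentage together with n, the event count can be recovered before pooling. A minimal sketch in base R, assuming made-up values; n, pct and r are hypothetical names:

```r
# Hypothetical study-level data: sample size and reported percentage for one method.
n   <- c(100, 250, 80)
pct <- c(80, 62.4, 75)

# Recover the event counts; rounding is needed because reported
# percentages are themselves usually rounded.
r <- round(pct / 100 * n)
r
```

The resulting r and n are what would be passed to metaprop() as the event and n arguments.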

Best,

Gerta

On 08.03.2022 at 12:34, Thiago Roza wrote:
#
Dear Gerta,

Thank you for your reply!
In my systematic review, I have several cross-sectional original
studies. In each one of these original studies I have a sample size (n
for the total number of suicide cases included in the study), and this
number is also classified according to the suicide method (for
instance, if n is 100 for the total number of cases, 80% or 80 cases
died due to hanging, 10 or 10% died due to firearms, 5 or 5% died due
to drug overdose, 3 or 3% died due to pesticides, and so on). The same
example applies to other variables such as biological sex, race,
suicide site, etc.
The idea of my analysis is to pool the proportions of several key
characteristics, including suicide methods, across all included
studies, so I can report the proportions with 95%CI in the paper.
I tried using "metaprop" to pool the proportions of suicide methods.
However, when I summed up the pooled proportions, the sum was more
than 100% with the "Inverse" method and less than 100% with the
"GLMM" method.

That is why I was wondering if it was possible to pool those
proportions using "metaprop". If yes, is it OK for the summed pooled
proportions to be different from 100%?

Thank you,

Thiago

On Tue, 8 Mar 2022 at 09:27, Dr. Gerta Rücker
<ruecker at imbi.uni-freiburg.de> wrote:
#
Dear Thiago,

So you have proportions of several mutually exclusive outcomes. Of 
course, these are dependent because the sum is always the total number 
of cases in the study (corresponding to 100% in that study). 
Nevertheless, I don't see any reason not to pool each outcome 
separately using metaprop(). In fact, depending on the transformation, 
the resulting average proportions will not generally sum up to 100%, 
particularly not when using no transformation at all. This raises the 
question of which transformation to choose. The default in metaprop() is 
a random intercept logistic regression model with the logit transformation.
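
Why separately pooled proportions need not sum to 100% can be illustrated in base R. The sketch below is a hand-rolled inverse-variance common-effect estimate on the logit scale, not a call into the meta package; the simulated data and the helper pool_logit_iv() are assumptions made for illustration only.

```r
set.seed(1)
# Simulated counts for 3 mutually exclusive outcomes in 10 studies of 100 cases each.
counts <- t(rmultinom(10, size = 100, prob = c(0.2, 0.3, 0.5)))
n <- rowSums(counts)

# Inverse-variance pooling of logit proportions, one outcome at a time.
pool_logit_iv <- function(r, n) {
  yi <- qlogis(r / n)            # logit-transformed proportions
  vi <- 1 / r + 1 / (n - r)      # approximate sampling variances
  plogis(sum(yi / vi) / sum(1 / vi))
}

pooled_iv <- apply(counts, 2, pool_logit_iv, n = n)
pooled_iv
sum(pooled_iv)                   # need not equal 1
```

Each outcome gets its own set of weights, so nothing in the calculation forces the back-transformed estimates to add up to 1.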

I made an observation that I have to think about, and you may try this. 
If I use the default, the sum of the pooled percentages over all 
outcomes is indeed always 1 for the fixed effect estimate. I used code 
like this (here for 3 outcomes):

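#### Random data ####
library(meta)  # provides metaprop()
# Simulated event counts for 3 mutually exclusive outcomes in 10 studies
out1 <- rbinom(10, 100, 0.1)
out2 <- rbinom(10, 100, 0.5)
out3 <- rbinom(10, 100, 0.9)
n <- out1 + out2 + out3  # study sample size = sum over the outcomes
# Pool each outcome separately (default: GLMM with logit transformation)
m1 <- metaprop(out1, n)
m2 <- metaprop(out2, n)
m3 <- metaprop(out3, n)
# Sum of the back-transformed fixed effect estimates
plogis(m1$TE.fixed) + plogis(m2$TE.fixed) + plogis(m3$TE.fixed)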

(plogis is the inverse of the logit transformation, often called 
"expit": plogis(x) = exp(x)/(1 + exp(x)).) These seem to sum up to 1 for 
the fixed effect estimates, but not in general for the random effects 
estimates, only in case of small heterogeneity (which is rarely the case 
with proportions).

I am interested to hear whether this works with your data. (And I have 
to prove that this holds in general ...)

Best,

Gerta


On 08.03.2022 at 13:42, Thiago Roza wrote:
#
Dear Thiago,

I found that, apparently, the result presented by the common effect 
model (=fixed effect model) is simply the sum of all entries/events over 
all studies, divided by the total sample size (summed up over all 
studies). You see this by typing the following after the code in my last 
e-mail:

all.equal(sum(out1)/sum(n), plogis(m1$TE.fixed))
all.equal(sum(out2)/sum(n), plogis(m2$TE.fixed))
all.equal(sum(out3)/sum(n), plogis(m3$TE.fixed))

This means that the method is equivalent to considering the data as a 
contingency table where the rows correspond to the studies and the 
columns to the outcomes. The meta-analytic result corresponds to the 
percentages in the column sums, and of course these add to 100%. In fact 
this is the easiest way to deal with this kind of data.
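
The contingency-table view can be sketched in base R as follows. The counts are made up for illustration, and tab and pooled are hypothetical names.

```r
# Hypothetical studies-by-outcomes table of counts.
tab <- matrix(c(80, 10,  5,  5,
                45, 30, 15, 10,
                60, 20, 12,  8),
              nrow = 3, byrow = TRUE,
              dimnames = list(study  = c("A", "B", "C"),
                              method = c("hanging", "firearm", "overdose", "other")))

# Pooled proportion per method = column total / grand total.
pooled <- colSums(tab) / sum(tab)
round(100 * pooled, 1)

# The pooled proportions add to 1 by construction.
sum(pooled)
```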

@Guido, @Wolfgang: I couldn't find this information on the metaprop or 
the rma.glmm help pages. Do you see any problem with interpreting 
Thiago's data as a contingency table? I think that, in contrast to 
pairwise comparison data, confounding/ecological bias is not an issue here.

Best,

Gerta

On 08.03.2022 at 19:30, Dr. Gerta Rücker wrote:
#
Hi Gerta,

Under homogeneity, we have X_i ~ Binomial(n_i, pi), in which case sum(X_i) ~ Binomial(sum(n_i), pi) and hence

sum(out1)/sum(n)
plogis(coef(glm(out1/n ~ 1, weights = n, family = binomial)))

or using metaprop() (package meta) / rma.glmm() (package metafor)

plogis(metaprop(out1, n)$TE.fixed)
plogis(coef(rma.glmm(measure="PLO", xi=out1, ni=n, method="EE")))

are all identical. It goes to show how the logistic regression approach gives an 'exact' model, based on the exact distributional properties of binomial counts.

As for Thiago's data: I think this is fine. But essentially he has multinomial data. I recently described in a post how such data could be addressed if one would want to analyze them all simultaneously:

https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2022-February/003878.html

Best,
Wolfgang
#
Hi Wolfgang,

Thank you! Indeed I just saw that the ML estimate under the binomial 
model and the assumption of homogeneity gives (sum r_i)/(sum n_i). In 
fact this seems equivalent to logistic regression. Probably it works 
also under the multinomial model, I didn't write this down. I admit that 
I never had thought about this :(
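
The result can be written out in a few lines. This is a sketch under the homogeneity assumption, with r_i events out of n_i cases in study i and a common proportion \pi. The multinomial part uses r_{ik} for the count of category k in study i, which is notation introduced here and not taken from the thread.

```latex
\begin{align*}
\ell(\pi) &= \sum_i \bigl[ r_i \log \pi + (n_i - r_i) \log(1 - \pi) \bigr] + \text{const}, \\
\frac{\partial \ell}{\partial \pi}
  &= \frac{\sum_i r_i}{\pi} - \frac{\sum_i (n_i - r_i)}{1 - \pi} = 0
  \quad\Longrightarrow\quad
  \hat{\pi} = \frac{\sum_i r_i}{\sum_i n_i}. \\[1ex]
\text{Multinomial: } \ell(\pi_1, \dots, \pi_K)
  &= \sum_i \sum_k r_{ik} \log \pi_k + \text{const},
  \qquad \sum_k \pi_k = 1, \\
\hat{\pi}_k &= \frac{\sum_i r_{ik}}{\sum_i n_i},
  \qquad \sum_k \hat{\pi}_k = 1 .
\end{align*}
```

The multinomial estimates follow from maximizing the log-likelihood with a Lagrange multiplier for the constraint, and they sum to 1 because \sum_k r_{ik} = n_i in every study.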

Best,

Gerta

On 08.03.2022 at 22:58, Viechtbauer, Wolfgang (SP) wrote:
#
Dear Gerta and Wolfgang,

Thank you for the replies!
The fixed-effect model works just fine for my multinomial data (the sum
of the proportions of all suicide methods is now 100%!).
I think that in this case, I will use the random-effects model for the
binomial data in metaprop and the fixed-effect model for the
multinomial data!

Thank you for your help!

Thiago



On Tue, 8 Mar 2022 at 19:06, Dr. Gerta Rücker
<ruecker at imbi.uni-freiburg.de> wrote:
#
Happy to see other people spending time at 11pm thinking about this kind of stuff :)

If we want to be really precise, the MLE of the logit-transformed true proportion is qlogis((sum r_i)/(sum n_i)) for the logistic regression model with a logit link, but since MLEs are invariant under transformations, plogis(qlogis((sum r_i)/(sum n_i))) = (sum r_i)/(sum n_i) is the MLE of the true proportion. In fact, this is neatly demonstrated by fitting the logistic regression with an identity link (do we even call this 'logistic' regression?!?):

coef(glm(out1/n ~ 1, weights = n, family = binomial(link = "identity")))

That all of this happens 'automagically' is really a neat feature of logistic regression.

Best,
Wolfgang
#
Hi all

On 08.03.2022 at 23:19, Viechtbauer, Wolfgang (SP) wrote:
Yes. That's typical mathematicians' behaviour.
Nice!

Good night then :)

Gerta