Hi Lukas and list,
first of all, thanks for your time and the suggestions, and apologies for
not making my point as clear as I should have. I have already contacted
all authors and the pharmaceutical companies, but the latter are
reluctant to disclose results for all kinds of reasons, and the former
sometimes no longer have access, since some results are 20 or more years
old. But let me delineate my problem:
We have a measure of disease severity, which consists of several items
that are summed into a total score. In most of the studies I have a sum
score group_mean(x) - so x = x1 + x2 - with group_sd(x). But it may happen
that authors provide group_mean(x1) and group_mean(x2) with their
respective SDs. It's of course easy to get group_mean(x), but I'm
wondering what the approach would be for sd(x). I thought about the
"pooled_sd" with
pooled_sd <- sqrt(((n1-1)*sd_x1^2 + (n2-1)*sd_x2^2) / (n1+n2-2))
but I'm not sure whether that makes sense. So I tried to simulate data
to get a sense of how reliable the results are (code below), but the
difference between the "true" SD and the estimated SD is considerable in
a few cases. So I was wondering whether I am missing something or whether
this is a valid approach.
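For reference, the pooled formula above can be wrapped in a small R function and set next to the SD of a sum of two items, which follows from the standard identity Var(x1 + x2) = Var(x1) + Var(x2) + 2*Cov(x1, x2). This is only a sketch: the function names are made up for illustration, and the correlation r between the items is an assumption that is not reported anywhere in this thread and would have to be obtained or guessed.

```r
# Pooled SD of two groups, exactly as written in the message above.
pooled_sd <- function(sd_x1, sd_x2, n1, n2) {
  sqrt(((n1 - 1) * sd_x1^2 + (n2 - 1) * sd_x2^2) / (n1 + n2 - 2))
}

# SD of a sum score x = x1 + x2 when both items are measured on the same patients.
# r is the correlation between the two items (ASSUMPTION: not given in the thread).
sd_of_sum <- function(sd_x1, sd_x2, r) {
  sqrt(sd_x1^2 + sd_x2^2 + 2 * r * sd_x1 * sd_x2)
}

pooled_sd(3, 4, n1 = 100, n2 = 100)  # about 3.54
sd_of_sum(3, 4, r = 0)               # 5
sd_of_sum(3, 4, r = 0.5)             # about 6.08
```

With equal group sizes the pooled SD always lies between the two item SDs, whereas the SD of a sum of non-negatively correlated items is at least as large as the larger item SD, so the two quantities answer different questions.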
I would be delighted if you or someone else could guide me with some
advice.
All the best,
David
Code:
## Test for simulation of compound SD
# General
set.seed(1234)
# Draw n values with exactly the requested sample mean and SD
rnorm2       <- function(n, mean, sd) { mean + sd * scale(rnorm(n)) }
nsim         <- 500
group_size   <- c(100, 100)
# Simulate two known datasets
means_x1     <- runif(nsim, 0, 5) # values are between 0 and 5
sd_x1        <- runif(nsim, 0, 4)
means_x2     <- runif(nsim, 0, 5)
sd_x2        <- runif(nsim, 0, 4)
x1           <- matrix(data = NA, nrow = group_size[1], ncol = nsim)
x2           <- matrix(data = NA, nrow = group_size[2], ncol = nsim)
for (i in 1:nsim) {
    x1[, i] <- rnorm2(group_size[1], means_x1[i], sd_x1[i])
    x2[, i] <- rnorm2(group_size[2], means_x2[i], sd_x2[i])
}
# rbind() stacks x1 and x2 into one sample of 200 values per column
mean_sum     <- apply(rbind(x1, x2), 2, mean) # trivial; for estimation differences see also
# plot(apply(rbind(x1, x2), 2, mean) - apply(rbind(means_x1, means_x2), 2, mean))
sd_sum       <- apply(rbind(x1, x2), 2, sd) # "ground truth"
sd_estimate  <- rep(NA, nsim) # according to rnorm2
for (i in 1:nsim) {
    sd_estimate[i] <- sqrt(((group_size[1] - 1) * sd_x1[i]^2 +
        (group_size[2] - 1) * sd_x2[i]^2) / (group_size[1] + group_size[2] - 2))
}
results <- data.frame(x = sd_sum, y = sd_estimate, z = sd_sum - sd_estimate)
plot(results$z)
On 21.01.22 at 17:25, Lukasz Stasielowicz wrote:
Hi,
a couple of ideas that may be obvious to you but the provided
description is rather short, so I don't know whether you have thought
about the following points:
1. Did you try to contact the authors of the studies? Maybe they will
be willing to provide the missing statistics or the data set. The
willingness varies obviously between researchers (and research areas)
but it is often worth the effort.
One could contact the corresponding author and ask for the statistics
or the data set (providing the choice can increase the success rate).
If you don't receive an answer within several days (e.g. one week),
then one can try to contact the other authors. Recently I used this
strategy for two different meta-analyses and approximately 80% - 90%
of the research teams wrote back. Obviously, not all of them could
provide answers or data (hard drive failure etc.) but approximately
30% - 50% of the authors provided additional information.
2. If you have already explored the first strategy and the relevant
information is still missing, then one could try to reconstruct it. It
is something that you were referring to but the description is rather
short, so I cannot infer what is meant by pooled SD etc.
One could try to rearrange the formulas to compute the missing
information manually but if there are two unknowns (e.g. SD and M for
one group is missing) then it is not possible.
Nevertheless, one could try to make some guesstimates (e.g. are the
SDs for both groups in other studies similar? If yes, then one could
make a corresponding guesstimate for the missing information) in order to
impute the data.
One could even make several guesstimates and test these different
scenarios to check the robustness of the findings. Another sensitivity
analysis would be to compare meta-analytic results based only on studies
without missing information with the scenarios that include guesstimates.
3. It is probably obvious to you but dropping the studies with missing
information is also a possibility. However, it could bias the results
(if the dropped studies differ significantly from the included studies).
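The scenario idea in point 2 could be sketched in R roughly as follows. Everything here is hypothetical: the SDs of the "comparable studies", the two group means, and the use of a standardised mean difference as the effect size are made-up assumptions for illustration, not values or choices taken from this thread.

```r
# Hypothetical SDs reported by comparable studies (made-up numbers).
reported_sd <- c(2.1, 2.4, 2.8, 3.0, 3.5)

# Three guesstimates for the missing SD: low, central and high.
scenarios <- quantile(reported_sd, probs = c(0.25, 0.50, 0.75), names = FALSE)
names(scenarios) <- c("low", "central", "high")

# Hypothetical study with known group means but a missing SD.
mean_treat <- 10.2
mean_ctrl  <- 12.0

# Standardised mean difference under each scenario
# (the same guessed SD is assumed for both groups).
smd <- (mean_treat - mean_ctrl) / scenarios
print(round(smd, 2))
```

Running the meta-analysis once per scenario and comparing the pooled estimates would then show how much the conclusions depend on the guessed SD.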
Hope it helps!
Best wishes,
PD Dr. David Pedrosa
Senior Consultant (Leitender Oberarzt), Klinik für Neurologie,
Head of the Movement Disorders Section (Sektion Bewegungsstörungen),
Universitätsklinikum Gießen und Marburg
Tel.: (+49) 6421-58 65299 Fax: (+49) 6421-58 67055
Address: Baldingerstr., 35043 Marburg
Web: https://www.ukgm.de/ugm_2/deu/umr_neu/index.html