[R-meta] Question on effect sizes

3 messages · Lukasz Stasielowicz, David Pedrosa, Wolfgang Viechtbauer

Hi,

A couple of ideas that may be obvious to you, but since the provided
description is rather short, I don't know whether you have thought
about the following points:

1. Did you try to contact the authors of the studies? Maybe they will be
willing to provide the missing statistics or the data set. Willingness
obviously varies between researchers (and research areas), but it is
often worth the effort.

One could contact the corresponding author and ask for the statistics or
the data set (offering the choice can increase the success rate). If
you don't receive an answer within several days (e.g. one week), then one
can try to contact the other authors. Recently I used this strategy for
two different meta-analyses, and approximately 80% - 90% of the research
teams wrote back. Obviously, not all of them could provide answers or
data (hard drive failure etc.), but approximately 30% - 50% of the
authors provided additional information.

2. If you have already explored the first strategy and the relevant
information is still missing, then one could try to reconstruct it. It
is something that you were referring to, but the description is rather
short, so I cannot infer what is meant by pooled SD etc.
One could try to rearrange the formulas to compute the missing
information manually, but if there are two unknowns (e.g. both the SD and
the mean of one group are missing), then this is generally not possible.
Nevertheless, one could try to make some guesstimates (e.g. are the SDs
for both groups similar in other studies? If yes, then one could make a
corresponding guesstimate for the missing information) in order to impute
the data.
One could even make several guesstimates and analyse these different
scenarios to test the robustness of the findings. Another sensitivity
analysis would be to compare the meta-analytic results based only on
studies without missing information with the results of the scenarios
with guesstimates.

3. It is probably obvious to you, but dropping the studies with missing
information is also a possibility. However, it could bias the results
(if the dropped studies differ systematically from the included studies).
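A minimal sketch of the guesstimate-and-sensitivity idea from point 2, in base R. It assumes standardized mean differences (Cohen's d with a pooled SD) as the effect size and a simple inverse-variance (fixed-effect) average; the study data are made up, the helpers `smd()` and `pool_fe()` are hypothetical, and using the smallest, median and largest SD observed in the other studies as guesstimates is just one possible choice. A real analysis would more likely use a random-effects model (e.g. from the metafor package).

```r
# Hypothetical data: 5 two-group studies; study 5 did not report its SDs.
studies <- data.frame(
  m1  = c(10.2, 11.0,  9.5, 12.1, 10.8),
  sd1 = c( 3.1,  2.8,  3.5,  3.0,   NA),
  n1  = c(  40,   55,   30,   60,   45),
  m2  = c( 8.9,  9.7,  8.8, 10.0,  9.1),
  sd2 = c( 3.3,  2.9,  3.2,  3.4,   NA),
  n2  = c(  42,   50,   28,   61,   44)
)

# Hypothetical helper: standardized mean difference (pooled SD) and its
# large-sample sampling variance.
smd <- function(m1, sd1, n1, m2, sd2, n2) {
  sp <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))
  d  <- (m1 - m2) / sp
  v  <- (n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2))
  data.frame(d = d, v = v)
}

# Hypothetical helper: inverse-variance weighted (fixed-effect) estimate.
pool_fe <- function(d, v) {
  w <- 1 / v
  sum(w * d) / sum(w)
}

complete <- !is.na(studies$sd1) & !is.na(studies$sd2)

# Scenario 0: complete cases only.
es_cc <- with(studies[complete, ], smd(m1, sd1, n1, m2, sd2, n2))
estimates <- c(complete_cases = pool_fe(es_cc$d, es_cc$v))

# Scenarios 1-3: impute the missing SDs with guesstimates taken from the
# other studies (smallest, median and largest observed SD).
observed_sd <- c(studies$sd1[complete], studies$sd2[complete])
guesses <- c(low  = min(observed_sd),
             mid  = median(observed_sd),
             high = max(observed_sd))

for (g in names(guesses)) {
  imputed <- studies
  imputed$sd1[!complete] <- guesses[[g]]
  imputed$sd2[!complete] <- guesses[[g]]
  es <- with(imputed, smd(m1, sd1, n1, m2, sd2, n2))
  estimates[paste0("impute_", g)] <- pool_fe(es$d, es$v)
}

round(estimates, 3)
```

If the estimates from the imputation scenarios stay close to the complete-case estimate, the conclusions are probably not driven by the guesstimated SDs.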


Hope it helps!

Best wishes,
1 day later
Hi Lukasz and list,

First of all, thanks for your time and the suggestions, and apologies for
not making my point as clear as I should have. I have already contacted
all authors and the pharmaceutical companies, but the latter are rather
reluctant to disclose results for all kinds of reasons, and the former
sometimes no longer have access, since some results are 20 or more years
old. But let me outline my problem:

We have a measure of disease severity, which consists of several items
that are summed. In most of the studies I have a sum score
group_mean(x), i.e. x1+x2, with group_sd(x). But it may happen that
authors provide group_mean(x1) and group_mean(x2) with their respective
SDs. It's of course easy to get group_mean(x), but I'm wondering what
the approach would be for sd(x). I thought about the "pooled_sd" with

pooled_sd <- sqrt(((n1-1)*sd_x1^2 + (n2-1)*sd_x2^2) / (n1+n2-2))

but I'm not sure whether that makes sense. So I tried to simulate data
to get a sense of how reliable the results are (code below), but the
difference between the "true" SD and the estimated SD is considerable in
a few cases. So I was wondering whether I am missing something or whether
this is a valid approach.
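One point worth checking: the simulation below stacks x1 and x2 with rbind(), i.e. it treats them as two separate groups of people. If x is instead a per-person sum of item scores (x = x1 + x2 measured on the same patients), its SD is not a pooled SD; by the variance-of-a-sum rule, Var(x1 + x2) = Var(x1) + Var(x2) + 2 r SD(x1) SD(x2), where r is the correlation between the items. That correlation is rarely reported, so it would have to be assumed (e.g. taken from a study that reports item intercorrelations). A minimal sketch, with a hypothetical helper `sd_of_sum()` and illustrative values of r:

```r
# Hypothetical helper: SD of a per-person sum score x = x1 + x2, given the
# item SDs and an ASSUMED correlation r between the items.
sd_of_sum <- function(sd1, sd2, r) {
  sqrt(sd1^2 + sd2^2 + 2 * r * sd1 * sd2)
}

# Check against simulated data: two correlated items for 200 patients.
set.seed(1)
n  <- 200
x1 <- rnorm(n, mean = 3, sd = 1.5)
x2 <- 0.6 * x1 + rnorm(n, mean = 1, sd = 1)  # item 2 correlated with item 1

sd(x1 + x2)                                   # "ground truth"
sd_of_sum(sd(x1), sd(x2), cor(x1, x2))        # same value (up to rounding)

# With an unknown correlation, try a range of plausible values
# (item SDs 1.5 and 1.2 are illustrative):
sapply(c(r0 = 0, r3 = 0.3, r6 = 0.6), function(r) sd_of_sum(1.5, 1.2, r))
```

Trying several values of r, as in the last line, doubles as a small sensitivity analysis.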

I would be delighted if you or someone else could give me some
advice.

All the best,

David


Code:

## Simulation test for a compound (combined) SD

# General settings
set.seed(1234)
# rnorm2() draws n values whose sample mean and SD equal 'mean' and 'sd' exactly
rnorm2     <- function(n, mean, sd) { mean + sd * scale(rnorm(n)) }
nsim       <- 500
group_size <- c(100, 100)

# Simulate two known datasets
means_x1 <- runif(nsim, 0, 5)  # values are between 0 and 5
sd_x1    <- runif(nsim, 0, 4)

means_x2 <- runif(nsim, 0, 5)
sd_x2    <- runif(nsim, 0, 4)

x1 <- matrix(data = NA, nrow = group_size[1], ncol = nsim)
x2 <- matrix(data = NA, nrow = group_size[2], ncol = nsim)
for (i in 1:nsim) {
    x1[, i] <- rnorm2(group_size[1], means_x1[i], sd_x1[i])
    x2[, i] <- rnorm2(group_size[2], means_x2[i], sd_x2[i])
}

mean_sum <- apply(rbind(x1, x2), 2, mean)  # trivial; see also
# plot(apply(rbind(x1, x2), 2, mean) - apply(rbind(means_x1, means_x2), 2, mean))
# for estimation differences
sd_sum   <- apply(rbind(x1, x2), 2, sd)    # "ground truth"

sd_estimate <- rep(NA, nsim)  # pooled SD; group SDs are exact according to rnorm2
for (i in 1:nsim) {
    sd_estimate[i] <- sqrt(((group_size[1] - 1) * sd_x1[i]^2 +
                            (group_size[2] - 1) * sd_x2[i]^2) /
                           (group_size[1] + group_size[2] - 2))
}
results <- data.frame(x = sd_sum, y = sd_estimate, z = sd_sum - sd_estimate)
plot(results$z)


On 21.01.22 at 17:25, Lukasz Stasielowicz wrote:
4 days later
Dear David,

I haven't looked at your post in detail, but I think you might be after this:

# Suppose we have the mean, SD, and size of several subgroups, but we
# need the mean and SD of the total/combined groups. Code below shows
# what we need to compute to obtain this.

# simulate some data
n.total <- 100
grp <- sample(1:4, size=n.total, replace=TRUE)
y   <- rnorm(n.total, mean=grp, sd=2)

# means and SDs of the subgroups
ni  <- c(by(y, grp, length))
mi  <- c(by(y, grp, mean))
sdi <- c(by(y, grp, sd))

# want to get mean and SD of the total group
mean(y)
sd(y)

# mean = weighted mean (weights = group sizes)
m.total <- sum(ni*mi)/sum(ni)

# SD = sqrt((within-group sum-of-squares plus between-group sum-of-squares) / (n.total - 1))
sd.total <- sqrt((sum((ni-1) * sdi^2) + sum(ni*(mi - m.total)^2)) / (sum(ni) - 1))

# check that we get the right values
m.total
sd.total
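For the two-group case in David's simulation, the formula above differs from the pooled SD mainly by the between-group sum of squares, sum(ni*(mi - m.total)^2), which the pooled SD leaves out (the denominators also differ: n1 + n2 - 2 versus n.total - 1). That is why the discrepancies in his simulation grow as the two group means move apart. A minimal sketch wrapping the same formulas in a hypothetical helper `combine_groups()` and checking it against data with exact sample means and SDs, generated by a variant of his rnorm2():

```r
# Hypothetical helper: combine subgroup sizes, means and SDs into the mean
# and SD of the total group (same formulas as above, wrapped in a function).
combine_groups <- function(ni, mi, sdi) {
  m_total    <- sum(ni * mi) / sum(ni)
  ss_within  <- sum((ni - 1) * sdi^2)
  ss_between <- sum(ni * (mi - m_total)^2)
  c(mean = m_total, sd = sqrt((ss_within + ss_between) / (sum(ni) - 1)))
}

# Variant of rnorm2(): sample mean and SD are exactly 'mean' and 'sd'.
rnorm2 <- function(n, mean, sd) { mean + sd * as.vector(scale(rnorm(n))) }
set.seed(1234)
x1 <- rnorm2(100, mean = 1, sd = 2)
x2 <- rnorm2(100, mean = 4, sd = 3)

combined <- combine_groups(ni = c(100, 100), mi = c(1, 4), sdi = c(2, 3))
combined
c(mean = mean(c(x1, x2)), sd = sd(c(x1, x2)))  # same values

# The pooled SD omits the between-group term and is too small here:
sqrt((99 * 2^2 + 99 * 3^2) / 198)
```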

Best,
Wolfgang