This is due to the way the estimates are weighted, which is a consequence
of the variance components. In the full data, all estimates receive roughly
equal weights:
weights(res.ml, type="rowsum")
On the other hand, when you remove experiments 1 or 2, the VCs change such
that the weights also change noticeably:
weights(sub, type="rowsum")
Now, experiment 1 or 2 (whichever is still in the subset) receives more
weight and so does experiment 3. So, removal of experiment 2 leads to more
weight for experiment 1 which also has an extreme estimate, which affects
the pooled estimate considerably. Hence, the large Cook's distance.
Best,
Wolfgang
-----Original Message-----
From: Antonina Dolgorukova [mailto:an.dolgorukova at gmail.com]
Sent: Saturday, 05 February, 2022 9:55
To: Viechtbauer, Wolfgang (SP)
Cc: Reza Norouzian; R meta
Subject: Re: [R-meta] possible miscalculation of Cook?s distances
Dear Wolfgang,
Thank you very much for confirming the calculations!! Tha's great that
correct! Actually, by "miscalculation", I have rather meant the order,
numbers.
The thing that confuses me a lot is that according to forest plot
standardized (deleted) residuals, the 1st experiment is an outlier. But
to Cook?s distance plot, the 2nd experiment is influential. Looking at
plot, it is really hard to believe that the removal of the 2nd experiment
affect the pooled estimate (its mean is close to the overall effect
CI does overlap (almost included) with CI of the overall effect).
Sincerely,
Antonina
On Fri, Feb 4, 2022 at 7:17 PM Viechtbauer, Wolfgang (SP)
<wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
Just as a brief follow-up with respect to the calculations: They are
Continuing with your code, Antonina, we can easily manually compute the
distances here:
cooksd[1:2]
sub <- rma.mv(yi, vi,
random = ~ 1 | study/experiment,
data=dat, subset=-1)
(coef(res.ml) - coef(sub))^2 / vcov(res.ml)
sub <- rma.mv(yi, vi,
random = ~ 1 | study/experiment,
data=dat, subset=-2)
(coef(res.ml) - coef(sub))^2 / vcov(res.ml)
As you will see, these match up.
With these data, removal of a single row can have a large impact on the
components, which can lead to considerable changes in the pooled estimate.
Best,
Wolfgang
-----Original Message-----
From: R-sig-meta-analysis [mailto:
r-sig-meta-analysis-bounces at r-project.org] On
Behalf Of Reza Norouzian
Sent: Friday, 04 February, 2022 8:01
To: Antonina Dolgorukova
Cc: R meta
Subject: Re: [R-meta] possible miscalculation of Cook?s distances
Dear Antonina,
There was no rationale, I just wanted to indicate that there is no bug
in the function (and save myself a tiny bit of time). The default
reestimate = TRUE is certainly to be preferred given that your model
does include random-effects and, as the documentation correctly
mentions, the influence of each effect estimate on the estimates of
heterogeneity and correlation can only be examined, if you set
reestimate = TRUE.
For instance, if you remove the random-effects from your initial model
(res.ml), then, you'll see that the use of reestimate = TRUE or FALSE
has no effect. In both cases, only the effect estimate associated with
study 1, experiment 1 is influential. Thus, this shows that the use of
reestimate = FALSE in your initial model essentially ignored the
influence of each effect estimate on the estimates of heterogeneity.
In models with complex random-effects structure esp. fit to large
datasets, setting reestimate = TRUE would take a good chunk of time.
In such cases, some (myself included) may be tempted to set reestimate
= FALSE hoping that the approximation will be close enough.
Kind regards,
Reza
res.ml2 <- update.rma(res.ml, random = NULL)
cook_with_reest2 <- cooks.distance.rma.mv(res.ml2)
cook_without_reest2 <- cooks.distance.rma.mv(res.ml2, reestimate =
plot(cook_with_reest2,type="b")
plot(cook_without_reest2,type="b")
On Fri, Feb 4, 2022 at 12:19 AM Antonina Dolgorukova
<an.dolgorukova at gmail.com> wrote:
Dear Reza,
Thank you very much for your reply. I do understand that not all
to be influential and that not all influential cases have to be
thank you for mentioning this and for the code provided. I did notice
reestimate = FALSE. This raises two questions, actually. It would be
1) Is there any explanation why cooks.distance.rma.mv(res.ml) and
cooks.distance.rma.mv(res.ml , reestimate = FALSE) give completely
results about the experiments 1 and 2 (and very similar for the remaining
experiments)?
According to cooks.distance.rma.mv(res.ml), the Study 1, Experiment 2
according to cooks.distance.rma.mv(res.ml , reestimate = FALSE),
Experiment 1 is influential
#the code illustrating my question
dat <- data.frame(study=c(1,1,2,3,3,3), experiment=c(1:6),
yi=c( 68, 18, 31,20,10,26),
vi=c(60,32, 15, 19, 41, 82))
res.ml <- rma.mv(yi, vi,
random = ~ 1 | study/experiment,
data=dat,
slab = paste("Study ", study,", ", "Experiment ",
cook_with_reest <- cooks.distance.rma.mv(res.ml)
cook_without_reest <- cooks.distance.rma.mv(res.ml, reestimate =
plot(1:length(cook_with_reest), cook_with_reest, ylim=c(0,2), type="o",
xlab="Study and Experiment ID", ylab="Cook's Distance")
points(cook_without_reest, type="o", pch=19, col = "red")
axis(1, 1:length(cook_with_reest), labels=names(cook_with_reest))
abline(h=4/length(cook_with_reest), lty="dotted", lwd=2)
legend("top", pch=19, col=c("black","red"), lty="solid",
legend=c("reestimate = TRUE","reestimate = FALSE"), bty="n")
2) And could you please explain what is the rationale to set
FALSE? According to the metafor documentation:
"Doing so only yields an approximation to the Cook?s distances that
influence of the ith case on the variance/correlation components"
Sincerely,
Antonina
On Fri, Feb 4, 2022 at 3:29 AM Reza Norouzian <rnorouzian at gmail.com>
Dear Antonina,
An effect estimate whose standardized deleted residual falls beyond
+/-1.96 doesn't necessarily need to have a cook's distance
(influence), and/or a hat value (if additional moderators are used)
that are likewise extreme.
As a general proposition, it may be methodologically more reasonable
to simultaneously consult all these indices to flag an extreme effect
estimate.
That said, in your case, it does seem that the estimate from study 1,
experiment 1 has a large standardized deleted residual as well as a
large Cook's distance relative to that of other effect estimates. So,
I don't think there is any bug in any of the metafor functions you
used.
Below is a bit of code to catch that outlying effect estimate.
Kind regards,
Reza
dat <- data.frame(study=c(1,1,2,3,3,3), experiment=c(1:6),
yi=c( 68, 18, 31,20,10,26),
vi=c(60,32, 15, 19, 41, 82))
res.ml <- rma.mv(yi, vi,
random = ~ 1 | study/experiment,
data=dat,
slab = paste("Study ", study,", ", "Experiment ",
experiment, sep = ""))
dat <- transform(dat, obs = res.ml$slab)
outlier_limit <- qnorm(.975)
cook <- cooks.distance.rma.mv(res.ml,
reestimate = FALSE)
st_del_res_z <- rstudent.rma.mv(res.ml,
reestimate = FALSE)$z
cook_limit <- max(mean(range(cook)), boxplot.stats(cook, coef =
i <- abs(st_del_res_z) > outlier_limit
k <- cook > cook_limit
oo <- which(i & k)
LL <- names(oo)
removed <- subset(dat, obs %in% LL)
new_dat <- subset(dat, !obs %in% LL)
On Thu, Feb 3, 2022 at 3:36 PM Antonina Dolgorukova
<an.dolgorukova at gmail.com> wrote:
Dear Dr. Viechtbauer and all,
I have a multilevel data structure (experiments nested within
use rma.mv to calculate an overall effect estimate. The next step
sensitivity analysis on the experiment-level data. For detecting
I've used standardized (deleted) residuals and for detecting
experiments I've used Cook?s distance. However, the last test
contradictory results.
The forest plot indicates that the 1st experiment may be an
standardized (deleted) residuals confirm this. But according to
distance plot, the 2nd experiment is influential. It seems that
be a miscalculation of Cook?s distance since I can easily reproduce
issue (in one study one of the experiments have to provide a much
than the other) also if use the model with random = ~ 1 |
1st experiment is influential, not the second.
Could you please clarify is this a bug or a feature of the
function? Maybe it does not work properly with rma.mv objects?
## Reproducible Example
# data frame
dat <- data.frame(study=c(1,1,2,3,3,3), experiment=c(1:6),
yi=c( 68, 18, 31,20,10,26),
vi=c(60,32, 15, 19, 41, 82))
# multilevel model
res.ml <- rma.mv(yi, vi,
random = ~ 1 | study/experiment,
data=dat,
slab = paste("Study ", study,", ", "Experiment ",
experiment, sep = ""))
# forest plot examination indicates that the 1st experiment may be
outlier
forest(res.ml,
header = "Study and Experiment ID")
# standardized (deleted) residuals confirm this
rst <- rstudent(res.ml)
plot(NA, NA, xlim=c(1, res.ml$k), ylim=c(-3,5),
xlab="Study and Experiment ID", ylab="Standardized (Deleted)
xaxt="n", las=1)
axis(side=1, at=1:res.ml$k, labels=rst$slab)
abline(h=c(-1.96,1.96), lty="dotted")
abline(h=0)
points(1:res.ml$k, rst$z, type="o", pch=19)
# according to Cook?s distance plot, the 2nd experiment is
cooksd <- cooks.distance(res.ml)
plot(1:length(cooksd), cooksd, ylim=c(0,2), type="o", pch=19, las=1,
xaxt="n",
xlab="Experiment ID", ylab="Cook's Distance")
axis(1, 1:length(cooksd), labels=names(cooksd))
abline(h=4/length(cooksd), lty="dotted", lwd=2)
sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
[1] metafor_3.1-43 metadat_1.0-0 Matrix_1.3-4
--
Antonina Dolgorukova, M.D.
Pavlov First Saint Petersburg State Medical University
Lev Tolstoy str., 6-8, 197022
Saint Petersburg, Russia