[R-meta] possible miscalculation of Cook’s distances
Dear Reza,
Thank you very much for your reply. I do understand that not all outliers
have to be influential and that not all influential cases have to be
outliers. But thank you for mentioning this and for the code provided. I
did notice you set the reestimate = FALSE. This raises two questions,
actually. It would be great if you help with them.
1) Is there any explanation why cooks.distance.rma.mv(res.ml) and
cooks.distance.rma.mv(res.ml , reestimate = FALSE) give completely
different results about the experiments 1 and 2 (and very similar for the
remaining experiments)?
According to cooks.distance.rma.mv(res.ml), the Study 1, Experiment 2 is
influential, but
according to cooks.distance.rma.mv(res.ml , reestimate = FALSE), Study 1,
Experiment 1 is influential
#the code illustrating my question
dat <- data.frame(study=c(1,1,2,3,3,3), experiment=c(1:6),
yi=c( 68, 18, 31,20,10,26),
vi=c(60,32, 15, 19, 41, 82))
res.ml <- rma.mv(yi, vi,
random = ~ 1 | study/experiment,
data=dat,
slab = paste("Study ", study,", ", "Experiment ",
experiment, sep = ""))
cook_with_reest <- cooks.distance.rma.mv(res.ml)
cook_without_reest <- cooks.distance.rma.mv(res.ml, reestimate = FALSE)
plot(1:length(cook_with_reest), cook_with_reest, ylim=c(0,2), type="o",
pch=19, las=1, xaxt="n",
xlab="Study and Experiment ID", ylab="Cook's Distance")
points(cook_without_reest, type="o", pch=19, col = "red")
axis(1, 1:length(cook_with_reest), labels=names(cook_with_reest))
abline(h=4/length(cook_with_reest), lty="dotted", lwd=2)
legend("top", pch=19, col=c("black","red"), lty="solid",
legend=c("reestimate = TRUE","reestimate = FALSE"), bty="n")
2) And could you please explain what is the rationale to set reestimate =
FALSE? According to the metafor documentation:
"Doing so only yields an approximation to the Cook?s distances that ignores
the influence of the ith case on the variance/correlation components"
Sincerely,
Antonina
On Fri, Feb 4, 2022 at 3:29 AM Reza Norouzian <rnorouzian at gmail.com> wrote:
Dear Antonina,
An effect estimate whose standardized deleted residual falls beyond
+/-1.96 doesn't necessarily need to have a cook's distance
(influence), and/or a hat value (if additional moderators are used)
that are likewise extreme.
As a general proposition, it may be methodologically more reasonable
to simultaneously consult all these indices to flag an extreme effect
estimate.
That said, in your case, it does seem that the estimate from study 1,
experiment 1 has a large standardized deleted residual as well as a
large Cook's distance relative to that of other effect estimates. So,
I don't think there is any bug in any of the metafor functions you
used.
Below is a bit of code to catch that outlying effect estimate.
Kind regards,
Reza
dat <- data.frame(study=c(1,1,2,3,3,3), experiment=c(1:6),
yi=c( 68, 18, 31,20,10,26),
vi=c(60,32, 15, 19, 41, 82))
res.ml <- rma.mv(yi, vi,
random = ~ 1 | study/experiment,
data=dat,
slab = paste("Study ", study,", ", "Experiment ",
experiment, sep = ""))
dat <- transform(dat, obs = res.ml$slab)
outlier_limit <- qnorm(.975)
cook <- cooks.distance.rma.mv(res.ml,
reestimate = FALSE)
st_del_res_z <- rstudent.rma.mv(res.ml,
reestimate = FALSE)$z
cook_limit <- max(mean(range(cook)), boxplot.stats(cook, coef =
1.5)$stats[5])
i <- abs(st_del_res_z) > outlier_limit
k <- cook > cook_limit
oo <- which(i & k)
LL <- names(oo)
removed <- subset(dat, obs %in% LL)
new_dat <- subset(dat, !obs %in% LL)
On Thu, Feb 3, 2022 at 3:36 PM Antonina Dolgorukova
<an.dolgorukova at gmail.com> wrote:
Dear Dr. Viechtbauer and all, I have a multilevel data structure (experiments nested within studies)
are
use rma.mv to calculate an overall effect estimate. The next step
requires
sensitivity analysis on the experiment-level data. For detecting outliers I've used standardized (deleted) residuals and for detecting influential experiments I've used Cook?s distance. However, the last test provides contradictory results. The forest plot indicates that the 1st experiment may be an outlier, and standardized (deleted) residuals confirm this. But according to Cook?s distance plot, the 2nd experiment is influential. It seems that there may be a miscalculation of Cook?s distance since I can easily reproduce this issue (in one study one of the experiments have to provide a much larger
ES
than the other) also if use the model with random = ~ 1 | experiment, the 1st experiment is influential, not the second. Could you please clarify is this a bug or a feature of the
cooks.distance()
function? Maybe it does not work properly with rma.mv objects?
## Reproducible Example
# data frame
dat <- data.frame(study=c(1,1,2,3,3,3), experiment=c(1:6),
yi=c( 68, 18, 31,20,10,26),
vi=c(60,32, 15, 19, 41, 82))
# multilevel model
res.ml <- rma.mv(yi, vi,
random = ~ 1 | study/experiment,
data=dat,
slab = paste("Study ", study,", ", "Experiment ",
experiment, sep = ""))
# forest plot examination indicates that the 1st experiment may be an
outlier
forest(res.ml,
header = "Study and Experiment ID")
# standardized (deleted) residuals confirm this
rst <- rstudent(res.ml)
plot(NA, NA, xlim=c(1, res.ml$k), ylim=c(-3,5),
xlab="Study and Experiment ID", ylab="Standardized (Deleted)
Residual",
xaxt="n", las=1)
axis(side=1, at=1:res.ml$k, labels=rst$slab)
abline(h=c(-1.96,1.96), lty="dotted")
abline(h=0)
points(1:res.ml$k, rst$z, type="o", pch=19)
# according to Cook?s distance plot, the 2nd experiment is influential
cooksd <- cooks.distance(res.ml)
plot(1:length(cooksd), cooksd, ylim=c(0,2), type="o", pch=19, las=1,
xaxt="n",
xlab="Experiment ID", ylab="Cook's Distance")
axis(1, 1:length(cooksd), labels=names(cooksd))
abline(h=4/length(cooksd), lty="dotted", lwd=2)
sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
[1] metafor_3.1-43 metadat_1.0-0 Matrix_1.3-4
--
Antonina Dolgorukova, M.D.
Pavlov First Saint Petersburg State Medical University
Lev Tolstoy str., 6-8, 197022
Saint Petersburg, Russia
[[alternative HTML version deleted]]
_______________________________________________ R-sig-meta-analysis mailing list R-sig-meta-analysis at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis
Antonina Dolgorukova, M.D. Pavlov First Saint Petersburg State Medical University Lev Tolstoy str., 6-8, 197022 Saint Petersburg, Russia [[alternative HTML version deleted]]