
[R-meta] About whether to delete the outliers from the dataset

3 messages · Nick Chen, Michael Dewey, Reza Norouzian

#
I have a question about whether or not to delete some of my data. I
have a dataset of 56 studies with a pooled effect size of g=1.25. However,
4 data points report extremely large effect sizes (8.15, 6.63,
4.14, and 4.10, respectively). Statistically, they would be considered
outliers and removed from the dataset. But since these data points passed
my inclusion and exclusion criteria, they should stay in the dataset,
because they met all the requirements of my selection. So if I excluded the
4 data points, wouldn't that be misreporting the data? What
should I do? Should I exclude the 4 seemingly influential cases, or keep
them so that the list of research is complete?
*Name*: Nick Chen (Ping-Cheng, Chen)
*School*: National Taiwan Normal University (NTNU) English Department
(Master)
*Email*: wow99308008 at gmail.com
*Phone number*: +886 909 663 963
#
Dear Nick

Apart from the two options you outline (include all, exclude four) I 
assume you have already investigated whether these four studies share 
some common feature which might explain the differences.

I would suggest presenting the full analysis as your main one and then 
presenting the one excluding the four as a sensitivity analysis. If the 
scientific conclusions are unaltered then your discussion is much 
simpler but if excluding them leads to a different conclusion then your 
discussion section needs to provide some suggestion about what is going on.

I think presenting the analysis excluding the four as the main analysis 
is less preferable and, of course, just reporting that analysis and 
ignoring the four altogether is clearly wrong (I know you did not 
suggest that).

Michael
On 10/12/2023 06:32, Nick Chen via R-sig-meta-analysis wrote:

  
    
#
Dear Nick,

It may be useful to add some additional context to your question for better
assistance.

For example, do you have a multilevel data structure where each study can
contribute multiple rows, or have you allowed only one row per study
in your dataset?

Also, can you possibly describe your method of outlier detection? For
instance, if you are using the metafor package, have you looked at the
combination of cooks.distance(), hatvalues(), and rstudent() for those
large effects in your meta-regression model?
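
As a rough illustration of what these three diagnostics measure, here is a minimal Python sketch for an intercept-only random-effects model. It is not metafor's implementation: it uses the DerSimonian-Laird estimator of tau^2 for simplicity (metafor defaults to REML), so the numbers would differ somewhat from what cooks.distance(), hatvalues(), and rstudent() return. The effect sizes and variances are hypothetical; only the four large values are taken from the question, and the function names dl_fit and influence_table are made up for this example.

```python
import math

def dl_fit(y, v):
    """Random-effects pooled estimate using the DerSimonian-Laird tau^2."""
    k = len(y)
    w = [1.0 / vi for vi in v]                      # fixed-effect weights
    sw = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)              # truncated at zero
    w_re = [1.0 / (vi + tau2) for vi in v]          # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    return mu, 1.0 / sum(w_re), tau2, w_re

def influence_table(y, v):
    """Leave-one-out analogues of rstudent, hat value, and Cook's distance."""
    mu, var_mu, _, w_re = dl_fit(y, v)
    rows = []
    for i in range(len(y)):
        # Refit the model without case i.
        mu_i, var_mu_i, tau2_i, _ = dl_fit(y[:i] + y[i + 1:], v[:i] + v[i + 1:])
        # Studentized deleted residual: observed minus leave-one-out prediction.
        rstud = (y[i] - mu_i) / math.sqrt(v[i] + tau2_i + var_mu_i)
        # Hat value: share of the total weight carried by case i.
        hat = w_re[i] / sum(w_re)
        # Cook's distance: squared shift in the pooled estimate, scaled by its variance.
        cook = (mu - mu_i) ** 2 / var_mu
        rows.append((i, y[i], rstud, hat, cook))
    return rows

# HYPOTHETICAL data: eight made-up "typical" effects plus the four large
# values quoted in the question; all sampling variances are invented.
y = [0.9, 1.1, 1.3, 0.8, 1.2, 1.0, 1.4, 0.7, 8.15, 6.63, 4.14, 4.10]
v = [0.05, 0.06, 0.08, 0.05, 0.07, 0.06, 0.09, 0.05, 0.90, 0.60, 0.30, 0.30]

rows = influence_table(y, v)
print(f"{'i':>2} {'yi':>6} {'rstudent':>9} {'hat':>6} {'cook':>7}")
for i, yi, rstud, hat, cook in rows:
    print(f"{i:>2} {yi:>6.2f} {rstud:>9.2f} {hat:>6.3f} {cook:>7.3f}")
```

Looking at all three columns together matters because a large effect with a large sampling variance can have a big residual yet little leverage, and several large effects can partly mask one another in the leave-one-out fits.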

Additionally, I wonder what happens to your pooled effect's standard error
(or the width of the pooled effect's confidence interval [CI]) with versus
without those large effects? For example, does the width of the CI
substantially (e.g., by ~30%) decrease after removing those large effects,
increase, or remain largely unchanged?

Finally, depending on how much this matters to you in terms of your study
objectives, does retaining versus removing those large effects in your
meta-regression model change the statistical significance of your pooled
effect at all (i.e., sig. to not sig., or vice versa)?
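
The with-versus-without comparison raised in the last two paragraphs can be sketched as follows. This is only an illustration under stated assumptions: the data are hypothetical (only the four large values come from the question), the model is an intercept-only random-effects model with the DerSimonian-Laird tau^2 estimator and a normal-based (Wald) interval, and the helper name pooled is invented for this example. With the real dataset one would refit the actual model instead.

```python
import math

def pooled(y, v, z_crit=1.96):
    """DerSimonian-Laird random-effects estimate with a Wald-type 95% CI."""
    k = len(y)
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, y))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_re = [1.0 / (vi + tau2) for vi in v]
    est = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(est / se) / math.sqrt(2.0))
    return {"est": est, "se": se, "lo": est - z_crit * se,
            "hi": est + z_crit * se, "p": p, "tau2": tau2}

# HYPOTHETICAL data: eight made-up "typical" effects plus the four large
# values quoted in the question; all sampling variances are invented.
y = [0.9, 1.1, 1.3, 0.8, 1.2, 1.0, 1.4, 0.7, 8.15, 6.63, 4.14, 4.10]
v = [0.05, 0.06, 0.08, 0.05, 0.07, 0.06, 0.09, 0.05, 0.90, 0.60, 0.30, 0.30]
large = {8, 9, 10, 11}  # positions of the four large effects

full = pooled(y, v)
trimmed = pooled([yi for i, yi in enumerate(y) if i not in large],
                 [vi for i, vi in enumerate(v) if i not in large])

width_full = full["hi"] - full["lo"]
width_trimmed = trimmed["hi"] - trimmed["lo"]
width_change_pct = 100.0 * (width_trimmed - width_full) / width_full

for label, fit in (("all studies", full), ("without the four", trimmed)):
    print(f"{label:>17}: g = {fit['est']:.2f} "
          f"[{fit['lo']:.2f}, {fit['hi']:.2f}], p = {fit['p']:.3g}")
print(f"change in CI width after removal: {width_change_pct:.0f}%")
print("significance flips at alpha = .05:",
      (full["p"] < 0.05) != (trimmed["p"] < 0.05))
```

Reporting both fits side by side, with the full-data fit as the main result, keeps the comparison transparent regardless of which way the interval width or the significance moves.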

Reza


On Sun, Dec 10, 2023 at 12:33 AM Nick Chen via R-sig-meta-analysis <
r-sig-meta-analysis at r-project.org> wrote: