how to identify the outliers

1 message · Liaw, Andy

First, a bit of rhetoric:

Rejecting data points solely on the grounds of being "statistical outliers"
probably should be outlawed.  (Sounds like that's what you're trying to do.)
You need to investigate these data points so that you understand the reason
for their "outlyingness" before you decide whether their exclusion makes
sense or not.  Exclusion based purely on statistical criteria almost
guarantees irreproducible research.

Some explanation:  Any "statistical outliers" are with reference to a model
(formal or conceptual).  They reflect lack of fit of the model to the data.
Rejecting these data points on statistical grounds means you believe the
model more than the data, which is not such a good idea for scientific
research.

The "outliers" indicated by boxplots are based on a criterion (something
like upper hinge + k*IQR and lower hinge - k*IQR, where k is either 1.5 or 3;
see ?boxplot.stats for some definitions).  In fact, boxplot.stats() gives you
the limits that boxplot() uses to identify outliers.
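
A minimal sketch of how that could look, if you want the positions of the
flagged points so you can investigate them. The data vector x is made up for
illustration, and I am assuming the k above corresponds to the coef argument
of boxplot.stats() (default 1.5):

```r
# Hypothetical data; 40 and -15 are planted far from the rest.
x <- c(2.1, 3.4, 2.9, 40, 3.1, 2.7, -15, 3.3, 2.8, 3.0)

# coef plays the role of k; 1.5 is the default.
bs <- boxplot.stats(x, coef = 1.5)

bs$stats  # lower whisker end, lower hinge, median, upper hinge, upper whisker end
bs$out    # the values lying beyond the whiskers

# Positions in x of the flagged values, for inspection (not automatic removal).
idx <- which(x %in% bs$out)
x[idx]
```

Here idx holds the indices into x of the points a boxplot would draw
individually; what you do with them should depend on why they are unusual.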

Andy

-----Original Message-----
From: Rado Bonk [mailto:rbonk at host.sk]
Sent: Tuesday, November 26, 2002 10:36 AM
To: r-help at stat.math.ethz.ch
Subject: [R] how to identify the outliers


Hello R-users,

Is there a more sophisticated way to identify the outliers in a dataset
other than seeing them in a boxplot? I want to exclude them from
further analysis, and I am interested in their positions in my data
vector.

Rado