
good method of removing outliers?

4 messages · Michael, Joshua Wiley, Peter Langfelder +1 more

#
Hi Michael,

I'm afraid this is one of those cases where the short answer is "No"
and the long answer is, "No."

If you are working with a data set stored in a data frame, something like:

sapply(mtcars, function(x)
  if (is.numeric(x)) range(x, na.rm = TRUE) else c(NA, NA))

should give you the range of every numeric variable. That is a simple
check for values that fall outside the possible range (say you have an
age variable with a -3 or a 320). Beyond that, you can inspect the data
visually, but ultimately you have to decide what an outlier is and
justify it.
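A minimal sketch of that range check, extended so each column is compared against its own plausible limits. The data frame `survey`, the helper `flag_out_of_range()` and the limits in `bounds` are all hypothetical; the limits you use have to come from knowledge of the variables, which is exactly the "decide and justify" part.

```r
# Hypothetical helper: flag values outside user-supplied plausible limits.
# 'bounds' is a named list mapping column name -> c(lower, upper).
flag_out_of_range <- function(df, bounds) {
  flags <- lapply(names(bounds), function(col) {
    x <- df[[col]]
    b <- bounds[[col]]
    # NA values are not flagged here; they need their own treatment
    !is.na(x) & (x < b[1] | x > b[2])
  })
  names(flags) <- names(bounds)
  as.data.frame(flags)
}

# Made-up data with impossible ages (-3, 320) and an implausible height (15 cm)
survey <- data.frame(age = c(34, 27, -3, 45, 320, NA),
                     height_cm = c(170, 182, 165, 15, 175, 160))
bounds <- list(age = c(0, 120), height_cm = c(50, 250))  # assumed limits

flags <- flag_out_of_range(survey, bounds)
survey[rowSums(flags) > 0, ]  # rows with at least one out-of-range value
```

Flagged rows are candidates for checking against the raw records, not for automatic deletion.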

Cheers,

Josh
On Fri, Dec 30, 2011 at 9:03 AM, Michael <comtech.usa at gmail.com> wrote:

#
What kind of data do you have? For simple numeric data, various methods
for removing outliers have been developed for robust estimation, and I'm
sure they are implemented in R. For example, this link

http://www.unt.edu/benchmarks/archives/2001/december01/rss.htm

describes how to calculate a robust measure of correlation that
includes a method to downweight (or remove) outliers.
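The linked page deals with correlation. As a simpler univariate illustration of the same robust idea (a swapped-in example of my own, not the method on that page), you can standardize with the median and the MAD instead of the mean and SD, so an outlier cannot inflate its own yardstick. The cutoff 3.5 and the Huber constant 1.345 are common conventions, assumed here rather than prescribed.

```r
# Robust z-scores: centre on the median, scale by the MAD
# (stats::mad rescales by 1.4826 so it matches the SD for normal data).
robust_z <- function(x) {
  (x - median(x, na.rm = TRUE)) / mad(x, na.rm = TRUE)
}

x <- c(10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 25.0)  # made-up data, one gross outlier
z <- robust_z(x)

# Option 1: remove points beyond an assumed cutoff of 3.5
flagged <- which(abs(z) > 3.5)
x_trimmed <- x[abs(z) <= 3.5]

# Option 2: downweight instead of removing, using one step of Huber
# weights w = min(1, k / |z|) with the assumed constant k = 1.345
w <- pmin(1, 1.345 / abs(z))
wm <- weighted.mean(x, w)

flagged
wm
```

This is a single reweighting step; MASS::huber() iterates the idea to a proper M-estimate of location. If most values are identical, mad() returns 0 and the z-scores become infinite or NaN, so check for that first.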

For identifying outlier samples in a multivariate setting, the
possibilities are even more varied, from simple hierarchical
clustering and visual identification of outliers to network
connectivity methods, etc.
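For the hierarchical-clustering route, a minimal sketch: cluster the samples, cut the tree, and treat samples that end up alone (or in very small clusters) as candidates. The data, the average linkage, the cut height `h = 3` and the cluster-size threshold are hypothetical choices that would need tuning on real data.

```r
# Made-up samples (rows) measured on three variables; s6 is deliberately odd
samples <- rbind(c(1.0, 2.0, 3.0),
                 c(1.2, 1.9, 3.1),
                 c(0.9, 2.2, 2.8),
                 c(1.1, 2.1, 3.2),
                 c(1.0, 1.8, 3.0),
                 c(9.0, 9.0, 9.0))
rownames(samples) <- paste0("s", seq_len(nrow(samples)))

# Average-linkage clustering on Euclidean distances between samples
tree <- hclust(dist(samples), method = "average")

# Cut at an assumed height and count cluster sizes
groups <- cutree(tree, h = 3)
sizes <- table(groups)

# Samples whose cluster has fewer than 2 members (assumed threshold)
small_clusters <- as.integer(names(sizes)[sizes < 2])
outlier_samples <- names(groups)[groups %in% small_clusters]
outlier_samples

# plot(tree) draws the dendrogram for the visual check
```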

HTH,

Peter
#
On 30/12/11 17:03, Michael wrote:
Ignoring the moral questions for a moment (it totally depends on your
definition of an outlier, your dataset, its distribution, etc.), for
the technical implementation, try the outliers package
(http://www.stats.bris.ac.uk/R/web/packages/outliers/index.html), which
implements the Grubbs and Cochran tests, among others. Also, see this
Stack Overflow answer of mine that shows an implementation of the Lund
test for outliers within a regression (http://stackoverflow.com/a/1444548/74658).
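If installing a package is not an option, both ideas are short enough to sketch in base R. The helpers `grubbs_two_sided()` and `lund_style_crit()` and the data are hypothetical. The first uses the standard two-sided Grubbs critical value derived from a t quantile; the second compares internally studentized residuals (`rstandard()`) with a Bonferroni-type bound from an F quantile at level alpha/n, which, as I understand it, is the usual way Lund's approximate test is computed. It is not necessarily identical to the code in the linked answer, and for real work the outliers package's `grubbs.test()` is the better-tested route.

```r
# Two-sided Grubbs test for a single outlier (assumes roughly normal data)
grubbs_two_sided <- function(x, alpha = 0.05) {
  x <- x[!is.na(x)]
  n <- length(x)
  dev <- abs(x - mean(x))
  G <- max(dev) / sd(x)
  t_crit <- qt(alpha / (2 * n), df = n - 2, lower.tail = FALSE)
  G_crit <- ((n - 1) / sqrt(n)) * sqrt(t_crit^2 / (n - 2 + t_crit^2))
  list(G = G, G_crit = G_crit, suspect = x[which.max(dev)], reject = G > G_crit)
}

x <- c(10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 25.0)  # made-up data
res <- grubbs_two_sided(x)
res$reject  # TRUE: the most extreme value is an outlier at level alpha

# Critical value for the largest internally studentized residual in a
# linear model with n observations and q coefficients (incl. intercept),
# using a Bonferroni-adjusted level alpha / n
lund_style_crit <- function(alpha, n, q) {
  f_val <- qf(1 - alpha / n, df1 = 1, df2 = n - q - 1)
  sqrt((n - q) * f_val / (n - q - 1 + f_val))
}

# Made-up regression data: y is about 2 * x except at x = 6
dat <- data.frame(x = 1:10,
                  y = c(2.1, 3.9, 6.2, 7.8, 10.1, 30.0, 14.2, 15.8, 18.1, 20.0))
fit <- lm(y ~ x, data = dat)
r <- rstandard(fit)  # internally studentized residuals
crit <- lund_style_crit(0.05, n = nrow(dat), q = length(coef(fit)))
which(abs(r) > crit)  # observations flagged as outliers
```

Both procedures assume approximately normal errors and look for one outlier at a time; with several outliers, masking can hide them.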

Regards,

Paul.