good method of removing outliers?

Hi Michael,

I'm afraid this is one of those cases where the short answer is "No"
and the long answer is, "No."

If you are working with a data set stored in a data frame, something like:

sapply(mtcars, function(x) {
  if (is.numeric(x)) range(x, na.rm = TRUE) else c(NA, NA)
})

should give you the range for all numeric variables, which is a
simple check for whether any values fall outside the possible range
(say you have an age variable with a -3 or 320).  Beyond that, you can
inspect the data visually, but ultimately you have to decide what an
outlier is and justify it.
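
As a minimal sketch of that range check applied to a single variable,
assuming a hypothetical data frame `d` with an `age` column and made-up
plausible limits of 0 and 120 (none of these names or limits come from
the original post):

```r
# Hypothetical data: two of the ages are impossible values.
d <- data.frame(id = 1:6, age = c(34, -3, 51, 320, 27, NA))

# Plausible limits are an assumption you must justify for your own data.
lower <- 0
upper <- 120

# TRUE where a non-missing value falls outside the plausible range.
out_of_range <- !is.na(d$age) & (d$age < lower | d$age > upper)

# Rows that need a closer look (data entry error? unit mix-up?).
d[out_of_range, ]
```

This only flags impossible values; whether an unusual but possible
value counts as an outlier is still a judgment call.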

Cheers,

Josh
Happy holidays all!

I know it's very subjective to determine whether a data point is an
outlier or not...

But are there reasonably good and realistic methods of identifying outliers
in R?

Thanks a lot!


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
What kind of data do you have? For simple numeric data, there are
various methods for removing outliers developed for robust estimation
and I'm sure they are implemented in R. For example, this link

http://www.unt.edu/benchmarks/archives/2001/december01/rss.htm

describes how to calculate a robust measure of correlation that
includes a method to downweight (or remove) outliers.
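
The method from that link is not reproduced here. As a simple stand-in
that illustrates the same idea, the sketch below compares the ordinary
Pearson correlation with the rank-based Spearman correlation from base
R's `cor()`, which is much less sensitive to a single extreme value.
The data are made up for illustration.

```r
# Hypothetical data: a perfectly increasing relationship plus one gross outlier.
x <- c(1:10, 11)
y <- c(1:10, -50)

# Pearson uses the raw values, so the single outlier dominates the result.
pearson <- cor(x, y, method = "pearson")

# Spearman uses ranks, so the outlier can only move one rank position.
spearman <- cor(x, y, method = "spearman")

c(pearson = pearson, spearman = spearman)
```

A large gap between the two estimates is a hint that a few points are
driving the Pearson result and deserve a closer look.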

For identifying outlier samples in a multivariate setting, the
possibilities are even more varied, from simple hierarchical
clustering and visual identification of outliers to network
connectivity methods, etc.
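
A minimal sketch of the hierarchical clustering approach, using base
R's `dist()`, `hclust()` and `cutree()` on simulated data. The data, the
average linkage and the cut into two groups are all assumptions chosen
for illustration:

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible

# Hypothetical data: 20 ordinary samples in 5 dimensions plus one distant sample.
x <- rbind(matrix(rnorm(20 * 5), nrow = 20), rep(10, 5))
rownames(x) <- paste0("s", seq_len(nrow(x)))

# Average-linkage hierarchical clustering on Euclidean distances between samples.
hc <- hclust(dist(x), method = "average")

# plot(hc)  # visual check: an outlying sample joins the tree last, at a large height

# Cut the tree into two groups; samples in a very small group are candidates.
groups <- cutree(hc, k = 2)
small <- names(which.min(table(groups)))
candidates <- names(groups)[groups == as.integer(small)]
candidates
```

With real data the dendrogram is usually inspected by eye first, and
the number of groups is chosen after looking at it.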

HTH,

Peter

Ignoring the moral questions for a moment (it totally depends on your
definition of an outlier, your dataset, its distribution, etc.), for
the technical implementation, try the outliers package
(http://www.stats.bris.ac.uk/R/web/packages/outliers/index.html), which
implements the Grubbs and Cochran tests, among others.  Also, see this
Stack Overflow answer of mine that shows an implementation of the Lund
test for outliers within a regression
( http://stackoverflow.com/a/1444548/74658 ).
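
To show what a Grubbs-type test computes, here is a minimal base R
sketch of the two-sided statistic and its critical value. It assumes
approximately normal data, and the function name `grubbs_sketch` and the
example values are made up. It is not the outliers package's
implementation; the package provides its own `grubbs.test()`.

```r
# Two-sided Grubbs statistic: largest absolute deviation from the mean, in SD units.
grubbs_sketch <- function(x, alpha = 0.05) {
  x <- x[!is.na(x)]
  n <- length(x)
  stopifnot(n > 2)
  dev <- abs(x - mean(x))
  g <- max(dev) / sd(x)
  # Critical value based on the t distribution with n - 2 degrees of freedom.
  t_crit <- qt(1 - alpha / (2 * n), df = n - 2)
  g_crit <- (n - 1) / sqrt(n) * sqrt(t_crit^2 / (n - 2 + t_crit^2))
  list(statistic = g, critical = g_crit,
       suspect = x[which.max(dev)], flagged = g > g_crit)
}

# Hypothetical measurements with one suspicious value.
res <- grubbs_sketch(c(9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 15.0))
res$suspect
res$flagged
```

The test considers one suspect value at a time, so it is a poor fit
for data with several outliers on the same side.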

Regards,

Paul.