How to remove multiple outliers

4 messages · aajit75, R. Michael Weylandt

#
Hi All,

I am working on a dataset in which some of the variables have more than
one outlying observation.

I am using the sample script below:

library(outliers)
x1 <- c(10, 10, 11, 12, 13, 14, 14, 10, 11, 13, 12, 13, 10, 19, 18, 17,
10099, 10099, 10098)
outlier_tf1 = outlier(x1,logical=TRUE)
find_outlier1 = which(outlier_tf1==TRUE, arr.ind=TRUE)
beh_input_ro1 = x1[-find_outlier1]

It removes only the most extreme outlier, not all of them. In this example
it removes only 10099, 10099 and not 10098.

Thanks for the help in advance.
-Ajit


--
View this message in context: http://r.789695.n4.nabble.com/How-to-remove-multiple-outliers-tp3921689p3921689.html
Sent from the R help mailing list archive at Nabble.com.
#
Did you read the documentation for ?outlier? It clearly states that it
finds the single (possibly repeated) value with the largest distance
from the mean. That's only 10099 here. You could perhaps apply the
function more than once, or write your own outlier removal script using
whatever criterion you want to define outliers, but the function is
doing exactly what it claims to do.

On another note, why complicate things? Just use the rm.outlier()
function from the same package rather than doing it (inefficiently) the
way you currently are. Note that outlier() with logical = TRUE returns a
logical vector which can be used for direct subsetting, that there's no
need to test booleans against TRUE (since that's an identity transform
on the set of booleans), and that the arr.ind = TRUE argument isn't
needed here. None of those make much of a difference for this problem,
but they are points of good practice.
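
As a minimal sketch of the two points above, using the x1 vector from the
original post: the logical vector from outlier() can be negated and used
directly as a subset index, and rm.outlier() performs the same removal in a
single call. The variable names kept_by_subsetting and kept_by_rm are
illustrative only.

```r
library(outliers)

x1 <- c(10, 10, 11, 12, 13, 14, 14, 10, 11, 13, 12, 13, 10, 19, 18, 17,
        10099, 10099, 10098)

# outlier(..., logical = TRUE) flags the value farthest from the mean
# (every copy of it), so negating the flag keeps everything else.
kept_by_subsetting <- x1[!outlier(x1, logical = TRUE)]

# rm.outlier() removes that same value in one call.
kept_by_rm <- rm.outlier(x1)

# Compare the two results.
identical(kept_by_subsetting, kept_by_rm)
```

Either way only the two copies of 10099 are dropped, so 10098 is still
present in the result.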

Michael
On Thu, Oct 20, 2011 at 8:11 AM, aajit75 <aajit75 at yahoo.co.in> wrote:
#
Hi Michael,

Thanks for the help.

Yes, I have gone through the documentation for ?outlier. As it removes one
outlier at a time, and being new to R, I was wondering whether there is any
function available for removing multiple outliers without calling, say,
rm.outlier n times, because n is not fixed here.

On the second point, I am using the piece of code below because I get an
error when rm.outlier with the fill = FALSE option is applied to the
same dataset.

outlier_tf1 = outlier(x1,logical=TRUE) 
find_outlier1 = which(outlier_tf1==TRUE, arr.ind=TRUE) 
beh_input_ro1 = x1[-find_outlier1]
Error in data.frame(X1 = c(28.7812, 24.8923, 31.3987, 25.774, 27.1798,  : 
arguments imply differing number of rows: 2398, 2390, 2399

Regards,
-Ajit

--
View this message in context: http://r.789695.n4.nabble.com/How-to-remove-multiple-outliers-tp3921689p3924904.html
3 days later
#
I believe Dr. Winsemius addressed this in your other thread, but I
would hesitate to do any sort of outlier identification based on
repeated application of a filter (for that matter, I'm not much of an
outlier removal guy generally, but let's suppose I were). You can
easily get into a situation where data that were not outliers
previously now become so. Rather, a single-pass filter is probably more
appropriate. Even better would be to use some robust methodologies,
such as those made available in library(robustbase).
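
One possible single-pass filter, sketched in base R only: flag values by
their distance from the median, measured in units of the median absolute
deviation (MAD). This is an assumed criterion chosen for illustration, not
something taken from robustbase or from this thread; mad_filter is a
hypothetical helper and the cutoff of 3.5 is an arbitrary convention that
should be adjusted to the data.

```r
x1 <- c(10, 10, 11, 12, 13, 14, 14, 10, 11, 13, 12, 13, 10, 19, 18, 17,
        10099, 10099, 10098)

# Hypothetical helper, not part of any package.
# cutoff = 3.5 is an assumed default, not a recommendation from the thread.
mad_filter <- function(x, cutoff = 3.5) {
  centre <- median(x)
  spread <- mad(x)  # scaled median absolute deviation from base R (stats)
  if (spread == 0) {
    return(x)       # no spread to judge against, so leave the data alone
  }
  # Keep values whose distance from the median is within the cutoff.
  x[abs(x - centre) / spread <= cutoff]
}

cleaned <- mad_filter(x1)
cleaned
```

Because the median and MAD are barely moved by the three huge values, all of
10099, 10099 and 10098 are dropped in one pass, and nothing is re-evaluated
afterwards.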

As to the technical bits of your question,

I can't easily get rm.outlier to throw an error like that. Can you
provide a minimal working example that does?

I'm not aware of any simple direct method for repeated function
application other than a loop trick. Perhaps one could rig something
with do.call() but the loop should be fine.
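
A rough sketch of such a loop, with the statistical caveats earlier in this
message still applying. The stopping rule is an assumption added for
illustration (stop once the most extreme value is within 3.5 MADs of the
median), as are the helper name remove_outliers_loop and the max_iter guard;
a loop needs some rule like this, since rm.outlier() on its own will always
remove something.

```r
library(outliers)

x1 <- c(10, 10, 11, 12, 13, 14, 14, 10, 11, 13, 12, 13, 10, 19, 18, 17,
        10099, 10099, 10098)

# Hypothetical helper: repeatedly strip the value farthest from the mean.
# cutoff and max_iter are assumed settings, not taken from the thread.
remove_outliers_loop <- function(x, cutoff = 3.5, max_iter = 100) {
  for (i in seq_len(max_iter)) {
    extreme <- outlier(x)  # single value farthest from the mean
    spread <- mad(x)       # robust spread of the current data
    # Stop when there is no spread, or the extreme value looks ordinary.
    if (spread == 0 || abs(extreme - median(x)) / spread <= cutoff) {
      break
    }
    x <- rm.outlier(x)     # drops every copy of that value
  }
  x
}

cleaned <- remove_outliers_loop(x1)
cleaned
```

With a different stopping rule, for example one based on the mean and
standard deviation of the remaining data, the loop can keep trimming values
that looked ordinary at the start, which is the risk described above.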

Michael
On Fri, Oct 21, 2011 at 6:40 AM, aajit75 <aajit75 at yahoo.co.in> wrote: