Hi all,

I am working on a dataset in which some of the variables have more than one outlying observation. I am using the sample script below:

library(outliers)
x1 <- c(10, 10, 11, 12, 13, 14, 14, 10, 11, 13, 12, 13, 10, 19, 18, 17, 10099, 10099, 10098)
outlier_tf1 = outlier(x1,logical=TRUE)
find_outlier1 = which(outlier_tf1==TRUE, arr.ind=TRUE)
beh_input_ro1 = x1[-find_outlier1]

It removes only the most extreme outliers, not all of them. In this example it removes only 10099, 10099 and not 10098.

Thanks in advance for the help.
-Ajit

--
View this message in context: http://r.789695.n4.nabble.com/How-to-remove-multiple-outliers-tp3921689p3921689.html
Sent from the R help mailing list archive at Nabble.com.
How to remove multiple outliers
4 messages · aajit75, R. Michael Weylandt
Did you read the documentation for ?outlier? It clearly states that it finds the single (possibly repeated) value with the largest distance from the mean. That's only 10099 here. You could perhaps apply the function more than once, or write your own outlier-removal script using whatever criterion you want to define outliers, but the function is doing exactly what it claims to do.

On another note, why complicate things? Just use the rm.outlier() function from the same package rather than doing it (inefficiently) the way you currently are.

Note that outlier(..., logical = TRUE) returns a logical vector which can be used directly for subsetting; that there's no need to test booleans with == TRUE (that's an identity transform on the set of booleans); and that the arr.ind = TRUE argument isn't needed for a plain vector. None of those make much of a difference for this problem, but they are points of good practice.

Michael
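The subsetting points above can be checked in base R without installing anything. The sketch below uses a hypothetical helper, farthest_from_mean() (not part of any package), which assumes that outlier(x, logical = TRUE) flags every element equal to the value farthest from the mean, as the documentation describes; with the outliers package loaded, outlier(x1, logical = TRUE) should give the same flags and rm.outlier(x1) the same kept values. It also shows one extra reason to prefer the logical vector over x[-which(...)].

```r
# Hypothetical base-R stand-in for outlier(x, logical = TRUE):
# flags every element equal to the value farthest from the mean.
farthest_from_mean <- function(x) {
  m <- mean(x)
  # Pick whichever end of the range lies farther from the mean.
  extreme <- if (max(x) - m < m - min(x)) min(x) else max(x)
  x == extreme  # TRUE for every copy of that value
}

x1 <- c(10, 10, 11, 12, 13, 14, 14, 10, 11, 13, 12, 13, 10,
        19, 18, 17, 10099, 10099, 10098)

flag <- farthest_from_mean(x1)
x1[flag]           # both copies of 10099; 10098 is not flagged
kept <- x1[!flag]  # subset directly with the logical vector, no which()

# Pitfall of x[-which(...)]: when nothing is flagged, which() returns
# integer(0), and x[-integer(0)] is an EMPTY vector, not the whole vector.
none <- rep(FALSE, length(x1))
length(x1[-which(none)])  # 0: everything is lost
length(x1[!none])         # 19: nothing is lost
```

The mean of x1 is about 1605, so 10099 is much farther from it than 10 is; that is why only the two copies of 10099 are flagged on the first call.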
On Thu, Oct 20, 2011 at 8:11 AM, aajit75 <aajit75 at yahoo.co.in> wrote:
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Hi Michael,

Thanks for the help. Yes, I have gone through the documentation for ?outlier. Since it removes one outlier at a time, and being new to R, I was wondering whether there is any function available for removing multiple outliers without calling, say, rm.outlier n times, because n is not known in advance here.

On the second point, I am using the piece of code below because I get an error when rm.outlier with the fill = FALSE option is applied to the same dataset:

outlier_tf1 = outlier(x1,logical=TRUE)
find_outlier1 = which(outlier_tf1==TRUE, arr.ind=TRUE)
beh_input_ro1 = x1[-find_outlier1]
library(outliers)
beh_input_ro <- rm.outlier(beh_input_dr, fill = FALSE, median = FALSE, opposite = FALSE)
Error in data.frame(X1 = c(28.7812, 24.8923, 31.3987, 25.774, 27.1798, :
  arguments imply differing number of rows: 2398, 2390, 2399

Regards,
-Ajit

--
View this message in context: http://r.789695.n4.nabble.com/How-to-remove-multiple-outliers-tp3921689p3924904.html
Sent from the R help mailing list archive at Nabble.com.
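The error was not reproduced in the thread, so the following is only a guess at the mechanism. Assuming rm.outlier(..., fill = FALSE) on a data frame removes outliers separately in each column, the columns come back with different lengths (the 2398, 2390, 2399 in the message), and data.frame() refuses ragged columns. The sketch uses a made-up two-column data frame and an arbitrary cutoff of 50 (the hypothetical is_out() helper) as a stand-in for whatever outlier rule is applied, then shows two ways to keep the result rectangular. The package's own fill = TRUE option is a third route: it replaces outliers with the mean or median instead of removing them.

```r
# Made-up data: column a has one extreme value, column b has two.
df <- data.frame(a = c(1, 2, 3, 2, 100),
                 b = c(5, 500, 6, 5, 500))
is_out <- function(col) col >= 50  # hypothetical, arbitrary outlier rule

# Dropping outliers column by column leaves columns of different lengths.
dropped <- lapply(df, function(col) col[!is_out(col)])
lengths(dropped)  # a = 4, b = 3

# A data frame cannot hold ragged columns, so this fails with the same
# kind of "differing number of rows" error; capture the message instead.
msg <- tryCatch(as.data.frame(dropped), error = conditionMessage)
print(msg)

# Option 1: keep every row and mark outliers as NA.
masked <- as.data.frame(lapply(df, function(col) replace(col, is_out(col), NA)))

# Option 2: drop any row that is flagged in at least one column.
flags <- sapply(df, is_out)  # logical matrix, one column per variable
rows_kept <- df[!apply(flags, 1, any), ]
```

Option 1 preserves the row alignment between variables, which matters if the columns describe the same observations; option 2 keeps only complete, unflagged rows at the cost of discarding data from the other columns.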
3 days later
I believe Dr. Winsemius addressed this in your other thread, but I would hesitate to do any sort of outlier identification based on repeated application of a filter (for that matter, I'm not much of an outlier-removal guy generally, but let's suppose I were). You can easily get into a situation where data that were not outliers before become outliers once the most extreme values have been removed. Rather, a single-pass filter is probably more appropriate; even better would be to use robust methods, such as those made available in library(robustbase).

As to the technical bits of your question, I can't easily get rm.outlier to throw an error like that: can you provide a minimal working example that does?

I'm not aware of any simple, direct method for repeated function application other than a loop. Perhaps one could rig something with do.call(), but the loop should be fine.

Michael
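To make the warning about repeated filtering concrete, the sketch below compares a loop with a single-pass rule on x1 from the first message. Both helpers are hypothetical: iterative_filter() repeatedly drops the value farthest from the mean while its z-score exceeds k, recomputing mean and sd each pass (the loop approach), and mad_filter() is a single-pass rule based on the median and MAD, swapped in here as a simple robust example; it is not what robustbase provides. The cutoffs k = 2 and k = 3 are arbitrary.

```r
x1 <- c(10, 10, 11, 12, 13, 14, 14, 10, 11, 13, 12, 13, 10,
        19, 18, 17, 10099, 10099, 10098)

# Hypothetical helper: repeatedly drop the value farthest from the mean
# while its z-score exceeds k, recomputing mean and sd after each pass.
iterative_filter <- function(x, k) {
  repeat {
    if (length(x) < 3) return(x)  # too few values left to test
    s <- sd(x)
    if (s == 0) return(x)         # all values equal: nothing to remove
    m <- mean(x)
    extreme <- if (max(x) - m < m - min(x)) min(x) else max(x)
    if (abs(extreme - m) / s <= k) return(x)
    x <- x[x != extreme]          # removes every copy of the extreme value
  }
}

# Hypothetical helper: one pass, scoring each value by its distance from
# the median in units of mad() (scaled by 1.4826 by default).
# Assumes mad(x) > 0.
mad_filter <- function(x, k) {
  x[abs(x - median(x)) / mad(x) <= k]
}

iter3  <- iterative_filter(x1, k = 3)
iter2  <- iterative_filter(x1, k = 2)
robust <- mad_filter(x1, k = 3)

length(iter3)  # 19
sort(iter2)
sort(robust)
```

On this data the loop with k = 3 stops immediately: the three large values inflate the sd so much that 10099 sits only about 2.25 sds from the mean (masking). With k = 2 the loop removes the three large values but then keeps going into 19, 18 and 17, which were never outliers relative to the bulk of the data. The single-pass MAD rule removes exactly 10099, 10099 and 10098.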
On Fri, Oct 21, 2011 at 6:40 AM, aajit75 <aajit75 at yahoo.co.in> wrote: