[RsR] [R] "hubers" function in R MASS library : problem andsolution
Hi Marting,? Maheswaran: Thank you very much for your input!? I am applying M-estimation in an automatic data processing procedure (that requires both robust mean and robust SD).? The "mad=0" problem happens rarely.? Most of the time "hubers" works great.? I appreciate your proposed solutions. If we use:
hubers(a, k = 1.5, s=mean(abs(a-median(a))))
Then "s" is fixed and seems less robust:? if there are outliers, "s=mean(abs(a-median(a)))" will appear too large, right? ? Similaryly, using the "robustbase package,
? huberM(a, s = max(mad(a), 1e-6))
"s" will be fixed as well and will be less efficient. Regarding Martin's comment: "but 'tol' is completely arbitrary and, the way you propose it makes the resulting estimate no-longer-scale-equivariant.",?? I did an experiment using different levels of "tol" and found them to give identical results.?? For convenience, I add a "s.min" parameter in "hubers" that bounds the initial "s" estimate from below: ? if (missing(s)) ????? s0 <- max(mad(y), s.min) ? Then I tried the following settings:
hubers(c(rep(1,10), 2), s.min=1e-6)
$mu [1] 1 $s [1] 7.958e-17
hubers(c(rep(1,10), 2), s.min=1)
$mu [1] 1 $s [1] 7.958e-17
hubers(c(rep(1,10), 2), s.min=10)
$mu [1] 1 $s [1] 7.958e-17 #####
hubers(a, s.min=1e-6)
$mu [1] 5.88 $s [1] 2.937
hubers(a, s.min=1)
$mu [1] 5.88 $s [1] 2.937
hubers(a, s.min=10)
$mu [1] 5.88 $s [1] 2.937 However, forgive me for my ignorance about the practical implication of "scale-equivalent". Let me know if you have any comment. Thanks again for your help! Sincerely, Feiming Chen
--- On Sun, 2/6/11, Maheswaran Rohan <mrohan at doc.govt.nz> wrote:
From: Maheswaran Rohan <mrohan at doc.govt.nz> Subject: RE: [RsR] [R] "hubers" function in R MASS library : problem andsolution To: "Martin Maechler" <maechler at stat.math.ethz.ch>, "Feiming Chen" <feimingchen at yahoo.com> Cc: r-sig-robust at r-project.org Date: Sunday, February 6, 2011, 3:23 PM Hi Chen
? With this change,? the result is more sensible:
? ? >> hubers(a) ? ? > $mu ? ? > [1] 5.88 ? ? > $s ? ? > [1] 2.937 This mu = 5.88 is very similar to the mean(a) = 5.877417. That means, no point to apply M-estimation if you are happy with the result. Following suggestion may be helpful, think about it! To overcome mad = 0 problem, try to define s = mean(abs((a - median(a)))
hubers(a, k = 1.5, s=mean(abs(a-median(a))))
$mu [1] 6.414677 $s [1] 1.689561
hubers(a, k = 1.345, s=mean(abs(a-median(a))))
$mu [1] 6.480148 $s [1] 1.689561 Regards Rohan -----Original Message----- From: r-sig-robust-bounces at r-project.org [mailto:r-sig-robust-bounces at r-project.org] On Behalf Of Martin Maechler Sent: Friday, 4 February 2011 10:05 p.m. To: Feiming Chen Cc: r-help at r-project.org; r-sig-robust at r-project.org Subject: Re: [RsR] [R] "hubers" function in R MASS library : problem andsolution
Feiming Chen <feimingchen at yahoo.com> ? ???on Thu, 3 Feb 2011 12:03:05 -0800 (PST) writes:
? ? > Hello: ? ? > I found the "hubers" function in MASS library is NOT working on the following ? ? > data: ? ? >> a <- ? ? >> c(7.19,7.19,7.19,9.41,6.79,9.41,7.19,9.41,1.64,7.19,7.19,7.19,7.19,1.422 ,7.19,6.79,7.19,6.79,7.19,7.19,4.44,6.55,6.79,7.19,9.41,9.41,7.19,7.19,7 .19,7.19,1.64,1.597,1.64,7.19,1.422,7.19,6.79,9.38,7.19,1.64,7.19,7.19,7 .19,7.19,7.19,1.64,7.19,6.79,6.79,1.649,1.64,7.19,1.597,1.64,6.55,7.19,7 .19,1.64,7.19,7.19,1.407,1.672,1.672,7.19,6.55,7.19,7.19,9.41,1.407,7.19 ,7.19,9.41,7.19,9.41,7.19,7.19,7.19,7.19,7.19,7.19,7.19,7.19,7.19,9.41,7 .19,6.79,7.19,6.79,1.64,1.422,7.19,7.19,1.67,1.64,1.64,1.64,1.64,1.787,7 .19,7.19,7.19,6.79,7.19,7.19,7.19,7.19,7.19,7.19,7.19,7.19,7.19,1.64,1.6 4,1.64,1.422,9.41,1.64,7.19,7.19,7.19,7.19,7.19,7.19,7.19,6.79,6.79,7.19 ,6.79,7.19,7.19,1.407,7.19,4.42,9,1.64,1.64,6.79,1.664,1.664) ? ? >> ? ? >> library(MASS) ? ? >> hubers(a) ? ? > ## NO response! ? ? > I think it is due to the infinite loop caused by the following line in the code ? ? > of "hubers" (around Line 30): ? ? > if ((abs(mu0 - mu1) < tol * s0) && ? ? > abs(s0 - s1) < tol * s0)? break ? ? > where "s0" evaluates to ZERO initially (due to more than 50% of the number ? ? > 7.19). yes. Not only for this reason,? the robustbase? package has had the? 'huberM()' function with some other slight advantages over MASS::huber. ? ? > I propose to change the "<" sign to "<=": ? ? > if ((abs(mu0 - mu1) <= tol * s0) && ? ? > abs(s0 - s1) <= tol * s0)? break ? ? > which will break the infinite loop.? ? However, the new result is: ? ? >> hubers(a) ? ? > $mu ? ? > [1] 7.19 ? ? > $s ? ? > [1] 0 ? ? > which gives 0 standard deviation.? Actually the data does vary and it is not ? ? > true all values other than 7.19 are outliers.??? Sure. Nontheless, the way Peter Huber had defined the "proposal 2" Huber estimator,? s = 0, is the correct result. With the robustbase? huberM() function, you (can) get
huberM(a, warn0scale =TRUE)
$mu [1] 7.19 $s [1] 0 $it [1] 0 Warning message: In huberM(a, warn0scale = TRUE) : ? scale 's' is zero -- returning initial 'mu' ? ? >> plot(a) ? ? > I think this is because we allow initial SD to equal to zero instead of a ? ? > POSITIVE value.???See Line 15 of the "hubers" function: ? ? > if (missing(s)) ? ? > s0 <- mad(y) ? ? > I propose setting "s0" to "mad(y)" or a small positive number, whichever is ? ? > larger.? For example: ? ? > if (missing(s)) ? ? > s0 <- max(mad(y), tol) ? ? > where tol=1e-6. but 'tol' is completely arbitrary and, the way you propose it makes the resulting estimate no-longer-scale-equivariant. huberM() *has* an? s? argument for specifying the scale estimate, so you could use it as ? huberM(a, s = max(mad(a), 1e-6))? if you want. Note that your sample 'a' is constructed in a way that all scale-equivariant 50%-breakpoint robust estimates of scale will return s = 0, as more than half of your observations are identical, and scale equivariance "ensures" that in this limiting case, indeed all other observations are "outliers". This last point is a somewhat interesting topic for "robustniks", and hence I'm CC'ing the dedicated "R + Robustness" mailing list, R-SIG-robust. Martin Maechler, ETH Zurich ? ? >? With this change,? the result is more sensible: ? ? >> hubers(a) ? ? > $mu ? ? > [1] 5.88 ? ? > $s ? ? > [1] 2.937 ? ? > Could anyone take a look at this and decide if the above modifications to the ? ? > "hubers" function are warranted?Thanks a lot! ? ? > Sincerely, ? ? > Feiming Chen ? ? > Read more >>???Options >>??? ? ? > [[alternative HTML version deleted]] ? ? > ______________________________________________ ? ? > R-help at r-project.org mailing list ? ? > https://stat.ethz.ch/mailman/listinfo/r-help ? ? > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html ? ? > and provide commented, minimal, self-contained, reproducible code. _______________________________________________ R-SIG-Robust at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-robust ############################################## This e-mail (and attachments) is confidential and may be legally privileged. ##############################################