Hi Listers,
I have been running a robust 2nd order polynomial regression and I want to
extract the weights information to see what datapoints are behaving as
outliers. Although I can see the summary data for the weights as part of
the output, I have not been able to find any help for extracting all the
individual weights. I would also like to be able to colour these points on
a scatterplot if possible.
I also cannot find any R packages that provide a way to calculate and
remove outliers in nonlinear regressions - along the lines of the paper by
Motulsky and Brown 2006.
Any advice on this would be most welcome.
Thanks for help.
Andy
Andrew Halford Ph.D
Adjunct Research Scientist
University of Guam & Curtin University
Ph: +61 (0) 468 419 473
[[alternative HTML version deleted]]
Dear Andy
I would recommend to use lmrob for running polynomial regression. It can
handle outliers on leverage points much better than nlrob.
library(robustbase)
set.seed(54362)
DAT <- data.frame(x=c(seq(-5,5,by=1), 8))
DAT$xq <- DAT$x^2
DAT$y <- 2 + 3*DAT$x + 1.2*DAT$xq + rnorm(nrow(DAT), sd=1.5)
DAT$y[nrow(DAT)] <- 60
DAT.lm <- lm(y ~ x + xq, data=DAT)
summary(DAT.lm)
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.9686 2.4849 2.402 0.039770 *
## x 2.1114 0.5750 3.672 0.005141 **
## xq 0.7117 0.1257 5.663 0.000308 ***
## ---
## Residual standard error: 6.41 on 9 degrees of freedom
DAT.rlm <- lmrob(y ~ x + xq, data=DAT)
summary(DAT.rlm)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.48591 0.47660 3.118 0.0124 *
## x 2.72508 0.08822 30.890 1.91e-10 ***
## xq 1.24015 0.02458 50.457 2.37e-12 ***
## ---
## Robust residual standard error: 1.451
str(DAT.rlm)
(h.rw <- DAT.rlm$weights)
h.sel <- h.rw < 0.1
plot(y~x, data=DAT)
lines(DAT$x, predict(DAT.rlm), col="blue", lwd=2)
lines(DAT$x, predict(DAT.lm), col="red")
points(DAT$x[h.sel], DAT$y[h.sel], pch=16, col="orange")
## I don't know the paper by Motulsky and Brown 2006. I do not recommend
## removing outliers but just recommend identifying outliers and if
## possible correct typos and such things.
DAT.nlrob1 <- nlrob(y ~ a + b*x + c*x^2, start=list(a=6,b=2,c=0.7),
data=DAT)
summary(DAT.nlrob1)
## Formula: y ~ a + b * x + c * x^2
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a 5.1419 2.0241 2.540 0.03169 *
## b 1.8283 0.4775 3.829 0.00403 **
## c 0.7177 0.1004 7.148 5.38e-05 ***
## ---
## Robust residual standard error: 4.521
## nlrob cannot cope wit outliers on leverage points!
## Hence
h.d <- (DAT$x-median( DAT$x))/mad(DAT$x)
DAT$wx <- 1/sqrt(1+8*pmax(0, h.d^2-1)/sqrt(2))
DAT.nlrob2 <- nlrob(y ~ a + b*x + c*x^2, start=list(a=6,b=2,c=0.7)
,data=DAT,
weights=DAT$wx)
summary(DAT.nlrob2)
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a 1.81836 0.74151 2.452 0.0366 *
## b 2.69350 0.16763 16.068 6.20e-08 ***
## c 1.19773 0.05919 20.236 8.19e-09 ***
## ---
## Robust residual standard error: 1.417
str(DAT.nlrob2)
DAT.nlrob2$w.r
## [1] 1.00000000 1.00000000 1.00000000 1.00000000 1.00000000 1.00000000
## [7] 1.00000000 0.88186039 1.00000000 1.00000000 1.00000000 0.04762281
## These are more reliable results.
## But be careful. This approach with weighting does not yield a
## high-breakdown estimator.
And by the way:
I guess you will be very careful with expressions like 'DAT$x^2' because
of numerical issues, won't you.
All the best,
Andreas
Am 20.11.2012 09:05, schrieb Andrew Halford:
Hi Listers,
I have been running a robust 2nd order polynomial regression and I want to
extract the weights information to see what datapoints are behaving as
outliers. Although I can see the summary data for the weights as part of
the output, I have not been able to find any help for extracting all the
individual weights. I would also like to be able to colour these points on
a scatterplot if possible.
I also cannot find any R packages that provide a way to calculate and
remove outliers in nonlinear regressions - along the lines of the paper by
Motulsky and Brown 2006.
Any advice on this would be most welcome.
Thanks for help.
Andy
----------------------------------------------------------------------
Prof. Dr. Andreas Ruckstuhl
ZHAW Z?rcher Hochschule f?r Angewandte Wissenschaften
IDP Institut f?r Datenanalyse und Prozessdesign
Rosenstrasse 3 Tel. : +41 (0)58 934 78 12
Postfach Fax : +41 (0)58 935 78 12
CH-8401 Winterthur e-Mail: Andreas.Ruckstuhl at zhaw.ch
WWW : http://www.idp.zhaw.ch