[RsR] Questions about interpreting lmRob output
Here are some quick answers to your questions. Please let me know if you would like more details. Kjell
On 13 Nov 2007, at 20:59, Jenifer Larson-Hall wrote:
Dear Robust R gurus, There are a variety of functions I can use to perform robust regression in R and I am trying to figure out what would be the best package in R to recommend to do robust regression. From reading through the archives and looking at Maronna, Martin and Yohai's book, my sense is that most people on this list would recommend the robust library. That's the one I like the best too, because its output parallels the lm functions. However, I have a few questions. First, several books that discuss R and have mentioned robust methods (Crawley, 2007; Faraway, 2005; Fox, 2002; Jureckova & Picek, 2006) have used the rlm function in the MASS library. Do you have any idea why these books would recommend rlm over lmRob? I don't find the output to be as helpful as lmRob's output. Well, this is just curiosity.
The Robust Library was originally developed for S-PLUS and has only recently (sometime in the last 1-2 years) been made available for R.
Second, I have some questions about understanding the output of lmRob:
1) How can I interpret the test for bias in the summary output?
Test for Bias:
             statistic   p-value
M-estimate   0.5652855 0.9997877
LS-estimate  3.6125604 0.8902805
Does the fact that the p-values are above .05 mean there was no bias in my original data set?
The test for bias is a diagnostic for the fitting procedure and is not directly a statement about the fitted model. It is computed by the function test.lmRob; if you're curious, take a look at the reference in the help file.
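If you want to see the test on its own, it can be called directly on a fitted object. This is only a sketch; it assumes the version of test.lmRob in the robust package accepts the fitted model as its first argument (check ?test.lmRob):

```r
library(robust)

# Refit the model from the post, then run the bias test directly.
# A large p-value means no evidence that the efficient final estimate
# is biased relative to the initial high-breakdown estimate, i.e. the
# fitting procedure behaved well on this data set.
m1.robust <- lmRob(G1L1WR ~ PAL2 * KL2WR * NS, data = lafrance.na)
test.lmRob(m1.robust)
```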
2) I was excited to see that the robust library had functions like update.lmRob and step.lmRob, which again paralleled what I was doing with the lm function. However, when I tried to use update.lmRob, here is the error message I got (data set reproduced below):

m1.robust = lmRob(G1L1WR ~ PAL2*KL2WR*NS, data = lafrance.na)
update.lmRob(m1.robust, ~ . - PAL2:KL2WR:NS)
Error in NextMethod() : generic function not specified
S-PLUS and R handle generic functions slightly differently. I thought I had this working but it looks like I missed something. Sorry.
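Until that is fixed, one workaround is to skip the broken method entirely: build the reduced formula with update.formula (base R) and refit with lmRob. A sketch:

```r
library(robust)

# update.lmRob(m1.robust, ~ . - PAL2:KL2WR:NS) fails, so construct the
# reduced formula explicitly and call lmRob again on the same data
reduced <- update.formula(G1L1WR ~ PAL2 * KL2WR * NS, ~ . - PAL2:KL2WR:NS)
m2.robust <- lmRob(reduced, data = lafrance.na)
summary(m2.robust)
```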
And whenever I use step.lmRob, I get back exactly the same model that I put into it, so that doesn't seem to be working either!
step.lmRob(m1.robust)
Start: RFPE= 25.0857
G1L1WR ~ PAL2 * KL2WR * NS
Warning in lmRob.fit.compute(x2, y, x1 = x1, x1.idx = x1.idx, nrep = nrep, :
  Max iteration for final coefficients reached.
Single term deletions
Model:
G1L1WR ~ PAL2 * KL2WR * NS
scale: 0.2330676
                Df   RFPE
<none>             25.086
PAL2:KL2WR:NS    1 26.658
Call:
lmRob(formula = G1L1WR ~ PAL2 * KL2WR * NS, data = lafrance.na)
Coefficients:
  (Intercept)           PAL2          KL2WR             NS     PAL2:KL2WR
    -4.055888       4.205368      -8.077648       1.483424       4.342931
      PAL2:NS       KL2WR:NS  PAL2:KL2WR:NS
    -1.264462       4.545588      -2.474043
Degrees of freedom: 37 total; 29 residual
Residual standard error: 0.2330676
For some reason lmRob is having trouble fitting the model G1L1WR ~ PAL2 * KL2WR * NS - PAL2:KL2WR:NS. This usually happens because more than half of the data fits the model perfectly (although I'm not sure that's what's happening in this case). When the fitting fails step.lmRob and drop1.lmRob return the most recently selected model. Since this example fails in the first iteration it just returns the input model.
3) I had better luck with the plots.lmRob( ) function, but I was wondering what plots 9 and 10 were. What does it mean to have "overlaid" plots? I could see that an overlaid Q-Q plot of residuals was different from the Normal Q-Q plot of residuals, but I don't understand what the difference is.
The plots are designed to compare robust and least squares fits of the same model. Try this:

> m1.comp <- fit.models(list(Robust = "lmRob", "Least Squares" = "lm"),
+                       formula = G1L1WR ~ PAL2 * KL2WR * NS,
+                       data = lafrance.na)
> plot(m1.comp)

Then choices 9 and 10 make more sense.
4) As I understand the documentation, the lmRob function will use M-estimators or S-estimators and will search for the best one for the data. Reading over Wilcox (2005), he notes that no one robust method will work best all of the time (as do Maronna, Martin and Yohai), but he seems to like the Theil-Sen estimator, M-estimators, the MGV estimator, the STS estimator, least trimmed squares, the least absolute value estimator, and the deepest regression line method. Not being well-versed in the theory, I am going to venture the guess that the estimators lmRob is using would cover the M-estimators and the STS estimator, but none of the others. Does anyone have a comment on why only M-estimators and S-estimators are used in lmRob? Along the same lines, Wilcox (2005) recommends running a robust regression with an estimator with a high breakdown point and one with a lower (.2-.3) breakdown point and comparing estimates and CIs. From what I can see, M-estimators have a high breakdown point (.5). I don't know about S-estimators, but does anyone else see value in what Wilcox recommends, and have an idea for some other parameter I should use with lmRob to run a robust regression using an estimator with a lower breakdown point?

6) OK, and last of all, a dumb question. Is this dumb? Does lmRob bootstrap the data? From the description of sampling it seems like it does, but I am not sure exactly from the terminology.
The goal of the robust library is ease of use, so by default it tries to abstract away the specific choice of estimator as much as possible. lmRob tries to choose a combination of estimators that is sensible (although probably not the best) for the given model. You may also want to look at the R package robustbase, which takes a more technical approach to robust fitting.
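If you do want direct control over the breakdown/efficiency trade-off, one route is robustbase::lmrob, whose control object exposes the tuning constants explicitly. This is only a sketch; the exact argument name (tuning.psi) and default value should be checked against ?lmrob.control for your installed version:

```r
library(robustbase)

# MM-regression: an initial S-estimate keeps the 50% breakdown point,
# and a final M-step is then tuned for efficiency at the normal model.
# tuning.psi is the constant of the bisquare psi-function; the default
# (about 4.685) gives roughly 95% efficiency, while smaller values
# trade efficiency for more aggressive downweighting of outliers.
m1.mm <- lmrob(G1L1WR ~ PAL2 * KL2WR * NS, data = lafrance.na,
               control = lmrob.control(tuning.psi = 4.685))
summary(m1.mm)
```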
> sessionInfo()
R version 2.6.0 (2007-10-03)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] tcltk     stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] robustbase_0.2-8 Rcmdr_1.3-5      car_1.2-5        robust_0.3-0     lattice_0.17-2
[6] MASS_7.2-36

loaded via a namespace (and not attached):
[1] grid_2.6.0  tools_2.6.0
> lafrance.na
   NR WM       NS     PAL2      PAL1    G1L1WR     KL2WR    G1L2WR     KL1WR
1  47  2 2.406489 1.653213 1.5910646 1.7634280 1.1139434 1.7075702 1.4913617
2  45  1 2.568917 1.041393 1.2552725 0.7781513 0.0000000 0.0000000 0.3010300
4  48  2 2.177161 1.491362 1.3979400 1.5051500 0.0000000 0.3010300 0.0000000
5  46  4 2.010342 1.633468 1.3979400 1.5051500 0.0000000 0.3010300 0.7781513
6  52  4 2.201861 1.591065 1.5314789 1.6901961 0.9500000 1.1461280 0.9030900
7  59  2 2.247556 1.447158 1.1461280 0.4771213 0.0000000 0.0000000 0.0000000
8  44  2 2.319169 1.230449 0.7781513 0.6989700 0.0000000 0.0000000 0.0000000
9  31  2 2.443310 1.176091 1.4471580 0.8450980 0.0000000 0.0000000 0.0000000
10 56  3 2.119850 1.623249 1.6020600 1.7160033 1.2041200 1.5314789 1.2787536
11 45  0 2.451495 1.414973 1.0791812 1.1461280 0.0000000 0.0000000 0.3010300
12 43  4 2.239650 1.544068 1.4771213 1.4149733 0.9542425 1.1139434 1.1139434
13 42  4 2.279325 1.176091 0.0000000 0.4771213 0.0000000 0.0000000 0.3010300
14 42  4 2.123067 1.414973 1.1139434 1.7853298 0.0000000 1.3802112 0.6989700
15 44  3 2.377215 1.230449 0.9542425 1.6901961 0.0000000 0.7781513 0.0000000
16 46  4 2.161937 1.633468 1.6334685 1.6720979 1.2041200 1.2041200 1.4471580
17 52  2 2.316159 1.643453 1.5910646 1.7481880 1.0000000 1.1139434 1.3010300
19 56  2 2.200440 1.518514 1.2787536 1.4471580 0.6989700 0.6989700 0.6989700
21 46  1 2.404166 1.113943 1.1760913 1.1139434 0.0000000 0.9030900 0.0000000
22 49  0 2.446211 1.041393 0.7781513 0.0000000 0.0000000 0.0000000 0.0000000
23 44  0 2.264180 1.204120 0.9030900 0.9030900 0.0000000 0.0000000 0.0000000
24 48  3 2.354953 1.531479 1.3424227 1.2552725 0.0000000 0.0000000 0.0000000
25 54  4 2.110118 1.698970 1.5563025 1.5910646 1.4313638 1.0791812 1.0000000
26 62  3 2.292012 1.462398 0.9542425 1.7403627 0.6020600 0.9030900 0.9030900
27 52  3 2.163728 1.633468 1.3802112 1.5185139 0.0000000 0.0000000 0.0000000
28 55  3 2.209944 1.591065 1.4313638 1.5185139 1.0413927 0.7781513 0.0000000
29 44  3 2.258014 1.681241 1.6812412 1.6989700 1.2787536 1.2787536 1.3617278
30 50  3 2.163638 1.755875 1.6232493 1.5185139 1.3617278 1.3010300 1.0413927
31 58  3 2.154059 1.518514 1.4149733 1.4149733 0.0000000 0.3010300 0.4771213
32 36  3 2.317353 1.447158 1.1760913 0.7781513 0.4771213 0.7781513 0.6020600
33 43  2 2.147151 1.568202 1.4149733 1.5314789 0.0000000 0.0000000 0.6020600
34 40  2 2.273395 1.477121 1.4623980 1.4149733 0.3010300 0.7781513 0.4771213
35 55  3 2.106837 1.518514 1.5440680 1.6901961 1.0000000 1.5440680 0.6989700
36 63  4 2.070555 1.799341 1.3979400 1.7923917 0.7781513 1.6434527 0.8450980
37 49  3 2.029506 1.612784 1.6627578 1.8195439 1.3617278 1.8864907 1.1139434
38 61  6 2.143296 1.832509 1.7853298 1.8260748 2.0253059 1.9822712 1.7403627
39 58  2 2.217036 1.653213 1.5314789 1.3979400 0.9542425 0.9030900 0.9030900
40 46  2 2.160889 1.623249 1.5910646 1.6901961 0.4771213 1.3222193 0.0000000

Dr. Jenifer Larson-Hall
Assistant Professor of Linguistics
University of North Texas
(940)369-8950
_______________________________________________ R-SIG-Robust at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-robust