[RsR] Questions about interpreting lmRob output
Here are some quick answers to your questions. Please let me know if you would like more details. Kjell
On 13 Nov 2007, at 20:59, Jenifer Larson-Hall wrote:
Dear Robust R gurus, There are a variety of functions I can use to perform robust regression in R and I am trying to figure out what would be the best package in R to recommend to do robust regression. From reading through the archives and looking at Maronna, Martin and Yohai's book, my sense is that most people on this list would recommend the robust library. That's the one I like the best too, because its output parallels the lm functions. However, I have a few questions. First, several books that discuss R and have mentioned robust methods (Crawley, 2007; Faraway, 2005; Fox, 2002; Jureckova & Picek, 2006) have used the rlm function in the MASS library. Do you have any idea why these books would recommend rlm over lmRob? I don't find the output to be as helpful as lmRob's output. Well, this is just curiosity.
The Robust Library was originally developed for S-PLUS and has only recently (sometime in the last 1-2 years) been made available for R.
Second, I have some questions about understanding the output of lmRob:
1) How can I interpret the test for bias in the summary output?
Test for Bias:
             statistic   p-value
M-estimate   0.5652855 0.9997877
LS-estimate  3.6125604 0.8902805
Does the fact that the p-values are above .05 mean there was no bias in my original data set?
The test for bias is a diagnostic for the fitting procedure and is not directly a statement about the fitted model. It is computed by the function test.lmRob; if you're curious, take a look at the reference in the help file.
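If you want to see the test on its own, it can be called directly on a fitted object. This is only a sketch; it assumes the version of test.lmRob in the robust package accepts the fitted model as its first argument (check ?test.lmRob):

```r
library(robust)

# Refit the model from the post, then run the bias test directly.
# A large p-value means no evidence that the efficient final estimate
# is biased relative to the initial high-breakdown estimate, i.e. the
# fitting procedure behaved well on this data set.
m1.robust <- lmRob(G1L1WR ~ PAL2 * KL2WR * NS, data = lafrance.na)
test.lmRob(m1.robust)
```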
2) I was excited to see that the robust library had functions like update.lmRob and step.lmRob, which again paralleled what I was doing with the lm function. However, when I tried to use update.lmRob, here is the error message I got (data set reproduced below):

m1.robust = lmRob(G1L1WR ~ PAL2*KL2WR*NS, data = lafrance.na)
update.lmRob(m1.robust, ~ . - PAL2:KL2WR:NS)
Error in NextMethod() : generic function not specified
S-PLUS and R handle generic functions slightly differently. I thought I had this working but it looks like I missed something. Sorry.
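Until that is fixed, one workaround is to skip the broken method entirely: build the reduced formula with update.formula (base R) and refit with lmRob. A sketch:

```r
library(robust)

# update.lmRob(m1.robust, ~ . - PAL2:KL2WR:NS) fails, so construct the
# reduced formula explicitly and call lmRob again on the same data
reduced <- update.formula(G1L1WR ~ PAL2 * KL2WR * NS, ~ . - PAL2:KL2WR:NS)
m2.robust <- lmRob(reduced, data = lafrance.na)
summary(m2.robust)
```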
And whenever I use step.lmRob, I get back exactly the same model that I put into it, so that doesn't seem to be working either!
step.lmRob(m1.robust)
Start: RFPE= 25.0857
G1L1WR ~ PAL2 * KL2WR * NS
Warning in lmRob.fit.compute(x2, y, x1 = x1, x1.idx = x1.idx, nrep = nrep, :
  Max iteration for final coefficients reached.
Single term deletions
Model:
G1L1WR ~ PAL2 * KL2WR * NS
scale: 0.2330676
                Df   RFPE
<none>             25.086
PAL2:KL2WR:NS    1 26.658
Call:
lmRob(formula = G1L1WR ~ PAL2 * KL2WR * NS, data = lafrance.na)
Coefficients:
  (Intercept)           PAL2          KL2WR             NS     PAL2:KL2WR
    -4.055888       4.205368      -8.077648       1.483424       4.342931
      PAL2:NS       KL2WR:NS  PAL2:KL2WR:NS
    -1.264462       4.545588      -2.474043
Degrees of freedom: 37 total; 29 residual
Residual standard error: 0.2330676
For some reason lmRob is having trouble fitting the model G1L1WR ~ PAL2 * KL2WR * NS - PAL2:KL2WR:NS. This usually happens because more than half of the data fits the model perfectly (although I'm not sure that's what's happening in this case). When the fitting fails step.lmRob and drop1.lmRob return the most recently selected model. Since this example fails in the first iteration it just returns the input model.
3) I had better luck with the plots.lmRob( ) function, but I was wondering what plots 9 and 10 were. What does it mean to have "overlaid" plots? I could see that an overlaid Q-Q plot of residuals was different from the Normal Q-Q plot of residuals, but I don't understand what the difference is.
The plots are designed to compare robust and least squares fits of the same model. Try this:

> m1.comp <- fit.models(list(Robust = "lmRob", "Least Squares" = "lm"),
+                       formula = G1L1WR ~ PAL2 * KL2WR * NS,
+                       data = lafrance.na)
> plot(m1.comp)

Then choices 9 and 10 make more sense.
4) As I understand the documentation, the lmRob function will use M-estimators or S-estimators and will search for the best one for the data. Reading over Wilcox (2005), he notes that no one robust method will work best all of the time (as do Maronna, Martin and Yohai), but he seems to like the Theil-Sen estimator, M-estimators, the MGV estimator, the STS estimator, least trimmed squares, the least absolute value estimator, and the deepest regression line method. Not being well-versed in the theory, I am going to venture the guess that the estimators lmRob is using would cover the M-estimators and the STS estimator, but none of the others. Does anyone have a comment on why only M-estimators and S-estimators are used in lmRob? Along the same lines, Wilcox (2005) recommends running a robust regression with an estimator with a high breakdown point and one with a lower (.2-.3) breakdown point and comparing estimates and CIs. From what I can see, M-estimators have a high breakdown point (.5). I don't know about S-estimators, but does anyone else see value in what Wilcox recommends, and have an idea for some other parameter I should use with lmRob to run a robust regression using an estimator with a lower breakdown point?

6) OK, and last of all, a dumb question. Is this dumb? Does lmRob bootstrap the data? From the description of sampling it seems like it does, but I am not sure exactly from the terminology.
The goal of the robust library is ease of use, so by default it tries to abstract away the specific choice of estimator as much as possible. lmRob tries to choose a combination of estimators that is sensible (although probably not the best) for the given model. You may also want to look at the R package robustbase, which takes a more technical approach to robust fitting.
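If you do want direct control over the breakdown/efficiency trade-off, one route is robustbase::lmrob, whose control object exposes the tuning constants explicitly. This is only a sketch; the exact argument name (tuning.psi) and default value should be checked against ?lmrob.control for your installed version:

```r
library(robustbase)

# MM-regression: an initial S-estimate keeps the 50% breakdown point,
# and a final M-step is then tuned for efficiency at the normal model.
# tuning.psi is the constant of the bisquare psi-function; the default
# (about 4.685) gives roughly 95% efficiency, while smaller values
# trade efficiency for more aggressive downweighting of outliers.
m1.mm <- lmrob(G1L1WR ~ PAL2 * KL2WR * NS, data = lafrance.na,
               control = lmrob.control(tuning.psi = 4.685))
summary(m1.mm)
```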
> sessionInfo()
R version 2.6.0 (2007-10-03)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] tcltk     stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] robustbase_0.2-8 Rcmdr_1.3-5      car_1.2-5        robust_0.3-0     lattice_0.17-2
[6] MASS_7.2-36

loaded via a namespace (and not attached):
[1] grid_2.6.0  tools_2.6.0
> lafrance.na
   NR WM       NS     PAL2      PAL1    G1L1WR     KL2WR    G1L2WR     KL1WR
1  47  2 2.406489 1.653213 1.5910646 1.7634280 1.1139434 1.7075702 1.4913617
2  45  1 2.568917 1.041393 1.2552725 0.7781513 0.0000000 0.0000000 0.3010300
4  48  2 2.177161 1.491362 1.3979400 1.5051500 0.0000000 0.3010300 0.0000000
5  46  4 2.010342 1.633468 1.3979400 1.5051500 0.0000000 0.3010300 0.7781513
6  52  4 2.201861 1.591065 1.5314789 1.6901961 0.9500000 1.1461280 0.9030900
7  59  2 2.247556 1.447158 1.1461280 0.4771213 0.0000000 0.0000000 0.0000000
8  44  2 2.319169 1.230449 0.7781513 0.6989700 0.0000000 0.0000000 0.0000000
9  31  2 2.443310 1.176091 1.4471580 0.8450980 0.0000000 0.0000000 0.0000000
10 56  3 2.119850 1.623249 1.6020600 1.7160033 1.2041200 1.5314789 1.2787536
11 45  0 2.451495 1.414973 1.0791812 1.1461280 0.0000000 0.0000000 0.3010300
12 43  4 2.239650 1.544068 1.4771213 1.4149733 0.9542425 1.1139434 1.1139434
13 42  4 2.279325 1.176091 0.0000000 0.4771213 0.0000000 0.0000000 0.3010300
14 42  4 2.123067 1.414973 1.1139434 1.7853298 0.0000000 1.3802112 0.6989700
15 44  3 2.377215 1.230449 0.9542425 1.6901961 0.0000000 0.7781513 0.0000000
16 46  4 2.161937 1.633468 1.6334685 1.6720979 1.2041200 1.2041200 1.4471580
17 52  2 2.316159 1.643453 1.5910646 1.7481880 1.0000000 1.1139434 1.3010300
19 56  2 2.200440 1.518514 1.2787536 1.4471580 0.6989700 0.6989700 0.6989700
21 46  1 2.404166 1.113943 1.1760913 1.1139434 0.0000000 0.9030900 0.0000000
22 49  0 2.446211 1.041393 0.7781513 0.0000000 0.0000000 0.0000000 0.0000000
23 44  0 2.264180 1.204120 0.9030900 0.9030900 0.0000000 0.0000000 0.0000000
24 48  3 2.354953 1.531479 1.3424227 1.2552725 0.0000000 0.0000000 0.0000000
25 54  4 2.110118 1.698970 1.5563025 1.5910646 1.4313638 1.0791812 1.0000000
26 62  3 2.292012 1.462398 0.9542425 1.7403627 0.6020600 0.9030900 0.9030900
27 52  3 2.163728 1.633468 1.3802112 1.5185139 0.0000000 0.0000000 0.0000000
28 55  3 2.209944 1.591065 1.4313638 1.5185139 1.0413927 0.7781513 0.0000000
29 44  3 2.258014 1.681241 1.6812412 1.6989700 1.2787536 1.2787536 1.3617278
30 50  3 2.163638 1.755875 1.6232493 1.5185139 1.3617278 1.3010300 1.0413927
31 58  3 2.154059 1.518514 1.4149733 1.4149733 0.0000000 0.3010300 0.4771213
32 36  3 2.317353 1.447158 1.1760913 0.7781513 0.4771213 0.7781513 0.6020600
33 43  2 2.147151 1.568202 1.4149733 1.5314789 0.0000000 0.0000000 0.6020600
34 40  2 2.273395 1.477121 1.4623980 1.4149733 0.3010300 0.7781513 0.4771213
35 55  3 2.106837 1.518514 1.5440680 1.6901961 1.0000000 1.5440680 0.6989700
36 63  4 2.070555 1.799341 1.3979400 1.7923917 0.7781513 1.6434527 0.8450980
37 49  3 2.029506 1.612784 1.6627578 1.8195439 1.3617278 1.8864907 1.1139434
38 61  6 2.143296 1.832509 1.7853298 1.8260748 2.0253059 1.9822712 1.7403627
39 58  2 2.217036 1.653213 1.5314789 1.3979400 0.9542425 0.9030900 0.9030900
40 46  2 2.160889 1.623249 1.5910646 1.6901961 0.4771213 1.3222193 0.0000000

Dr. Jenifer Larson-Hall
Assistant Professor of Linguistics
University of North Texas
(940)369-8950
_______________________________________________ R-SIG-Robust at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-robust