[RsR] Prediction Intervals for Robust Regression

5 messages · Jonathan Burns, Stromberg, Arnold, Martin Maechler +1 more

#
I have created robust regression models using least trimmed squares and MM-
regression (using robustbase).

I am now looking to create prediction intervals for the predicted 
results.  While I have seen some discussion in the literature about 
confidence intervals on the estimates for robust regression, I haven't had 
much success on prediction intervals for the results.  I was wondering 
whether anyone would be able to provide some direction on how to create these 
prediction intervals in the robust regression setting.

Thanks,
Jonathan Burns
Sr. Statistician 
General Dynamics Information Technology
Medicare & Medicaid Solutions
One West Pennsylvania Avenue
Baltimore, MD 21204
Jonathan.Burns1 at gdit.com
5 days later
#
Jonathan,
Seems straightforward theoretically, let's see if anyone has implemented them in R.

Arny


Arnold J. Stromberg
Professor and Chair
Department of Statistics
University of Kentucky
313 Multidisciplinary Science Building
725 Rose Street
Lexington, KY 40536-0082
Phone: 859-257-6115
Fax: 859-323-1973

-----Original Message-----
From: R-SIG-Robust [mailto:r-sig-robust-bounces at r-project.org] On Behalf Of Jonathan Burns
Sent: Wednesday, February 11, 2015 12:42 PM
To: r-sig-robust at r-project.org
Subject: [RsR] Prediction Intervals for Robust Regression

_______________________________________________
R-SIG-Robust at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-robust
3 days later
#
> Jonathan,
    > Seems straightforward theoretically, let's see if anyone has implemented them in R.

    > Arny

    > Arnold J. Stromberg
    > Professor and Chair, Department of Statistics, University of Kentucky
    [ ........... ]

Well,  the predict() method for lmrob() fits (package
'robustbase') has prediction intervals built in.
I wonder why nobody has seen that and mentioned it here.

In the meantime, Jonathan has also asked on R-help
and got some advice there...
and has now found predict.lmrob  "in some way" and asked me (as
'robustbase' maintainer) about it.

I'm taking the liberty of answering here -- so others are also
helped in the future, *and* this thread is somewhat decently
closed within the R-SIG-robust list :
[..........]

    > I am interested in creating prediction intervals for the robust regression models.  I tried to use the function predict.lmrob(); however, R gave me an error - could not find function "predict.lmrob".  I thought perhaps this was because I was using an older version of the package.  I updated the package and I still get the error.  I am using R version 3.1.0.

    > I also got the same error with the functions print.lmrob(), plot.lmrob() and anova.lmrob().  lmrob() itself works fine.

    > This is the result that I get when I list the functions in robustbase:

    >> ls("package:robustbase")
    > [1] "adjbox"              "adjboxStats"         "adjOutlyingness"
    > [4] "aircraft"            "airmay"              "alcohol"
    ......................
    ......................
    > [103] "vaso"                "wagnerGrowth"        "wgt.himedian"
    > [106] "wood"

print(), predict() etc. are all generic functions;
their   lmrob S3 methods *are* called  print.lmrob(), predict.lmrob(), etc.,
but they are *hidden* (registered in the package namespace, not exported),
so you do not normally see them.

Rather, you should use  print(..), predict(...), etc.

If you really need to see them you can use
  getAnywhere("predict.lmrob")
etc.

{This is all general R knowledge - somewhat intermediate level -
 about using S3 methods in R packages and namespaces}
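
As a minimal sketch of the point above (assuming a current 'robustbase'
installation; the 'coleman' data set and the model formula are chosen only
for illustration), this is how one can check that the method exists even
though ls("package:robustbase") does not list it:

```r
library(robustbase)
data(coleman, package = "robustbase")

set.seed(1)  # lmrob() uses random subsampling for its initial estimate
fit <- lmrob(Y ~ ., data = coleman)

# Is the method among the exported objects? (This is what ls() shows.)
"predict.lmrob" %in% ls("package:robustbase")

# The S3 registry knows it regardless of whether it is exported:
m <- getS3method("predict", "lmrob")
is.function(m)

# All methods registered for class "lmrob"
methods(class = "lmrob")

# Calling the generic dispatches to predict.lmrob() automatically
fitted_values <- predict(fit)
head(fitted_values)

# The help page is reachable by the method's name:
# help("predict.lmrob")
```
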

Note that you've also asked about this on the R-help mailing list on
Feb 11, and you got two answers; the second one, by Prof Brian
Ripley, explained to you that matters *are* actually more complicated:

If you use robustness for a good reason, it seems a bit
optimistic to assume that a future observation has normal errors,
(rather than a mixture of normal + "outlier") and so the
standard assumptions about prediction intervals would be doubtful.
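
A small simulation may make this concern concrete. The sketch below is
only an illustration under made-up assumptions: the helper sim(), the 10%
contamination rate and the outlier standard deviation of 10 are all
hypothetical choices, not anything taken from this thread. It compares how
often the normal-theory interval covers *all* future observations versus
only the uncontaminated ones:

```r
library(robustbase)
set.seed(123)

# Hypothetical generator: errors are N(0, 1) with probability 1 - eps
# and N(0, 10^2) ("outliers") with probability eps
sim <- function(n, eps = 0.10) {
  x <- runif(n, 0, 10)
  outlier <- rbinom(n, 1, eps) == 1
  e <- ifelse(outlier, rnorm(n, sd = 10), rnorm(n, sd = 1))
  data.frame(x = x, y = 1 + 2 * x + e, outlier = outlier)
}

train <- sim(200)    # data used for fitting
test  <- sim(5000)   # "future" observations from the same mixture

fit  <- lmrob(y ~ x, data = train)
pint <- predict(fit, newdata = test, interval = "prediction", level = 0.95)

inside <- test$y >= pint[, "lwr"] & test$y <= pint[, "upr"]

coverage_all   <- mean(inside)                  # all future observations
coverage_clean <- mean(inside[!test$outlier])   # only the "normal" ones
c(all = coverage_all, clean = coverage_clean)
```
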

But I agree (with you I assume) that sometimes you *want* to
make this somewhat optimistic assumption.... and for that case,
everything is ready for you on a silver plate :

Why did you not just read the help page for predict.lmrob ?
Even though the object is hidden -- because you should call predict() --
it still has a nice help page {well, that can be improved, and I
will for the next version of robustbase}, and that help
*does* answer your question on how to compute prediction intervals: 

Andreas Ruckstuhl, the author of the function, does provide them
(under the optimistic assumption), in the exact same way as the
predict() method for lm  {called "predict.lm"} does.

==> Just use  predict( <fitted lmrob object>,  interval = "prediction")
(or a variant where you specify new data, weights, etc).
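
A short sketch of that call, assuming the 'coleman' data shipped with
'robustbase' (the data set, the formula and the choice of "new" rows are
illustrative only; the intervals rest on the optimistic normal-error
assumption discussed above):

```r
library(robustbase)
data(coleman, package = "robustbase")

set.seed(1)  # lmrob() uses random subsampling for its initial estimate
fit <- lmrob(Y ~ ., data = coleman)

# Pretend the first three rows are new observations
new_obs <- coleman[1:3, ]

# Prediction intervals for future responses: columns fit, lwr, upr
pred_int <- predict(fit, newdata = new_obs,
                    interval = "prediction", level = 0.95)
pred_int

# Confidence intervals for the mean response, for comparison
conf_int <- predict(fit, newdata = new_obs,
                    interval = "confidence", level = 0.95)
conf_int
```

The prediction intervals should come out wider than the confidence
intervals, since they add the residual variance to the variance of the
fitted mean.
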


Best regards,
Martin


1 day later
#
Hello Jonathan

Even though it is straightforward, there is a twist to it: Robust methods are for data with outliers or -- a more sophisticated view -- long-tailed distributions. The normal quantiles might be used to encompass "normal" observations -- but the probability of having a real observation in the interval would be overestimated.
An alternative may be to use an empirical quantile of the standardized residuals with scale equal to the standard deviation obtained from the formula for normal observations. 
I wonder whether this argument has been formally written down in the literature.
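
One possible reading of this alternative, as a rough sketch only: the
simulated data, the t(3) errors and all variable names below are
assumptions made for illustration, and this is not an established or
validated procedure. The normal/t quantile is replaced by empirical
quantiles of the residuals standardized by the robust scale, while the
prediction standard deviation still comes from the usual normal-theory
formula:

```r
library(robustbase)
set.seed(42)

# Simulated long-tailed data (hypothetical example)
n <- 200
d <- data.frame(x = runif(n, 0, 10))
d$y <- 1 + 2 * d$x + rt(n, df = 3)

fit <- lmrob(y ~ x, data = d)
new_obs <- data.frame(x = c(2, 5, 8))

# Standard deviation of a future observation, normal-theory formula:
# sqrt(se(fitted mean)^2 + sigma^2), with sigma the robust scale estimate
p <- predict(fit, newdata = new_obs, se.fit = TRUE)
sigma_hat <- fit$scale
sd_pred <- sqrt(p$se.fit^2 + sigma_hat^2)

# Empirical quantiles of the standardized residuals
z <- residuals(fit) / sigma_hat
q <- unname(quantile(z, probs = c(0.025, 0.975)))

emp_int <- cbind(fit = p$fit,
                 lwr = p$fit + q[1] * sd_pred,
                 upr = p$fit + q[2] * sd_pred)
emp_int

# Normal-theory interval from predict(), for comparison
predict(fit, newdata = new_obs, interval = "prediction")
```
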

Good success!

Werner Stahel
M +41 79 784 9330 | P +41 44 364 6424
1 day later
#
Seems to me that the user would have to decide on the issue of "real" vs "normal" observations for the prediction interval.

Arnold J. Stromberg

-----Original Message-----
From: Stahel Werner A. [mailto:stahel at stat.math.ethz.ch] 
Sent: Saturday, February 21, 2015 11:51 AM
To: Mächler Martin; Jonathan Burns; Stromberg, Arnold
Cc: mailman, r-sig-robust
Subject: AW: [RsR] Prediction Intervals for Robust Regression
