Skip to content

Prediction intervals (i.e. not CI of the fit) for monotonic loess curve using bootstrapping

12 messages · Bert Gunter, David Winsemius, Jan Stanstrup

#
Hi,

I am trying to find a way to estimate prediction intervals (PI) for a 
monotonic loess curve using bootstrapping.

At the moment my approach is to use the boot function from the boot 
package to bootstrap my loess model, which consist of loess + monoproc 
from the monoproc package (to force the fit to be monotonic which gives 
me much improved results with my particular data). The output from the 
monoproc package is simply the fitted y values at each x-value.
I then use boot.ci (again from the boot package) to get confidence 
intervals. The problem is that this gives me confidence intervals (CI) 
for the "fit" (is there a proper way to specify this?) and not a 
prediction interval. The interval is thus way too optimistic to give me 
an idea of the confidence interval of a predicted value.

For linear models predict.lm can give PI instead of CI by setting 
interval = "prediction". Further discussion of that here:
http://stats.stackexchange.com/questions/82603/understanding-the-confidence-band-from-a-polynomial-regression
http://stats.stackexchange.com/questions/44860/how-to-prediction-intervals-for-linear-regression-via-bootstrapping.

However I don't see a way to do that for boot.ci. Does there exist a way 
to get PIs after bootstrapping? If some sample code is required I am 
more than happy to supply it but I thought the question was general 
enough to be understandable without it.


Any hints are highly appreciated.


----------------------
Jan Stanstrup
Postdoc

Metabolomics
Food Quality and Nutrition
Fondazione Edmund Mach
#
On Aug 12, 2014, at 12:23 AM, Jan Stanstrup wrote:

            
Why not use the quantreg package to estimate the quantiles of interest to you? That way you would not be depending on Normal theory assumptions which you apparently don't trust. I've used it with the `cobs` function from the package of the same name to implement the monotonic constraint. I think there is a worked example in the quantreg package, but since I bought Koenker's book, I may be remembering from there.
#
PI's of what? -- future individual values or mean values?

I assume quantreg provides quantiles for the latter, not the former.
(See ?predict.lm for a terse explanation of the difference). Both are
obtainable from bootstrapping but the details depend on what you are
prepared to assume. Consult references or your local statistician for
help if needed.

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll
On Tue, Aug 12, 2014 at 8:20 AM, David Winsemius <dwinsemius at comcast.net> wrote:
#
Thanks to all of you for your suggestions and comments. I really 
appreciate it.

Some comments to Dennis' comments:
1) I am not concerned about predicting outside the original range. That 
would be nonsense anyway considering the physical phenomenon I am 
modeling. I am, however, concerned that the bootstrapping leads to 
extremely wide CIs at the extremes of the range when there are few data 
points. But I guess there is not much I can do about that as long as I 
rely on bootstrapping?

2) I have made a function that does the interpolation to the requested 
new x's from the original modeling data to get the residual variance and 
the model variance. Then it interpolates the combined SDs back the the 
new x values. See below.

3) I understand that. For this project it is not that important that the 
final prediction intervals are super accurate. But I need to hit the 
ballpark. I am only trying to do something that doesn't crossly 
underestimate the prediction error and doesn't make statisticians loose 
their lunch a first glance.
I also cannot avoid that my data contains erroneous values and I will 
need to build many models unsupervised. But the fit should be good 
enough that I plan to eliminate values outside some multiple of the 
prediction interval and then re-calculate. And if the model is not good 
in any range I will throw it out completely.


Based on the formula of my last message I have made a function that at 
least gives less optimistic intervals than what I could get with the 
other methods I have tried. The function and example data can be found 
here 
https://github.com/stanstrup/retpred_shiny/blob/master/retdb_admin/make_predictions_CI_tests.R 
in case anymore has any comments, suggestions or expletives to my 
implementation.


----------------------
Jan Stanstrup
Postdoc

Metabolomics
Food Quality and Nutrition
Fondazione Edmund Mach
On 08/12/2014 05:40 PM, Bert Gunter wrote:
#
On Aug 12, 2014, at 8:40 AM, Bert Gunter wrote:

            
I probably should have questioned the poster about what was meant by a "prediction interval for a monotonic loess curve". I was suggesting quantile regression for estimation of a chosen quantile, say the 90th percentile. I was thinking it could produce the analogue of a 90th percentile value (with no reference to a mean value or use of presumed distribution within adjacent windows of say 100-150 points. I had experience using the cobs function (in the package of the same name) as Koenker illustrates:

age <- runif(1000,min=60,max=85)
 
 analyte <- rlnorm(1000,4*(age/60),age/60)
 plot(age,analyte)

 library(cobs)
 library(quantreg)
 Rbs.9 <- cobs(age,analyte, constraint="increase",tau=0.9) 
Rbs.median <- cobs(age,analyte,constraint="increase",tau=0.5)

png("cobs.png"); plot(age,analyte, ylim=c(0,2000))
 lines(predict(Rbs.9), col = "red", lwd = 1.5)
lines(predict(Rbs.median), col = "blue", lwd = 1.5)
 dev.off()


-- David
David Winsemius
Alameda, CA, USA
#
Thank you very much for this snippet!

I used it on my data and indeed it does give intervals which appear 
quite realistic (script and data here 
https://github.com/stanstrup/retpred_shiny/blob/master/retdb_admin/make_predictions_CI_tests.R).
I also tried getting the intervals with predict.cobs but the methods 
available there gave very narrow bands.
The only problem I can see is that the fit tend to be a bit on the 
smooth side. See for example the upper interval limits at x = 2 to 3 and 
x =1.2. If then I set lambda to something low like 0.05 the band narrows 
to nearly nothing when there are few points. For example at x = 2.5. Is 
there some other parameter I would be adjusting?



----------------------
Jan Stanstrup
Postdoc

Metabolomics
Food Quality and Nutrition
Fondazione Edmund Mach
On 08/14/2014 02:02 AM, David Winsemius wrote:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: boot2ci_PI.png
Type: image/png
Size: 35198 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20140814/a61c0f40/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cobs.png
Type: image/png
Size: 32916 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20140814/a61c0f40/attachment-0001.png>
#
On Aug 14, 2014, at 7:17 AM, Jan Stanstrup wrote:

            
Try specifying the number and location of the knots (using my example data):

  
    
#
On Aug 14, 2014, at 9:06 AM, David Winsemius wrote:

            
My png file seems to have gotten stripped (try again):




(Not responding directly to Jan Stanstrup because his email address seems to force holdup in the moderation queue.)
3 days later
#
The knots are deleted anyway ("Deleting unnecessary knots ..."). It 
seems to make no difference.
On 08/14/2014 06:06 PM, David Winsemius wrote:
#
I had that result sometimes when testing as well. You don't offer any code so there's nothing I can do to follow-up.
#
Sorry. I have updated the code to have include the knot selection 
(https://github.com/stanstrup/retpred_shiny/blob/master/retdb_admin/make_predictions_CI_tests.R). 
I am working on the "Good data" at the moment.


- Jan.
On 08/18/2014 08:14 PM, David Winsemius wrote:

  
  
#
And just then I realized the problem. nknots need to be length(knots). 
Otherwise knots are deleted.

I am not so sure this works equally well as my original loess fit 
though. The fit I get with cobs is highly dependent on the "knot step 
size". At 0.4 for example it seems ok. At 0.3 I get points with 
basically zero width intervals. At 0.2 it breaks ("There is at least one 
pair of adjacent knots that contains no observation."). Not so great for 
my use case where I need to build many models unsupervised.



- Jan.
On 08/19/2014 09:44 AM, Jan Stanstrup wrote: