Hi, I am trying to find a way to estimate prediction intervals (PI) for a monotonic loess curve using bootstrapping. At the moment my approach is to use the boot function from the boot package to bootstrap my loess model, which consist of loess + monoproc from the monoproc package (to force the fit to be monotonic which gives me much improved results with my particular data). The output from the monoproc package is simply the fitted y values at each x-value. I then use boot.ci (again from the boot package) to get confidence intervals. The problem is that this gives me confidence intervals (CI) for the "fit" (is there a proper way to specify this?) and not a prediction interval. The interval is thus way too optimistic to give me an idea of the confidence interval of a predicted value. For linear models predict.lm can give PI instead of CI by setting interval = "prediction". Further discussion of that here: http://stats.stackexchange.com/questions/82603/understanding-the-confidence-band-from-a-polynomial-regression http://stats.stackexchange.com/questions/44860/how-to-prediction-intervals-for-linear-regression-via-bootstrapping. However I don't see a way to do that for boot.ci. Does there exist a way to get PIs after bootstrapping? If some sample code is required I am more than happy to supply it but I thought the question was general enough to be understandable without it. Any hints are highly appreciated. ---------------------- Jan Stanstrup Postdoc Metabolomics Food Quality and Nutrition Fondazione Edmund Mach
Prediction intervals (i.e. not CI of the fit) for monotonic loess curve using bootstrapping
12 messages · Bert Gunter, David Winsemius, Jan Stanstrup
On Aug 12, 2014, at 12:23 AM, Jan Stanstrup wrote:
Hi, I am trying to find a way to estimate prediction intervals (PI) for a monotonic loess curve using bootstrapping. At the moment my approach is to use the boot function from the boot package to bootstrap my loess model, which consist of loess + monoproc from the monoproc package (to force the fit to be monotonic which gives me much improved results with my particular data). The output from the monoproc package is simply the fitted y values at each x-value. I then use boot.ci (again from the boot package) to get confidence intervals. The problem is that this gives me confidence intervals (CI) for the "fit" (is there a proper way to specify this?) and not a prediction interval. The interval is thus way too optimistic to give me an idea of the confidence interval of a predicted value. For linear models predict.lm can give PI instead of CI by setting interval = "prediction". Further discussion of that here: http://stats.stackexchange.com/questions/82603/understanding-the-confidence-band-from-a-polynomial-regression http://stats.stackexchange.com/questions/44860/how-to-prediction-intervals-for-linear-regression-via-bootstrapping. However I don't see a way to do that for boot.ci. Does there exist a way to get PIs after bootstrapping? If some sample code is required I am more than happy to supply it but I thought the question was general enough to be understandable without it.
Why not use the quantreg package to estimate the quantiles of interest to you? That way you would not be depending on Normal theory assumptions which you apparently don't trust. I've used it with the `cobs` function from the package of the same name to implement the monotonic constraint. I think there is a worked example in the quantreg package, but since I bought Koenker's book, I may be remembering from there.
David Winsemius Alameda, CA, USA
PI's of what? -- future individual values or mean values? I assume quantreg provides quantiles for the latter, not the former. (See ?predict.lm for a terse explanation of the difference). Both are obtainable from bootstrapping but the details depend on what you are prepared to assume. Consult references or your local statistician for help if needed. -- Bert Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374 "Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." Clifford Stoll
On Tue, Aug 12, 2014 at 8:20 AM, David Winsemius <dwinsemius at comcast.net> wrote:
On Aug 12, 2014, at 12:23 AM, Jan Stanstrup wrote:
Hi, I am trying to find a way to estimate prediction intervals (PI) for a monotonic loess curve using bootstrapping. At the moment my approach is to use the boot function from the boot package to bootstrap my loess model, which consist of loess + monoproc from the monoproc package (to force the fit to be monotonic which gives me much improved results with my particular data). The output from the monoproc package is simply the fitted y values at each x-value. I then use boot.ci (again from the boot package) to get confidence intervals. The problem is that this gives me confidence intervals (CI) for the "fit" (is there a proper way to specify this?) and not a prediction interval. The interval is thus way too optimistic to give me an idea of the confidence interval of a predicted value. For linear models predict.lm can give PI instead of CI by setting interval = "prediction". Further discussion of that here: http://stats.stackexchange.com/questions/82603/understanding-the-confidence-band-from-a-polynomial-regression http://stats.stackexchange.com/questions/44860/how-to-prediction-intervals-for-linear-regression-via-bootstrapping. However I don't see a way to do that for boot.ci. Does there exist a way to get PIs after bootstrapping? If some sample code is required I am more than happy to supply it but I thought the question was general enough to be understandable without it.
Why not use the quantreg package to estimate the quantiles of interest to you? That way you would not be depending on Normal theory assumptions which you apparently don't trust. I've used it with the `cobs` function from the package of the same name to implement the monotonic constraint. I think there is a worked example in the quantreg package, but since I bought Koenker's book, I may be remembering from there. -- David Winsemius Alameda, CA, USA
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Thanks to all of you for your suggestions and comments. I really appreciate it. Some comments to Dennis' comments: 1) I am not concerned about predicting outside the original range. That would be nonsense anyway considering the physical phenomenon I am modeling. I am, however, concerned that the bootstrapping leads to extremely wide CIs at the extremes of the range when there are few data points. But I guess there is not much I can do about that as long as I rely on bootstrapping? 2) I have made a function that does the interpolation to the requested new x's from the original modeling data to get the residual variance and the model variance. Then it interpolates the combined SDs back the the new x values. See below. 3) I understand that. For this project it is not that important that the final prediction intervals are super accurate. But I need to hit the ballpark. I am only trying to do something that doesn't crossly underestimate the prediction error and doesn't make statisticians loose their lunch a first glance. I also cannot avoid that my data contains erroneous values and I will need to build many models unsupervised. But the fit should be good enough that I plan to eliminate values outside some multiple of the prediction interval and then re-calculate. And if the model is not good in any range I will throw it out completely. Based on the formula of my last message I have made a function that at least gives less optimistic intervals than what I could get with the other methods I have tried. The function and example data can be found here https://github.com/stanstrup/retpred_shiny/blob/master/retdb_admin/make_predictions_CI_tests.R in case anymore has any comments, suggestions or expletives to my implementation. ---------------------- Jan Stanstrup Postdoc Metabolomics Food Quality and Nutrition Fondazione Edmund Mach
On 08/12/2014 05:40 PM, Bert Gunter wrote:
PI's of what? -- future individual values or mean values? I assume quantreg provides quantiles for the latter, not the former. (See ?predict.lm for a terse explanation of the difference). Both are obtainable from bootstrapping but the details depend on what you are prepared to assume. Consult references or your local statistician for help if needed. -- Bert Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374 "Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." Clifford Stoll On Tue, Aug 12, 2014 at 8:20 AM, David Winsemius <dwinsemius at comcast.net> wrote:
On Aug 12, 2014, at 12:23 AM, Jan Stanstrup wrote:
Hi, I am trying to find a way to estimate prediction intervals (PI) for a monotonic loess curve using bootstrapping. At the moment my approach is to use the boot function from the boot package to bootstrap my loess model, which consist of loess + monoproc from the monoproc package (to force the fit to be monotonic which gives me much improved results with my particular data). The output from the monoproc package is simply the fitted y values at each x-value. I then use boot.ci (again from the boot package) to get confidence intervals. The problem is that this gives me confidence intervals (CI) for the "fit" (is there a proper way to specify this?) and not a prediction interval. The interval is thus way too optimistic to give me an idea of the confidence interval of a predicted value. For linear models predict.lm can give PI instead of CI by setting interval = "prediction". Further discussion of that here: http://stats.stackexchange.com/questions/82603/understanding-the-confidence-band-from-a-polynomial-regression http://stats.stackexchange.com/questions/44860/how-to-prediction-intervals-for-linear-regression-via-bootstrapping. However I don't see a way to do that for boot.ci. Does there exist a way to get PIs after bootstrapping? If some sample code is required I am more than happy to supply it but I thought the question was general enough to be understandable without it.
Why not use the quantreg package to estimate the quantiles of interest to you? That way you would not be depending on Normal theory assumptions which you apparently don't trust. I've used it with the `cobs` function from the package of the same name to implement the monotonic constraint. I think there is a worked example in the quantreg package, but since I bought Koenker's book, I may be remembering from there. -- David Winsemius Alameda, CA, USA
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
On Aug 12, 2014, at 8:40 AM, Bert Gunter wrote:
PI's of what? -- future individual values or mean values? I assume quantreg provides quantiles for the latter, not the former. (See ?predict.lm for a terse explanation of the difference).
I probably should have questioned the poster about what was meant by a "prediction interval for a monotonic loess curve". I was suggesting quantile regression for estimation of a chosen quantile, say the 90th percentile. I was thinking it could produce the analogue of a 90th percentile value (with no reference to a mean value or use of presumed distribution within adjacent windows of say 100-150 points. I had experience using the cobs function (in the package of the same name) as Koenker illustrates:
age <- runif(1000,min=60,max=85)
analyte <- rlnorm(1000,4*(age/60),age/60)
plot(age,analyte)
library(cobs)
library(quantreg)
Rbs.9 <- cobs(age,analyte, constraint="increase",tau=0.9)
Rbs.median <- cobs(age,analyte,constraint="increase",tau=0.5)
png("cobs.png"); plot(age,analyte, ylim=c(0,2000))
lines(predict(Rbs.9), col = "red", lwd = 1.5)
lines(predict(Rbs.median), col = "blue", lwd = 1.5)
dev.off()
-- David
obtainable from bootstrapping but the details depend on what you are prepared to assume. Consult references or your local statistician for help if needed. -- Bert Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374 "Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." Clifford Stoll On Tue, Aug 12, 2014 at 8:20 AM, David Winsemius <dwinsemius at comcast.net> wrote:
On Aug 12, 2014, at 12:23 AM, Jan Stanstrup wrote:
Hi, I am trying to find a way to estimate prediction intervals (PI) for a monotonic loess curve using bootstrapping. At the moment my approach is to use the boot function from the boot package to bootstrap my loess model, which consist of loess + monoproc from the monoproc package (to force the fit to be monotonic which gives me much improved results with my particular data). The output from the monoproc package is simply the fitted y values at each x-value. I then use boot.ci (again from the boot package) to get confidence intervals. The problem is that this gives me confidence intervals (CI) for the "fit" (is there a proper way to specify this?) and not a prediction interval. The interval is thus way too optimistic to give me an idea of the confidence interval of a predicted value. For linear models predict.lm can give PI instead of CI by setting interval = "prediction". Further discussion of that here: http://stats.stackexchange.com/questions/82603/understanding-the-confidence-band-from-a-polynomial-regression http://stats.stackexchange.com/questions/44860/how-to-prediction-intervals-for-linear-regression-via-bootstrapping. However I don't see a way to do that for boot.ci. Does there exist a way to get PIs after bootstrapping? If some sample code is required I am more than happy to supply it but I thought the question was general enough to be understandable without it.
Why not use the quantreg package to estimate the quantiles of interest to you? That way you would not be depending on Normal theory assumptions which you apparently don't trust. I've used it with the `cobs` function from the package of the same name to implement the monotonic constraint. I think there is a worked example in the quantreg package, but since I bought Koenker's book, I may be remembering from there. -- David Winsemius Alameda, CA, USA
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
David Winsemius Alameda, CA, USA
Thank you very much for this snippet! I used it on my data and indeed it does give intervals which appear quite realistic (script and data here https://github.com/stanstrup/retpred_shiny/blob/master/retdb_admin/make_predictions_CI_tests.R). I also tried getting the intervals with predict.cobs but the methods available there gave very narrow bands. The only problem I can see is that the fit tend to be a bit on the smooth side. See for example the upper interval limits at x = 2 to 3 and x =1.2. If then I set lambda to something low like 0.05 the band narrows to nearly nothing when there are few points. For example at x = 2.5. Is there some other parameter I would be adjusting? ---------------------- Jan Stanstrup Postdoc Metabolomics Food Quality and Nutrition Fondazione Edmund Mach
On 08/14/2014 02:02 AM, David Winsemius wrote:
On Aug 12, 2014, at 8:40 AM, Bert Gunter wrote:
PI's of what? -- future individual values or mean values? I assume quantreg provides quantiles for the latter, not the former. (See ?predict.lm for a terse explanation of the difference).
I probably should have questioned the poster about what was meant by a
"prediction interval for a monotonic loess curve". I was suggesting
quantile regression for estimation of a chosen quantile, say the 90th
percentile. I was thinking it could produce the analogue of a 90th
percentile value (with no reference to a mean value or use of presumed
distribution within adjacent windows of say 100-150 points. I had
experience using the cobs function (in the package of the same name)
as Koenker illustrates:
age <- runif(1000,min=60,max=85)
analyte <- rlnorm(1000,4*(age/60),age/60)
plot(age,analyte)
library(cobs)
library(quantreg)
Rbs.9 <- cobs(age,analyte, constraint="increase",tau=0.9)
Rbs.median <- cobs(age,analyte,constraint="increase",tau=0.5)
png("cobs.png"); plot(age,analyte, ylim=c(0,2000))
lines(predict(Rbs.9), col = "red", lwd = 1.5)
lines(predict(Rbs.median), col = "blue", lwd = 1.5)
dev.off()
-- David
obtainable from bootstrapping but the details depend on what you are prepared to assume. Consult references or your local statistician for help if needed. -- Bert Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374 "Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." Clifford Stoll On Tue, Aug 12, 2014 at 8:20 AM, David Winsemius <dwinsemius at comcast.net <mailto:dwinsemius at comcast.net>> wrote:
On Aug 12, 2014, at 12:23 AM, Jan Stanstrup wrote:
Hi, I am trying to find a way to estimate prediction intervals (PI) for a monotonic loess curve using bootstrapping. At the moment my approach is to use the boot function from the boot package to bootstrap my loess model, which consist of loess + monoproc from the monoproc package (to force the fit to be monotonic which gives me much improved results with my particular data). The output from the monoproc package is simply the fitted y values at each x-value. I then use boot.ci (again from the boot package) to get confidence intervals. The problem is that this gives me confidence intervals (CI) for the "fit" (is there a proper way to specify this?) and not a prediction interval. The interval is thus way too optimistic to give me an idea of the confidence interval of a predicted value. For linear models predict.lm can give PI instead of CI by setting interval = "prediction". Further discussion of that here: http://stats.stackexchange.com/questions/82603/understanding-the-confidence-band-from-a-polynomial-regression http://stats.stackexchange.com/questions/44860/how-to-prediction-intervals-for-linear-regression-via-bootstrapping. However I don't see a way to do that for boot.ci. Does there exist a way to get PIs after bootstrapping? If some sample code is required I am more than happy to supply it but I thought the question was general enough to be understandable without it.
Why not use the quantreg package to estimate the quantiles of interest to you? That way you would not be depending on Normal theory assumptions which you apparently don't trust. I've used it with the `cobs` function from the package of the same name to implement the monotonic constraint. I think there is a worked example in the quantreg package, but since I bought Koenker's book, I may be remembering from there. -- David Winsemius Alameda, CA, USA
______________________________________________ R-help at r-project.org <mailto:R-help at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
David Winsemius Alameda, CA, USA
-------------- next part -------------- A non-text attachment was scrubbed... Name: boot2ci_PI.png Type: image/png Size: 35198 bytes Desc: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20140814/a61c0f40/attachment.png> -------------- next part -------------- A non-text attachment was scrubbed... Name: cobs.png Type: image/png Size: 32916 bytes Desc: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20140814/a61c0f40/attachment-0001.png>
On Aug 14, 2014, at 7:17 AM, Jan Stanstrup wrote:
Thank you very much for this snippet! I used it on my data and indeed it does give intervals which appear quite realistic (script and data here https://github.com/stanstrup/retpred_shiny/blob/master/retdb_admin/make_predictions_CI_tests.R). I also tried getting the intervals with predict.cobs but the methods available there gave very narrow bands. The only problem I can see is that the fit tend to be a bit on the smooth side. See for example the upper interval limits at x = 2 to 3 and x =1.2. If then I set lambda to something low like 0.05 the band narrows to nearly nothing when there are few points. For example at x = 2.5. Is there some other parameter I would be adjusting?
Try specifying the number and location of the knots (using my example data):
Rbs.9 <- cobs(age,analyte,constraint="increase",tau=0.9, nknots=6, knots=seq(60,85,by=5)) plot(age,analyte, ylim=c(0,2000)) lines(predict(Rbs.9), col = 2, lwd = 1.5)
David.
>
>
> ----------------------
> Jan Stanstrup
> Postdoc
>
> Metabolomics
> Food Quality and Nutrition
> Fondazione Edmund Mach
>
>
>
> On 08/14/2014 02:02 AM, David Winsemius wrote:
>>
>> On Aug 12, 2014, at 8:40 AM, Bert Gunter wrote:
>>
>>> PI's of what? -- future individual values or mean values?
>>>
>>> I assume quantreg provides quantiles for the latter, not the former.
>>> (See ?predict.lm for a terse explanation of the difference).
>>
>> I probably should have questioned the poster about what was meant by a "prediction interval for a monotonic loess curve". I was suggesting quantile regression for estimation of a chosen quantile, say the 90th percentile. I was thinking it could produce the analogue of a 90th percentile value (with no reference to a mean value or use of presumed distribution within adjacent windows of say 100-150 points. I had experience using the cobs function (in the package of the same name) as Koenker illustrates:
>>
>> age <- runif(1000,min=60,max=85)
>>
>> analyte <- rlnorm(1000,4*(age/60),age/60)
>> plot(age,analyte)
>>
>> library(cobs)
>> library(quantreg)
>> Rbs.9 <- cobs(age,analyte, constraint="increase",tau=0.9)
>> Rbs.median <- cobs(age,analyte,constraint="increase",tau=0.5)
>>
>> png("cobs.png"); plot(age,analyte, ylim=c(0,2000))
>> lines(predict(Rbs.9), col = "red", lwd = 1.5)
>> lines(predict(Rbs.median), col = "blue", lwd = 1.5)
>> dev.off()
>> <Mail Attachment.png>
>>
>> -- David
>>
>>
>>> obtainable from bootstrapping but the details depend on what you are
>>> prepared to assume. Consult references or your local statistician for
>>> help if needed.
>>>
>>> -- Bert
>>>
>>> Bert Gunter
>>> Genentech Nonclinical Biostatistics
>>> (650) 467-7374
>>>
>>> "Data is not information. Information is not knowledge. And knowledge
>>> is certainly not wisdom."
>>> Clifford Stoll
>>>
>>>
>>>
>>>
>>> On Tue, Aug 12, 2014 at 8:20 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>>>>
>>>> On Aug 12, 2014, at 12:23 AM, Jan Stanstrup wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am trying to find a way to estimate prediction intervals (PI) for a monotonic loess curve using bootstrapping.
>>>>>
>>>>> At the moment my approach is to use the boot function from the boot package to bootstrap my loess model, which consist of loess + monoproc from the monoproc package (to force the fit to be monotonic which gives me much improved results with my particular data). The output from the monoproc package is simply the fitted y values at each x-value.
>>>>> I then use boot.ci (again from the boot package) to get confidence intervals. The problem is that this gives me confidence intervals (CI) for the "fit" (is there a proper way to specify this?) and not a prediction interval. The interval is thus way too optimistic to give me an idea of the confidence interval of a predicted value.
>>>>>
>>>>> For linear models predict.lm can give PI instead of CI by setting interval = "prediction". Further discussion of that here:
>>>>> http://stats.stackexchange.com/questions/82603/understanding-the-confidence-band-from-a-polynomial-regression
>>>>> http://stats.stackexchange.com/questions/44860/how-to-prediction-intervals-for-linear-regression-via-bootstrapping.
>>>>>
>>>>> However I don't see a way to do that for boot.ci. Does there exist a way to get PIs after bootstrapping? If some sample code is required I am more than happy to supply it but I thought the question was general enough to be understandable without it.
>>>>>
>>>>
>>>> Why not use the quantreg package to estimate the quantiles of interest to you? That way you would not be depending on Normal theory assumptions which you apparently don't trust. I've used it with the `cobs` function from the package of the same name to implement the monotonic constraint. I think there is a worked example in the quantreg package, but since I bought Koenker's book, I may be remembering from there.
>>>> --
>>>>
>>>> David Winsemius
>>>> Alameda, CA, USA
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius
>> Alameda, CA, USA
>>
>
> <boot2ci_PI.png><cobs.png>
David Winsemius
Alameda, CA, USA
On Aug 14, 2014, at 9:06 AM, David Winsemius wrote:
On Aug 14, 2014, at 7:17 AM, Jan Stanstrup wrote:
Thank you very much for this snippet! I used it on my data and indeed it does give intervals which appear quite realistic (script and data here https://github.com/stanstrup/retpred_shiny/blob/master/retdb_admin/make_predictions_CI_tests.R). I also tried getting the intervals with predict.cobs but the methods available there gave very narrow bands. The only problem I can see is that the fit tend to be a bit on the smooth side. See for example the upper interval limits at x = 2 to 3 and x =1.2. If then I set lambda to something low like 0.05 the band narrows to nearly nothing when there are few points. For example at x = 2.5. Is there some other parameter I would be adjusting?
Try specifying the number and location of the knots (using my example data):
Rbs.9 <- cobs(age,analyte,constraint="increase",tau=0.9, nknots=6, knots=seq(60,85,by=5)) plot(age,analyte, ylim=c(0,2000)) lines(predict(Rbs.9), col = 2, lwd = 1.5)
My png file seems to have gotten stripped (try again): (Not responding directly to Jan Stanstrup because his email address seems to force holdup in the moderation queue.)
David.
>
>
>
> --
> David.
>
>>
>>
>> ----------------------
>> Jan Stanstrup
>> Postdoc
>>
>> Metabolomics
>> Food Quality and Nutrition
>> Fondazione Edmund Mach
>>
>>
>>
>> On 08/14/2014 02:02 AM, David Winsemius wrote:
>>>
>>> On Aug 12, 2014, at 8:40 AM, Bert Gunter wrote:
>>>
>>>> PI's of what? -- future individual values or mean values?
>>>>
>>>> I assume quantreg provides quantiles for the latter, not the former.
>>>> (See ?predict.lm for a terse explanation of the difference).
>>>
>>> I probably should have questioned the poster about what was meant by a "prediction interval for a monotonic loess curve". I was suggesting quantile regression for estimation of a chosen quantile, say the 90th percentile. I was thinking it could produce the analogue of a 90th percentile value (with no reference to a mean value or use of presumed distribution within adjacent windows of say 100-150 points. I had experience using the cobs function (in the package of the same name) as Koenker illustrates:
>>>
>>> age <- runif(1000,min=60,max=85)
>>>
>>> analyte <- rlnorm(1000,4*(age/60),age/60)
>>> plot(age,analyte)
>>>
>>> library(cobs)
>>> library(quantreg)
>>> Rbs.9 <- cobs(age,analyte, constraint="increase",tau=0.9)
>>> Rbs.median <- cobs(age,analyte,constraint="increase",tau=0.5)
>>>
>>> png("cobs.png"); plot(age,analyte, ylim=c(0,2000))
>>> lines(predict(Rbs.9), col = "red", lwd = 1.5)
>>> lines(predict(Rbs.median), col = "blue", lwd = 1.5)
>>> dev.off()
>>> <Mail Attachment.png>
>>>
>>> -- David
>>>
>>>
>>>> obtainable from bootstrapping but the details depend on what you are
>>>> prepared to assume. Consult references or your local statistician for
>>>> help if needed.
>>>>
>>>> -- Bert
>>>>
>>>> Bert Gunter
>>>> Genentech Nonclinical Biostatistics
>>>> (650) 467-7374
>>>>
>>>> "Data is not information. Information is not knowledge. And knowledge
>>>> is certainly not wisdom."
>>>> Clifford Stoll
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Aug 12, 2014 at 8:20 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>>>>>
>>>>> On Aug 12, 2014, at 12:23 AM, Jan Stanstrup wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am trying to find a way to estimate prediction intervals (PI) for a monotonic loess curve using bootstrapping.
>>>>>>
>>>>>> At the moment my approach is to use the boot function from the boot package to bootstrap my loess model, which consist of loess + monoproc from the monoproc package (to force the fit to be monotonic which gives me much improved results with my particular data). The output from the monoproc package is simply the fitted y values at each x-value.
>>>>>> I then use boot.ci (again from the boot package) to get confidence intervals. The problem is that this gives me confidence intervals (CI) for the "fit" (is there a proper way to specify this?) and not a prediction interval. The interval is thus way too optimistic to give me an idea of the confidence interval of a predicted value.
>>>>>>
>>>>>> For linear models predict.lm can give PI instead of CI by setting interval = "prediction". Further discussion of that here:
>>>>>> http://stats.stackexchange.com/questions/82603/understanding-the-confidence-band-from-a-polynomial-regression
>>>>>> http://stats.stackexchange.com/questions/44860/how-to-prediction-intervals-for-linear-regression-via-bootstrapping.
>>>>>>
>>>>>> However I don't see a way to do that for boot.ci. Does there exist a way to get PIs after bootstrapping? If some sample code is required I am more than happy to supply it but I thought the question was general enough to be understandable without it.
>>>>>>
>>>>>
>>>>> Why not use the quantreg package to estimate the quantiles of interest to you? That way you would not be depending on Normal theory assumptions which you apparently don't trust. I've used it with the `cobs` function from the package of the same name to implement the monotonic constraint. I think there is a worked example in the quantreg package, but since I bought Koenker's book, I may be remembering from there.
>>>>> --
>>>>>
>>>>> David Winsemius
>>>>> Alameda, CA, USA
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>> David Winsemius
>>> Alameda, CA, USA
>>>
>>
>> <boot2ci_PI.png><cobs.png>
>
> David Winsemius
> Alameda, CA, USA
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius
Alameda, CA, USA
3 days later
The knots are deleted anyway ("Deleting unnecessary knots ..."). It
seems to make no difference.
On 08/14/2014 06:06 PM, David Winsemius wrote:
On Aug 14, 2014, at 7:17 AM, Jan Stanstrup wrote:
Thank you very much for this snippet! I used it on my data and indeed it does give intervals which appear quite realistic (script and data here https://github.com/stanstrup/retpred_shiny/blob/master/retdb_admin/make_predictions_CI_tests.R). I also tried getting the intervals with predict.cobs but the methods available there gave very narrow bands. The only problem I can see is that the fit tend to be a bit on the smooth side. See for example the upper interval limits at x = 2 to 3 and x =1.2. If then I set lambda to something low like 0.05 the band narrows to nearly nothing when there are few points. For example at x = 2.5. Is there some other parameter I would be adjusting?
Try specifying the number and location of the knots (using my example data):
Rbs.9 <- cobs(age,analyte,constraint="increase",tau=0.9, nknots=6,
knots=seq(60,85,by=5))
plot(age,analyte, ylim=c(0,2000)) lines(predict(Rbs.9), col = 2, lwd = 1.5)
-- David.
---------------------- Jan Stanstrup Postdoc Metabolomics Food Quality and Nutrition Fondazione Edmund Mach On 08/14/2014 02:02 AM, David Winsemius wrote:
On Aug 12, 2014, at 8:40 AM, Bert Gunter wrote:
PI's of what? -- future individual values or mean values? I assume quantreg provides quantiles for the latter, not the former. (See ?predict.lm for a terse explanation of the difference).
I probably should have questioned the poster about what was meant by
a "prediction interval for a monotonic loess curve". I was
suggesting quantile regression for estimation of a chosen quantile,
say the 90th percentile. I was thinking it could produce the
analogue of a 90th percentile value (with no reference to a
mean value or use of presumed distribution within adjacent windows
of say 100-150 points. I had experience using the cobs function (in
the package of the same name) as Koenker illustrates:
age <- runif(1000,min=60,max=85)
analyte <- rlnorm(1000,4*(age/60),age/60)
plot(age,analyte)
library(cobs)
library(quantreg)
Rbs.9 <- cobs(age,analyte, constraint="increase",tau=0.9)
Rbs.median <- cobs(age,analyte,constraint="increase",tau=0.5)
png("cobs.png"); plot(age,analyte, ylim=c(0,2000))
lines(predict(Rbs.9), col = "red", lwd = 1.5)
lines(predict(Rbs.median), col = "blue", lwd = 1.5)
dev.off()
<Mail Attachment.png>
-- David
obtainable from bootstrapping but the details depend on what you are prepared to assume. Consult references or your local statistician for help if needed. -- Bert Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374 "Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." Clifford Stoll On Tue, Aug 12, 2014 at 8:20 AM, David Winsemius <dwinsemius at comcast.net <mailto:dwinsemius at comcast.net>> wrote:
On Aug 12, 2014, at 12:23 AM, Jan Stanstrup wrote:
Hi, I am trying to find a way to estimate prediction intervals (PI) for a monotonic loess curve using bootstrapping. At the moment my approach is to use the boot function from the boot package to bootstrap my loess model, which consist of loess + monoproc from the monoproc package (to force the fit to be monotonic which gives me much improved results with my particular data). The output from the monoproc package is simply the fitted y values at each x-value. I then use boot.ci (again from the boot package) to get confidence intervals. The problem is that this gives me confidence intervals (CI) for the "fit" (is there a proper way to specify this?) and not a prediction interval. The interval is thus way too optimistic to give me an idea of the confidence interval of a predicted value. For linear models predict.lm can give PI instead of CI by setting interval = "prediction". Further discussion of that here: http://stats.stackexchange.com/questions/82603/understanding-the-confidence-band-from-a-polynomial-regression http://stats.stackexchange.com/questions/44860/how-to-prediction-intervals-for-linear-regression-via-bootstrapping. However I don't see a way to do that for boot.ci. Does there exist a way to get PIs after bootstrapping? If some sample code is required I am more than happy to supply it but I thought the question was general enough to be understandable without it.
Why not use the quantreg package to estimate the quantiles of interest to you? That way you would not be depending on Normal theory assumptions which you apparently don't trust. I've used it with the `cobs` function from the package of the same name to implement the monotonic constraint. I think there is a worked example in the quantreg package, but since I bought Koenker's book, I may be remembering from there. -- David Winsemius Alameda, CA, USA
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
David Winsemius Alameda, CA, USA
<boot2ci_PI.png><cobs.png>
David Winsemius Alameda, CA, USA
I had that result sometimes when testing as well. You don't offer any code so there's nothing I can do to follow-up.
David.
On Aug 18, 2014, at 4:56 AM, Jan Stanstrup wrote:
> The knots are deleted anyway ("Deleting unnecessary knots ..."). It seems to make no difference.
>
>
>
> On 08/14/2014 06:06 PM, David Winsemius wrote:
>>
>> On Aug 14, 2014, at 7:17 AM, Jan Stanstrup wrote:
>>
>>> Thank you very much for this snippet!
>>>
>>> I used it on my data and indeed it does give intervals which appear quite realistic (script and data here https://github.com/stanstrup/retpred_shiny/blob/master/retdb_admin/make_predictions_CI_tests.R).
>>> I also tried getting the intervals with predict.cobs but the methods available there gave very narrow bands.
>>> The only problem I can see is that the fit tend to be a bit on the smooth side. See for example the upper interval limits at x = 2 to 3 and x =1.2. If then I set lambda to something low like 0.05 the band narrows to nearly nothing when there are few points. For example at x = 2.5. Is there some other parameter I would be adjusting?
>>>
>>
>> Try specifying the number and location of the knots (using my example data):
>>
>> > Rbs.9 <- cobs(age,analyte,constraint="increase",tau=0.9, nknots=6, knots=seq(60,85,by=5))
>> > plot(age,analyte, ylim=c(0,2000))
>> > lines(predict(Rbs.9), col = 2, lwd = 1.5)
>>
>> <Mail Attachment.png>
>>
>> --
>> David.
>>
>>>
>>>
>>> ----------------------
>>> Jan Stanstrup
>>> Postdoc
>>>
>>> Metabolomics
>>> Food Quality and Nutrition
>>> Fondazione Edmund Mach
>>>
>>>
>>>
>>> On 08/14/2014 02:02 AM, David Winsemius wrote:
>>>>
>>>> On Aug 12, 2014, at 8:40 AM, Bert Gunter wrote:
>>>>
>>>>> PI's of what? -- future individual values or mean values?
>>>>>
>>>>> I assume quantreg provides quantiles for the latter, not the former.
>>>>> (See ?predict.lm for a terse explanation of the difference).
>>>>
>>>> I probably should have questioned the poster about what was meant by a "prediction interval for a monotonic loess curve". I was suggesting quantile regression for estimation of a chosen quantile, say the 90th percentile. I was thinking it could produce the analogue of a 90th percentile value (with no reference to a mean value or use of presumed distribution within adjacent windows of say 100-150 points. I had experience using the cobs function (in the package of the same name) as Koenker illustrates:
>>>>
>>>> age <- runif(1000,min=60,max=85)
>>>>
>>>> analyte <- rlnorm(1000,4*(age/60),age/60)
>>>> plot(age,analyte)
>>>>
>>>> library(cobs)
>>>> library(quantreg)
>>>> Rbs.9 <- cobs(age,analyte, constraint="increase",tau=0.9)
>>>> Rbs.median <- cobs(age,analyte,constraint="increase",tau=0.5)
>>>>
>>>> png("cobs.png"); plot(age,analyte, ylim=c(0,2000))
>>>> lines(predict(Rbs.9), col = "red", lwd = 1.5)
>>>> lines(predict(Rbs.median), col = "blue", lwd = 1.5)
>>>> dev.off()
>>>> <Mail Attachment.png>
>>>>
>>>> -- David
>>>>
>>>>
>>>>> obtainable from bootstrapping but the details depend on what you are
>>>>> prepared to assume. Consult references or your local statistician for
>>>>> help if needed.
>>>>>
>>>>> -- Bert
>>>>>
>>>>> Bert Gunter
>>>>> Genentech Nonclinical Biostatistics
>>>>> (650) 467-7374
>>>>>
>>>>> "Data is not information. Information is not knowledge. And knowledge
>>>>> is certainly not wisdom."
>>>>> Clifford Stoll
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Aug 12, 2014 at 8:20 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>>>>>>
>>>>>> On Aug 12, 2014, at 12:23 AM, Jan Stanstrup wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am trying to find a way to estimate prediction intervals (PI) for a monotonic loess curve using bootstrapping.
>>>>>>>
>>>>>>> At the moment my approach is to use the boot function from the boot package to bootstrap my loess model, which consist of loess + monoproc from the monoproc package (to force the fit to be monotonic which gives me much improved results with my particular data). The output from the monoproc package is simply the fitted y values at each x-value.
>>>>>>> I then use boot.ci (again from the boot package) to get confidence intervals. The problem is that this gives me confidence intervals (CI) for the "fit" (is there a proper way to specify this?) and not a prediction interval. The interval is thus way too optimistic to give me an idea of the confidence interval of a predicted value.
>>>>>>>
>>>>>>> For linear models predict.lm can give PI instead of CI by setting interval = "prediction". Further discussion of that here:
>>>>>>> http://stats.stackexchange.com/questions/82603/understanding-the-confidence-band-from-a-polynomial-regression
>>>>>>> http://stats.stackexchange.com/questions/44860/how-to-prediction-intervals-for-linear-regression-via-bootstrapping.
>>>>>>>
>>>>>>> However I don't see a way to do that for boot.ci. Does there exist a way to get PIs after bootstrapping? If some sample code is required I am more than happy to supply it but I thought the question was general enough to be understandable without it.
>>>>>>>
>>>>>>
>>>>>> Why not use the quantreg package to estimate the quantiles of interest to you? That way you would not be depending on Normal theory assumptions which you apparently don't trust. I've used it with the `cobs` function from the package of the same name to implement the monotonic constraint. I think there is a worked example in the quantreg package, but since I bought Koenker's book, I may be remembering from there.
>>>>>> --
>>>>>>
>>>>>> David Winsemius
>>>>>> Alameda, CA, USA
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help at r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>> David Winsemius
>>>> Alameda, CA, USA
>>>>
>>>
>>> <boot2ci_PI.png><cobs.png>
>>
>> David Winsemius
>> Alameda, CA, USA
>>
>
David Winsemius
Alameda, CA, USA
Sorry. I have updated the code to have include the knot selection (https://github.com/stanstrup/retpred_shiny/blob/master/retdb_admin/make_predictions_CI_tests.R). I am working on the "Good data" at the moment. - Jan.
On 08/18/2014 08:14 PM, David Winsemius wrote:
I had that result sometimes when testing as well. You don't offer any code so there's nothing I can do to follow-up.
And just then I realized the problem. nknots need to be length(knots).
Otherwise knots are deleted.
I am not so sure this works equally well as my original loess fit
though. The fit I get with cobs is highly dependent on the "knot step
size". At 0.4 for example it seems ok. At 0.3 I get points with
basically zero width intervals. At 0.2 it breaks ("There is at least one
pair of adjacent knots that contains no observation."). Not so great for
my use case where I need to build many models unsupervised.
- Jan.
On 08/19/2014 09:44 AM, Jan Stanstrup wrote:
Sorry. I have updated the code to have include the knot selection (https://github.com/stanstrup/retpred_shiny/blob/master/retdb_admin/make_predictions_CI_tests.R). I am working on the "Good data" at the moment. - Jan. On 08/18/2014 08:14 PM, David Winsemius wrote:
I had that result sometimes when testing as well. You don't offer any code so there's nothing I can do to follow-up.