Hello:
The example "integrate(dnorm,0,20000)" says it "fails on many
systems". I just got 0 from it, when I should have gotten either an
error or something close to 0.5. I got this with R 2.12.0 under both
Windows Vista_x64 and Linux (Fedora 13); see the results from Windows
below. I thought you might want to know.
Thanks for all your work in creating and maintaining R.
Best Wishes,
Spencer Graves
###############################
integrate(dnorm,0,20000) ## fails on many systems
0 with absolute error < 0
> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
0.5 != integrate(dnorm,0,20000) = 0
13 messages · Spencer Graves, Brian Ripley, Baptiste Auguie +6 more
On Mon, 6 Dec 2010, Spencer Graves wrote:
Hello:
The example "integrate(dnorm,0,20000)" says it "fails on many systems".
I just got 0 from it, when I should have gotten either an error or something
close to 0.5. I got this with R 2.12.0 under both Windows Vista_x64 and
Linux (Fedora 13); see the results from Windows below. I thought you might
want to know.
Well, isn't that exactly what the help page says happens? That
example is part of a section entitled
## integrate can fail if misused
and is part of the illustration of
If the function is
approximately constant (in particular, zero) over nearly all its
range it is possible that the result and error estimate may be
seriously wrong.
Thanks for all your work in creating and maintaining R.
Best Wishes,
Spencer Graves
###############################
integrate(dnorm,0,20000) ## fails on many systems
0 with absolute error < 0
sessionInfo()
R version 2.12.0 (2010-10-15) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Hi, I was recently given some interesting tips on a similar issue, see R-help "puzzle with integrate over infinite range" <http://www.r-help.com/list/85/713882.html> Maybe "fails" can be a bit misleading here (fails to produce the actual result vs. returning an error message). As a result of this previous discussion, I don't think it would be possible to return an error: as far as the algorithm knows, the value it calculated is 0 because the integrand was 0 everywhere. To know better, the program would need to sample the integrand at more points (which can be achieved by changing the interval, or better, by setting the tolerance to a lower value). baptiste On 7 December 2010 00:32, Spencer Graves
<spencer.graves at structuremonitoring.com> wrote:
Hello: ? ? ?The example "integrate(dnorm,0,20000)" says it "fails on many systems". ?I just got 0 from it, when I should have gotten either an error or something close to 0.5. ?I got this with R 2.12.0 under both Windows Vista_x64 and Linux (Fedora 13); ?see the results from Windows below. ?I thought you might want to know. ? ? ?Thanks for all your work in creating and maintaining R. ? ? ?Best Wishes, ? ? ?Spencer Graves ############################### integrate(dnorm,0,20000) ## fails on many systems 0 with absolute error < 0
sessionInfo()
R version 2.12.0 (2010-10-15) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats ? ? graphics ?grDevices utils ? ? datasets ?methods ? base
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Prof Brian Ripley <ripley at stats.ox.ac.uk>
on Tue, 7 Dec 2010 07:41:16 +0000 (GMT) writes:
> On Mon, 6 Dec 2010, Spencer Graves wrote:
>> Hello:
>>
>>
>> The example "integrate(dnorm,0,20000)" says it "fails on many systems".
>> I just got 0 from it, when I should have gotten either an error or something
>> close to 0.5. I got this with R 2.12.0 under both Windows Vista_x64 and
>> Linux (Fedora 13); see the results from Windows below. I thought you might
>> want to know.
> Well, isn't that exactly what the help page says happens? That
> example is part of a section entitled
> ## integrate can fail if misused
> and is part of the illustration of
> If the function is
> approximately constant (in particular, zero) over nearly all its
> range it is possible that the result and error estimate may be
> seriously wrong.
yes, of course,
and the issue has been known for ``ages'' ..
..........
..........
but it seems that too many useRs are not reading the help
page carefully, but only browse it quickly.
I think we (R developers) have to live with this fact
and should consider adapting to it a bit more, particularly in
this case (see below)
>>
>> Thanks for all your work in creating and maintaining R.
>>
>>
>> Best Wishes,
>> Spencer Graves
>> ###############################
>>
>> integrate(dnorm,0,20000) ## fails on many systems
>> 0 with absolute error < 0
and this is particularly unsatisfactory for another reason:
"absolute error < 0"
is *always* incorrect, so I think we should change *some*thing.
We could just use "<=" (and probably should in any case, or
"< ~= x" which would convey ``is less than about x'' which I
think is all we can say),
but could consider giving a different message when the integral
evaluates to 0 or, rather actually,
only when the error bound ('abs.error') is 0 *and* 'subdivisions == 1'
as the latter indicates that the algorithm treated the integrand
f(.) as if f() was a linear function.
But in my quick experiments, even for linear (incl. constant)
functions, the 'abs.error' returned is never 0.
If we want to be cautious,
such a warning could be made explicitly suppressable by an argument
.warn.if.doubtful = TRUE
An additional possibility I'd like to try, is a new argument
'min.subdivisions = 3' which specifies the *minimal* number
of subdivisions to be used in addition to the already present
'subdivisions = 100' (= the maximum number of subintervals.)
Martin Maechler,
ETH Zurich
I have wrestled with this problem before. I think correcting the warning to "absolute error ~<= 0" is a good idea, and printing a warning if subdivisions==1 is also helpful. Also, including a simple example like the one that started this thread on the help page for integrate might make the issue more clear to users. But min.subdivisions is probably not. On the example with dnorm( ), I doubt 3 subdivisions would work. The problem isn't that we aren't sudividing enough, the problem is that the integrand is 0 (in double precision) on most of the region and the algorithm isn't designed to handle this. There is no way to determine how many subdivisions are necessary to get a reasonable answer without a detailed analysis of the integrand. I've gotten useful results with integrands that are monotonic on the tail with a "self-triming integration" routine like the following:
right.trimmed.integrate <- function( f, lower, upper, epsilon=1e-100, min.width=1e-10, ... ) {
+ # trim the region of integration on the right until f(x) > epsilon
+
+ a <- lower; b <- upper
+ while ( (b-a>min.width) && (f(b)<epsilon) ) { b <- (a+b)/2 }
+
+ return( integrate(f,a,b,...) ) }
right.trimmed.integrate( dnorm, 0, 20000 ) # test
0.5 with absolute error < 9.2e-05 This can be adapted to left trim or (left and right) trim, abs(f(x)-c)>epsilon, etc. Setting the tolerances epsilon and min.width is an issue, but an explicit discussion of these values could encourage people to think about the problem in their specific case. And of course, none of this guarantees a correct answer, especially if someone tries this on non-monotonic integrands with complicated 0 sets. One could write a somewhat more user-friendly version where the user has to specify some property (or set of properties) of the integrand, e.g. "right-tail decreasing to 0", etc. and have the algorithm try to do smart trimming based on this. But perhaps this getting too involved. In the end, there is no general solution because any solution depends on the specific nature of the integrand. Clearer messages, warnings in suspicious cases like subdivisions==1, and a simple example explaining what the issue is in the help page would help some people. John ........................................................................... John P. Nolan Math/Stat Department 227 Gray Hall American University 4400 Massachusetts Avenue, NW Washington, DC 20016-8050 jpnolan at american.edu 202.885.3140 voice 202.885.3155 fax http://academic2.american.edu/~jpnolan ........................................................................... -----r-devel-bounces at r-project.org wrote: ----- To: r-devel at r-project.org, Prof Brian Ripley <ripley at stats.ox.ac.uk> From: Martin Maechler Sent by: r-devel-bounces at r-project.org Date: 12/07/2010 03:29AM Subject: Re: [Rd] 0.5 != integrate(dnorm,0,20000) = 0
Prof Brian Ripley <ripley at stats.ox.ac.uk>
on Tue, 7 Dec 2010 07:41:16 +0000 (GMT) writes:
> On Mon, 6 Dec 2010, Spencer Graves wrote:
>> Hello:
>>
>>
>> The example "integrate(dnorm,0,20000)" says it "fails on many systems".
>> I just got 0 from it, when I should have gotten either an error or something
>> close to 0.5. I got this with R 2.12.0 under both Windows Vista_x64 and
>> Linux (Fedora 13); see the results from Windows below. I thought you might
>> want to know.
> Well, isn't that exactly what the help page says happens? That
> example is part of a section entitled
> ## integrate can fail if misused
> and is part of the illustration of
> If the function is
> approximately constant (in particular, zero) over nearly all its
> range it is possible that the result and error estimate may be
> seriously wrong.
yes, of course,
and the issue has been known for ``ages'' ..
..........
..........
but it seems that too many useRs are not reading the help
page carefully, but only browse it quickly.
I think we (R developers) have to live with this fact
and should consider adapting to it a bit more, particularly in
this case (see below)
>>
>> Thanks for all your work in creating and maintaining R.
>>
>>
>> Best Wishes,
>> Spencer Graves
>> ###############################
>>
>> integrate(dnorm,0,20000) ## fails on many systems
>> 0 with absolute error < 0
and this is particularly unsatisfactory for another reason:
"absolute error < 0"
is *always* incorrect, so I think we should change *some*thing.
We could just use "<=" (and probably should in any case, or
"< ~= x" which would convey ``is less than about x'' which I
think is all we can say),
but could consider giving a different message when the integral
evaluates to 0 or, rather actually,
only when the error bound ('abs.error') is 0 *and* 'subdivisions == 1'
as the latter indicates that the algorithm treated the integrand
f(.) as if f() was a linear function.
But in my quick experiments, even for linear (incl. constant)
functions, the 'abs.error' returned is never 0.
If we want to be cautious,
such a warning could be made explicitly suppressable by an argument
.warn.if.doubtful = TRUE
An additional possibility I'd like to try, is a new argument
'min.subdivisions = 3' which specifies the *minimal* number
of subdivisions to be used in addition to the already present
'subdivisions = 100' (= the maximum number of subintervals.)
Martin Maechler,
ETH Zurich
______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20101207/b08969b1/attachment.pl>
Putting in Inf for the upper bound does not work in general: all 3 of the following should give 0.5
integrate( dnorm, 0, Inf )
0.5 with absolute error < 4.7e-05
integrate( dnorm, 0, Inf, sd=100000 )
Error in integrate(dnorm, 0, Inf, sd = 1e+05) : the integral is probably divergent
integrate( dnorm, 0, Inf, sd=10000000 )
5.570087e-05 with absolute error < 0.00010 Numerical quadrature methods look at a finite number of points, and you can find examples that will confuse any algorithm. Rather than hope a general method will solve all problems, users should look at their integrand and pick an appropriate region of integration. John Nolan, American U. -----r-devel-bounces at r-project.org wrote: ----- To: r-devel at r-project.org From: Pierre Chausse Sent by: r-devel-bounces at r-project.org Date: 12/07/2010 09:46AM Subject: Re: [Rd] 0.5 != integrate(dnorm,0,20000) = 0 The warning about "absolute error == 0" would not be sufficient because if you do > integrate(dnorm, 0, 5000) 2.326323e-06 with absolute error < 4.6e-06 We get reasonable absolute error and wrong answer. For very high upper bound, it seems more stable to use "Inf". In that case, another .External is used which seems to be optimized for high or low bounds: > integrate(dnorm, 0,Inf) 0.5 with absolute error < 4.7e-05
On 10-12-07 8:38 AM, John Nolan wrote:
I have wrestled with this problem before. I think correcting the warning to "absolute error ~<= 0" is a good idea, and printing a warning if subdivisions==1 is also helpful. Also, including a simple example like the one that started this thread on the help page for integrate might make the issue more clear to users. But min.subdivisions is probably not. On the example with dnorm( ), I doubt 3 subdivisions would work. The problem isn't that we aren't sudividing enough, the problem is that the integrand is 0 (in double precision) on most of the region and the algorithm isn't designed to handle this. There is no way to determine how many subdivisions are necessary to get a reasonable answer without a detailed analysis of the integrand. I've gotten useful results with integrands that are monotonic on the tail with a "self-triming integration" routine like the following:
right.trimmed.integrate<- function( f, lower, upper, epsilon=1e-100, min.width=1e-10, ... ) {
+ # trim the region of integration on the right until f(x)> epsilon
+
+ a<- lower; b<- upper
+ while ( (b-a>min.width)&& (f(b)<epsilon) ) { b<- (a+b)/2 }
+
+ return( integrate(f,a,b,...) ) }
right.trimmed.integrate( dnorm, 0, 20000 ) # test
0.5 with absolute error< 9.2e-05 This can be adapted to left trim or (left and right) trim, abs(f(x)-c)>epsilon, etc. Setting the tolerances epsilon and min.width is an issue, but an explicit discussion of these values could encourage people to think about the problem in their specific case. And of course, none of this guarantees a correct answer, especially if someone tries this on non-monotonic integrands with complicated 0 sets. One could write a somewhat more user-friendly version where the user has to specify some property (or set of properties) of the integrand, e.g. "right-tail decreasing to 0", etc. and have the algorithm try to do smart trimming based on this. But perhaps this getting too involved. In the end, there is no general solution because any solution depends on the specific nature of the integrand. Clearer messages, warnings in suspicious cases like subdivisions==1, and a simple example explaining what the issue is in the help page would help some people. John ........................................................................... John P. Nolan Math/Stat Department 227 Gray Hall American University 4400 Massachusetts Avenue, NW Washington, DC 20016-8050 jpnolan at american.edu 202.885.3140 voice 202.885.3155 fax http://academic2.american.edu/~jpnolan ........................................................................... -----r-devel-bounces at r-project.org wrote: ----- To: r-devel at r-project.org, Prof Brian Ripley<ripley at stats.ox.ac.uk> From: Martin Maechler Sent by: r-devel-bounces at r-project.org Date: 12/07/2010 03:29AM Subject: Re: [Rd] 0.5 != integrate(dnorm,0,20000) = 0
Prof Brian Ripley<ripley at stats.ox.ac.uk>
on Tue, 7 Dec 2010 07:41:16 +0000 (GMT) writes:
> On Mon, 6 Dec 2010, Spencer Graves wrote:
>> Hello:
>>
>>
>> The example "integrate(dnorm,0,20000)" says it "fails on many systems".
>> I just got 0 from it, when I should have gotten either an error or something
>> close to 0.5. I got this with R 2.12.0 under both Windows Vista_x64 and
>> Linux (Fedora 13); see the results from Windows below. I thought you might
>> want to know.
> Well, isn't that exactly what the help page says happens? That
> example is part of a section entitled
> ## integrate can fail if misused
> and is part of the illustration of
> If the function is
> approximately constant (in particular, zero) over nearly all its
> range it is possible that the result and error estimate may be
> seriously wrong.
yes, of course, and the issue has been known for ``ages'' .. .......... .......... but it seems that too many useRs are not reading the help page carefully, but only browse it quickly. I think we (R developers) have to live with this fact and should consider adapting to it a bit more, particularly in this case (see below)
>>
>> Thanks for all your work in creating and maintaining R.
>>
>>
>> Best Wishes,
>> Spencer Graves
>> ###############################
>>
>> integrate(dnorm,0,20000) ## fails on many systems
>> 0 with absolute error< 0
and this is particularly unsatisfactory for another reason:
"absolute error< 0"
is *always* incorrect, so I think we should change *some*thing.
We could just use "<=" (and probably should in any case, or
"< ~= x" which would convey ``is less than about x'' which I
think is all we can say),
but could consider giving a different message when the integral
evaluates to 0 or, rather actually,
only when the error bound ('abs.error') is 0 *and* 'subdivisions == 1'
as the latter indicates that the algorithm treated the integrand
f(.) as if f() was a linear function.
But in my quick experiments, even for linear (incl. constant)
functions, the 'abs.error' returned is never 0.
If we want to be cautious,
such a warning could be made explicitly suppressable by an argument
.warn.if.doubtful = TRUE
An additional possibility I'd like to try, is a new argument
'min.subdivisions = 3' which specifies the *minimal* number
of subdivisions to be used in addition to the already present
'subdivisions = 100' (= the maximum number of subintervals.)
Martin Maechler,
ETH Zurich
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel ______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
*Pierre Chauss?* Assistant Professor Department of Economics University of Waterloo [[alternative HTML version deleted]] ______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
What do you think about changing the verbiage with that example
in "integrate.Rd" from "fails on many systems" to something like
"gives wrong answer without warning on many systems"?
If I had write access to the core R code, I'd change this
myself: I'm probably not the only user who might think that saying
something "fails" suggest it gives an error message. Many contributions
on this thread make it clear that it will never be possible to write an
integrate function that won't give a "wrong answer without warning" in
some cases.
Thanks,
Spencer
#############################
On 12/7/2010 7:02 AM, John Nolan wrote:
Putting in Inf for the upper bound does not work in general: all 3 of the following should give 0.5
integrate( dnorm, 0, Inf )
0.5 with absolute error< 4.7e-05
integrate( dnorm, 0, Inf, sd=100000 )
Error in integrate(dnorm, 0, Inf, sd = 1e+05) : the integral is probably divergent
integrate( dnorm, 0, Inf, sd=10000000 )
5.570087e-05 with absolute error< 0.00010 Numerical quadrature methods look at a finite number of points, and you can find examples that will confuse any algorithm. Rather than hope a general method will solve all problems, users should look at their integrand and pick an appropriate region of integration. John Nolan, American U. -----r-devel-bounces at r-project.org wrote: ----- To: r-devel at r-project.org From: Pierre Chausse Sent by: r-devel-bounces at r-project.org Date: 12/07/2010 09:46AM Subject: Re: [Rd] 0.5 != integrate(dnorm,0,20000) = 0 The warning about "absolute error == 0" would not be sufficient because if you do
> integrate(dnorm, 0, 5000)
2.326323e-06 with absolute error< 4.6e-06 We get reasonable absolute error and wrong answer. For very high upper bound, it seems more stable to use "Inf". In that case, another .External is used which seems to be optimized for high or low bounds:
> integrate(dnorm, 0,Inf)
0.5 with absolute error< 4.7e-05 On 10-12-07 8:38 AM, John Nolan wrote:
I have wrestled with this problem before. I think correcting the warning to "absolute error ~<= 0" is a good idea, and printing a warning if subdivisions==1 is also helpful. Also, including a simple example like the one that started this thread on the help page for integrate might make the issue more clear to users. But min.subdivisions is probably not. On the example with dnorm( ), I doubt 3 subdivisions would work. The problem isn't that we aren't sudividing enough, the problem is that the integrand is 0 (in double precision) on most of the region and the algorithm isn't designed to handle this. There is no way to determine how many subdivisions are necessary to get a reasonable answer without a detailed analysis of the integrand. I've gotten useful results with integrands that are monotonic on the tail with a "self-triming integration" routine like the following:
right.trimmed.integrate<- function( f, lower, upper, epsilon=1e-100, min.width=1e-10, ... ) {
+ # trim the region of integration on the right until f(x)> epsilon
+
+ a<- lower; b<- upper
+ while ( (b-a>min.width)&& (f(b)<epsilon) ) { b<- (a+b)/2 }
+
+ return( integrate(f,a,b,...) ) }
right.trimmed.integrate( dnorm, 0, 20000 ) # test
0.5 with absolute error< 9.2e-05 This can be adapted to left trim or (left and right) trim, abs(f(x)-c)>epsilon, etc. Setting the tolerances epsilon and min.width is an issue, but an explicit discussion of these values could encourage people to think about the problem in their specific case. And of course, none of this guarantees a correct answer, especially if someone tries this on non-monotonic integrands with complicated 0 sets. One could write a somewhat more user-friendly version where the user has to specify some property (or set of properties) of the integrand, e.g. "right-tail decreasing to 0", etc. and have the algorithm try to do smart trimming based on this. But perhaps this getting too involved. In the end, there is no general solution because any solution depends on the specific nature of the integrand. Clearer messages, warnings in suspicious cases like subdivisions==1, and a simple example explaining what the issue is in the help page would help some people. John ........................................................................... John P. Nolan Math/Stat Department 227 Gray Hall American University 4400 Massachusetts Avenue, NW Washington, DC 20016-8050 jpnolan at american.edu 202.885.3140 voice 202.885.3155 fax http://academic2.american.edu/~jpnolan ........................................................................... -----r-devel-bounces at r-project.org wrote: ----- To: r-devel at r-project.org, Prof Brian Ripley<ripley at stats.ox.ac.uk> From: Martin Maechler Sent by: r-devel-bounces at r-project.org Date: 12/07/2010 03:29AM Subject: Re: [Rd] 0.5 != integrate(dnorm,0,20000) = 0
Prof Brian Ripley<ripley at stats.ox.ac.uk>
on Tue, 7 Dec 2010 07:41:16 +0000 (GMT) writes:
> On Mon, 6 Dec 2010, Spencer Graves wrote:
>> Hello:
>>
>>
>> The example "integrate(dnorm,0,20000)" says it "fails on many systems".
>> I just got 0 from it, when I should have gotten either an error or something
>> close to 0.5. I got this with R 2.12.0 under both Windows Vista_x64 and
>> Linux (Fedora 13); see the results from Windows below. I thought you might
>> want to know.
> Well, isn't that exactly what the help page says happens? That
> example is part of a section entitled
> ## integrate can fail if misused
> and is part of the illustration of
> If the function is
> approximately constant (in particular, zero) over nearly all its
> range it is possible that the result and error estimate may be
> seriously wrong.
yes, of course, and the issue has been known for ``ages'' .. .......... .......... but it seems that too many useRs are not reading the help page carefully, but only browse it quickly. I think we (R developers) have to live with this fact and should consider adapting to it a bit more, particularly in this case (see below)
>>
>> Thanks for all your work in creating and maintaining R.
>>
>>
>> Best Wishes,
>> Spencer Graves
>> ###############################
>>
>> integrate(dnorm,0,20000) ## fails on many systems
>> 0 with absolute error< 0
and this is particularly unsatisfactory for another reason:
"absolute error< 0"
is *always* incorrect, so I think we should change *some*thing.
We could just use "<=" (and probably should in any case, or
"< ~= x" which would convey ``is less than about x'' which I
think is all we can say),
but could consider giving a different message when the integral
evaluates to 0 or, rather actually,
only when the error bound ('abs.error') is 0 *and* 'subdivisions == 1'
as the latter indicates that the algorithm treated the integrand
f(.) as if f() was a linear function.
But in my quick experiments, even for linear (incl. constant)
functions, the 'abs.error' returned is never 0.
If we want to be cautious,
such a warning could be made explicitly suppressable by an argument
.warn.if.doubtful = TRUE
An additional possibility I'd like to try, is a new argument
'min.subdivisions = 3' which specifies the *minimal* number
of subdivisions to be used in addition to the already present
'subdivisions = 100' (= the maximum number of subintervals.)
Martin Maechler,
ETH Zurich
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel ______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
R developers understand intimately how things work, and terse
descriptions are sufficient. However, most typical R users
would benefit from clearer documentation. In multiple places
I've found the R documentation to be correct and understandable
AFTER I've figured a function out.
And to be fair, this problem with integrate( ) isn't really R's
fault: the QUADPACK routines that R uses are very good algorithms,
but neither they nor any other package can handle all cases.
I would support reasonable changes in the documentation for
integrate( ). Just saying it "gives wrong answer without
warning on many systems" seems misleading (it works fine in
many cases) and it doesn't help a user understand how to use
integrate( ) correctly/carefully. IMO a simple example like
this one w/ dnorm would catch peoples attention and a couple
lines of explanation/warning would then make more sense.
John Nolan, American U
-----Spencer Graves <spencer.graves at structuremonitoring.com> wrote: -----
To: John Nolan <jpnolan at american.edu>
From: Spencer Graves <spencer.graves at structuremonitoring.com>
Date: 12/07/2010 07:58PM
Cc: pchausse at uwaterloo.ca, r-devel at r-project.org
Subject: Suggested change to integrate.Rd (was: Re: [Rd] 0.5 != integrate(dnorm,0,20000) = 0)
What do you think about changing the verbiage with that example
in "integrate.Rd" from "fails on many systems" to something like
"gives wrong answer without warning on many systems"?
If I had write access to the core R code, I'd change this
myself: I'm probably not the only user who might think that saying
something "fails" suggest it gives an error message. Many contributions
on this thread make it clear that it will never be possible to write an
integrate function that won't give a "wrong answer without warning" in
some cases.
Thanks,
Spencer
#############################
On 12/7/2010 7:02 AM, John Nolan wrote:
Putting in Inf for the upper bound does not work in general: all 3 of the following should give 0.5
integrate( dnorm, 0, Inf )
0.5 with absolute error< 4.7e-05
integrate( dnorm, 0, Inf, sd=100000 )
Error in integrate(dnorm, 0, Inf, sd = 1e+05) : the integral is probably divergent
integrate( dnorm, 0, Inf, sd=10000000 )
5.570087e-05 with absolute error< 0.00010 Numerical quadrature methods look at a finite number of points, and you can find examples that will confuse any algorithm. Rather than hope a general method will solve all problems, users should look at their integrand and pick an appropriate region of integration. John Nolan, American U. -----r-devel-bounces at r-project.org wrote: ----- To: r-devel at r-project.org From: Pierre Chausse Sent by: r-devel-bounces at r-project.org Date: 12/07/2010 09:46AM Subject: Re: [Rd] 0.5 != integrate(dnorm,0,20000) = 0 The warning about "absolute error == 0" would not be sufficient because if you do
> integrate(dnorm, 0, 5000)
2.326323e-06 with absolute error< 4.6e-06 We get reasonable absolute error and wrong answer. For very high upper bound, it seems more stable to use "Inf". In that case, another .External is used which seems to be optimized for high or low bounds:
> integrate(dnorm, 0,Inf)
0.5 with absolute error< 4.7e-05 On 10-12-07 8:38 AM, John Nolan wrote:
I have wrestled with this problem before. I think correcting the warning to "absolute error ~<= 0" is a good idea, and printing a warning if subdivisions==1 is also helpful. Also, including a simple example like the one that started this thread on the help page for integrate might make the issue more clear to users. But min.subdivisions is probably not. On the example with dnorm( ), I doubt 3 subdivisions would work. The problem isn't that we aren't sudividing enough, the problem is that the integrand is 0 (in double precision) on most of the region and the algorithm isn't designed to handle this. There is no way to determine how many subdivisions are necessary to get a reasonable answer without a detailed analysis of the integrand. I've gotten useful results with integrands that are monotonic on the tail with a "self-triming integration" routine like the following:
right.trimmed.integrate<- function( f, lower, upper, epsilon=1e-100, min.width=1e-10, ... ) {
+ # trim the region of integration on the right until f(x)> epsilon
+
+ a<- lower; b<- upper
+ while ( (b-a>min.width)&& (f(b)<epsilon) ) { b<- (a+b)/2 }
+
+ return( integrate(f,a,b,...) ) }
right.trimmed.integrate( dnorm, 0, 20000 ) # test
0.5 with absolute error< 9.2e-05 This can be adapted to left trim or (left and right) trim, abs(f(x)-c)>epsilon, etc. Setting the tolerances epsilon and min.width is an issue, but an explicit discussion of these values could encourage people to think about the problem in their specific case. And of course, none of this guarantees a correct answer, especially if someone tries this on non-monotonic integrands with complicated 0 sets. One could write a somewhat more user-friendly version where the user has to specify some property (or set of properties) of the integrand, e.g. "right-tail decreasing to 0", etc. and have the algorithm try to do smart trimming based on this. But perhaps this getting too involved. In the end, there is no general solution because any solution depends on the specific nature of the integrand. Clearer messages, warnings in suspicious cases like subdivisions==1, and a simple example explaining what the issue is in the help page would help some people. John ........................................................................... John P. Nolan Math/Stat Department 227 Gray Hall American University 4400 Massachusetts Avenue, NW Washington, DC 20016-8050 jpnolan at american.edu 202.885.3140 voice 202.885.3155 fax http://academic2.american.edu/~jpnolan ........................................................................... -----r-devel-bounces at r-project.org wrote: ----- To: r-devel at r-project.org, Prof Brian Ripley<ripley at stats.ox.ac.uk> From: Martin Maechler Sent by: r-devel-bounces at r-project.org Date: 12/07/2010 03:29AM Subject: Re: [Rd] 0.5 != integrate(dnorm,0,20000) = 0
Prof Brian Ripley<ripley at stats.ox.ac.uk>
on Tue, 7 Dec 2010 07:41:16 +0000 (GMT) writes:
> On Mon, 6 Dec 2010, Spencer Graves wrote:
>> Hello:
>>
>>
>> The example "integrate(dnorm,0,20000)" says it "fails on many systems".
>> I just got 0 from it, when I should have gotten either an error or something
>> close to 0.5. I got this with R 2.12.0 under both Windows Vista_x64 and
>> Linux (Fedora 13); see the results from Windows below. I thought you might
>> want to know.
> Well, isn't that exactly what the help page says happens? That
> example is part of a section entitled
> ## integrate can fail if misused
> and is part of the illustration of
> If the function is
> approximately constant (in particular, zero) over nearly all its
> range it is possible that the result and error estimate may be
> seriously wrong.
yes, of course, and the issue has been known for ``ages'' .. .......... .......... but it seems that too many useRs are not reading the help page carefully, but only browse it quickly. I think we (R developers) have to live with this fact and should consider adapting to it a bit more, particularly in this case (see below)
>>
>> Thanks for all your work in creating and maintaining R.
>>
>>
>> Best Wishes,
>> Spencer Graves
>> ###############################
>>
>> integrate(dnorm,0,20000) ## fails on many systems
>> 0 with absolute error< 0
and this is particularly unsatisfactory for another reason:
"absolute error< 0"
is *always* incorrect, so I think we should change *some*thing.
We could just use "<=" (and probably should in any case, or
"< ~= x" which would convey ``is less than about x'' which I
think is all we can say),
but could consider giving a different message when the integral
evaluates to 0 or, rather actually,
only when the error bound ('abs.error') is 0 *and* 'subdivisions == 1'
as the latter indicates that the algorithm treated the integrand
f(.) as if f() was a linear function.
But in my quick experiments, even for linear (incl. constant)
functions, the 'abs.error' returned is never 0.
If we want to be cautious,
such a warning could be made explicitly suppressable by an argument
.warn.if.doubtful = TRUE
An additional possibility I'd like to try, is a new argument
'min.subdivisions = 3' which specifies the *minimal* number
of subdivisions to be used in addition to the already present
'subdivisions = 100' (= the maximum number of subintervals.)
Martin Maechler,
ETH Zurich
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel ______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
If changes are to be made to integrate it would be nice if the following was fixed: > integrate(dnorm, -Inf, -Inf) 1 with absolute error < 9.4e-05 Note that integrate manages ok when not dealing with Inf or -Inf: > integrate(dnorm, -500, -500) 0 with absolute error < 0 David Scott
On 8/12/2010 5:08 p.m., John Nolan wrote:
R developers understand intimately how things work, and terse
descriptions are sufficient. However, most typical R users
would benefit from clearer documentation. In multiple places
I've found the R documentation to be correct and understandable
AFTER I've figured a function out.
And to be fair, this problem with integrate( ) isn't really R's
fault: the QUADPACK routines that R uses are very good algorithms,
but neither they nor any other package can handle all cases.
I would support reasonable changes in the documentation for
integrate( ). Just saying it "gives wrong answer without
warning on many systems" seems misleading (it works fine in
many cases) and it doesn't help a user understand how to use
integrate( ) correctly/carefully. IMO a simple example like
this one w/ dnorm would catch peoples attention and a couple
lines of explanation/warning would then make more sense.
John Nolan, American U
-----Spencer Graves<spencer.graves at structuremonitoring.com> wrote: -----
To: John Nolan<jpnolan at american.edu>
From: Spencer Graves<spencer.graves at structuremonitoring.com>
Date: 12/07/2010 07:58PM
Cc: pchausse at uwaterloo.ca, r-devel at r-project.org
Subject: Suggested change to integrate.Rd (was: Re: [Rd] 0.5 != integrate(dnorm,0,20000) = 0)
What do you think about changing the verbiage with that example
in "integrate.Rd" from "fails on many systems" to something like
"gives wrong answer without warning on many systems"?
If I had write access to the core R code, I'd change this
myself: I'm probably not the only user who might think that saying
something "fails" suggest it gives an error message. Many contributions
on this thread make it clear that it will never be possible to write an
integrate function that won't give a "wrong answer without warning" in
some cases.
Thanks,
Spencer
#############################
On 12/7/2010 7:02 AM, John Nolan wrote:
Putting in Inf for the upper bound does not work in general: all 3 of the following should give 0.5
integrate( dnorm, 0, Inf )
0.5 with absolute error< 4.7e-05
integrate( dnorm, 0, Inf, sd=100000 )
Error in integrate(dnorm, 0, Inf, sd = 1e+05) :
the integral is probably divergent
integrate( dnorm, 0, Inf, sd=10000000 )
5.570087e-05 with absolute error< 0.00010
Numerical quadrature methods look at a finite number of
points, and you can find examples that will confuse any
algorithm. Rather than hope a general method will solve
all problems, users should look at their integrand and
pick an appropriate region of integration.
John Nolan, American U.
-----r-devel-bounces at r-project.org wrote: -----
To: r-devel at r-project.org
From: Pierre Chausse
Sent by: r-devel-bounces at r-project.org
Date: 12/07/2010 09:46AM
Subject: Re: [Rd] 0.5 != integrate(dnorm,0,20000) = 0
The warning about "absolute error == 0" would not be sufficient
because if you do
> integrate(dnorm, 0, 5000)
2.326323e-06 with absolute error< 4.6e-06 We get reasonable absolute error and wrong answer. For very high upper bound, it seems more stable to use "Inf". In that case, another .External is used which seems to be optimized for high or low bounds:
> integrate(dnorm, 0,Inf)
0.5 with absolute error< 4.7e-05 On 10-12-07 8:38 AM, John Nolan wrote:
I have wrestled with this problem before. I think correcting the warning to "absolute error ~<= 0" is a good idea, and printing a warning if subdivisions==1 is also helpful. Also, including a simple example like the one that started this thread on the help page for integrate might make the issue more clear to users. But min.subdivisions is probably not. On the example with dnorm( ), I doubt 3 subdivisions would work. The problem isn't that we aren't sudividing enough, the problem is that the integrand is 0 (in double precision) on most of the region and the algorithm isn't designed to handle this. There is no way to determine how many subdivisions are necessary to get a reasonable answer without a detailed analysis of the integrand. I've gotten useful results with integrands that are monotonic on the tail with a "self-triming integration" routine like the following:
right.trimmed.integrate<- function( f, lower, upper, epsilon=1e-100, min.width=1e-10, ... ) {
+ # trim the region of integration on the right until f(x)> epsilon
+
+ a<- lower; b<- upper
+ while ( (b-a>min.width)&& (f(b)<epsilon) ) { b<- (a+b)/2 }
+
+ return( integrate(f,a,b,...) ) }
right.trimmed.integrate( dnorm, 0, 20000 ) # test
0.5 with absolute error< 9.2e-05
This can be adapted to left trim or (left and right) trim, abs(f(x)-c)>epsilon,
etc. Setting the tolerances epsilon and min.width is an issue,
but an explicit discussion of these values could encourage people to
think about the problem in their specific case. And of course, none
of this guarantees a correct answer, especially if someone tries this
on non-monotonic integrands with complicated 0 sets. One could write
a somewhat more user-friendly version where the user has to specify
some property (or set of properties) of the integrand, e.g. "right-tail
decreasing to 0", etc. and have the algorithm try to do smart
trimming based on this. But perhaps this getting too involved.
In the end, there is no general solution because any solution
depends on the specific nature of the integrand. Clearer messages,
warnings in suspicious cases like subdivisions==1, and a simple
example explaining what the issue is in the help page would help
some people.
John
...........................................................................
John P. Nolan
Math/Stat Department
227 Gray Hall
American University
4400 Massachusetts Avenue, NW
Washington, DC 20016-8050
jpnolan at american.edu
202.885.3140 voice
202.885.3155 fax
http://academic2.american.edu/~jpnolan
...........................................................................
-----r-devel-bounces at r-project.org wrote: -----
To: r-devel at r-project.org, Prof Brian Ripley<ripley at stats.ox.ac.uk>
From: Martin Maechler
Sent by: r-devel-bounces at r-project.org
Date: 12/07/2010 03:29AM
Subject: Re: [Rd] 0.5 != integrate(dnorm,0,20000) = 0
Prof Brian Ripley<ripley at stats.ox.ac.uk>
on Tue, 7 Dec 2010 07:41:16 +0000 (GMT) writes:
> On Mon, 6 Dec 2010, Spencer Graves wrote:
>> Hello:
>>
>>
>> The example "integrate(dnorm,0,20000)" says it "fails on many systems".
>> I just got 0 from it, when I should have gotten either an error or something
>> close to 0.5. I got this with R 2.12.0 under both Windows Vista_x64 and
>> Linux (Fedora 13); see the results from Windows below. I thought you might
>> want to know.
> Well, isn't that exactly what the help page says happens? That
> example is part of a section entitled
> ## integrate can fail if misused
> and is part of the illustration of
> If the function is
> approximately constant (in particular, zero) over nearly all its
> range it is possible that the result and error estimate may be
> seriously wrong.
yes, of course, and the issue has been known for ``ages'' .. .......... .......... but it seems that too many useRs are not reading the help page carefully, but only browse it quickly. I think we (R developers) have to live with this fact and should consider adapting to it a bit more, particularly in this case (see below)
>>
>> Thanks for all your work in creating and maintaining R.
>>
>>
>> Best Wishes,
>> Spencer Graves
>> ###############################
>>
>> integrate(dnorm,0,20000) ## fails on many systems
>> 0 with absolute error< 0
and this is particularly unsatisfactory for another reason:
"absolute error< 0"
is *always* incorrect, so I think we should change *some*thing.
We could just use "<=" (and probably should in any case, or
"< ~= x" which would convey ``is less than about x'' which I
think is all we can say),
but could consider giving a different message when the integral
evaluates to 0 or, rather actually,
only when the error bound ('abs.error') is 0 *and* 'subdivisions == 1'
as the latter indicates that the algorithm treated the integrand
f(.) as if f() was a linear function.
But in my quick experiments, even for linear (incl. constant)
functions, the 'abs.error' returned is never 0.
If we want to be cautious,
such a warning could be made explicitly suppressable by an argument
.warn.if.doubtful = TRUE
An additional possibility I'd like to try, is a new argument
'min.subdivisions = 3' which specifies the *minimal* number
of subdivisions to be used in addition to the already present
'subdivisions = 100' (= the maximum number of subintervals.)
Martin Maechler,
ETH Zurich
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel ______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
_________________________________________________________________ David Scott Department of Statistics The University of Auckland, PB 92019 Auckland 1142, NEW ZEALAND Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055 Email: d.scott at auckland.ac.nz, Fax: +64 9 373 7018 Director of Consulting, Department of Statistics
Hi, John: Maybe change it to something like "gives wrong answer without warning on many systems (see 'Note' above)", as the 'Note' does provide more detail. Thanks, Spencer
On 12/7/2010 8:08 PM, John Nolan wrote:
R developers understand intimately how things work, and terse
descriptions are sufficient. However, most typical R users
would benefit from clearer documentation. In multiple places
I've found the R documentation to be correct and understandable
AFTER I've figured a function out.
And to be fair, this problem with integrate( ) isn't really R's
fault: the QUADPACK routines that R uses are very good algorithms,
but neither they nor any other package can handle all cases.
I would support reasonable changes in the documentation for
integrate( ). Just saying it "gives wrong answer without
warning on many systems" seems misleading (it works fine in
many cases) and it doesn't help a user understand how to use
integrate( ) correctly/carefully. IMO a simple example like
this one w/ dnorm would catch peoples attention and a couple
lines of explanation/warning would then make more sense.
John Nolan, American U
-----Spencer Graves<spencer.graves at structuremonitoring.com> wrote: -----
To: John Nolan<jpnolan at american.edu>
From: Spencer Graves<spencer.graves at structuremonitoring.com>
Date: 12/07/2010 07:58PM
Cc: pchausse at uwaterloo.ca, r-devel at r-project.org
Subject: Suggested change to integrate.Rd (was: Re: [Rd] 0.5 != integrate(dnorm,0,20000) = 0)
What do you think about changing the verbiage with that example
in "integrate.Rd" from "fails on many systems" to something like
"gives wrong answer without warning on many systems"?
If I had write access to the core R code, I'd change this
myself: I'm probably not the only user who might think that saying
something "fails" suggest it gives an error message. Many contributions
on this thread make it clear that it will never be possible to write an
integrate function that won't give a "wrong answer without warning" in
some cases.
Thanks,
Spencer
#############################
On 12/7/2010 7:02 AM, John Nolan wrote:
Putting in Inf for the upper bound does not work in general: all 3 of the following should give 0.5
integrate( dnorm, 0, Inf )
0.5 with absolute error< 4.7e-05
integrate( dnorm, 0, Inf, sd=100000 )
Error in integrate(dnorm, 0, Inf, sd = 1e+05) :
the integral is probably divergent
integrate( dnorm, 0, Inf, sd=10000000 )
5.570087e-05 with absolute error< 0.00010
Numerical quadrature methods look at a finite number of
points, and you can find examples that will confuse any
algorithm. Rather than hope a general method will solve
all problems, users should look at their integrand and
pick an appropriate region of integration.
John Nolan, American U.
-----r-devel-bounces at r-project.org wrote: -----
To: r-devel at r-project.org
From: Pierre Chausse
Sent by: r-devel-bounces at r-project.org
Date: 12/07/2010 09:46AM
Subject: Re: [Rd] 0.5 != integrate(dnorm,0,20000) = 0
The warning about "absolute error == 0" would not be sufficient
because if you do
> integrate(dnorm, 0, 5000)
2.326323e-06 with absolute error< 4.6e-06 We get reasonable absolute error and wrong answer. For very high upper bound, it seems more stable to use "Inf". In that case, another .External is used which seems to be optimized for high or low bounds:
> integrate(dnorm, 0,Inf)
0.5 with absolute error< 4.7e-05 On 10-12-07 8:38 AM, John Nolan wrote:
I have wrestled with this problem before. I think correcting the warning to "absolute error ~<= 0" is a good idea, and printing a warning if subdivisions==1 is also helpful. Also, including a simple example like the one that started this thread on the help page for integrate might make the issue more clear to users. But min.subdivisions is probably not. On the example with dnorm( ), I doubt 3 subdivisions would work. The problem isn't that we aren't sudividing enough, the problem is that the integrand is 0 (in double precision) on most of the region and the algorithm isn't designed to handle this. There is no way to determine how many subdivisions are necessary to get a reasonable answer without a detailed analysis of the integrand. I've gotten useful results with integrands that are monotonic on the tail with a "self-triming integration" routine like the following:
right.trimmed.integrate<- function( f, lower, upper, epsilon=1e-100, min.width=1e-10, ... ) {
+ # trim the region of integration on the right until f(x)> epsilon
+
+ a<- lower; b<- upper
+ while ( (b-a>min.width)&& (f(b)<epsilon) ) { b<- (a+b)/2 }
+
+ return( integrate(f,a,b,...) ) }
right.trimmed.integrate( dnorm, 0, 20000 ) # test
0.5 with absolute error< 9.2e-05
This can be adapted to left trim or (left and right) trim, abs(f(x)-c)>epsilon,
etc. Setting the tolerances epsilon and min.width is an issue,
but an explicit discussion of these values could encourage people to
think about the problem in their specific case. And of course, none
of this guarantees a correct answer, especially if someone tries this
on non-monotonic integrands with complicated 0 sets. One could write
a somewhat more user-friendly version where the user has to specify
some property (or set of properties) of the integrand, e.g. "right-tail
decreasing to 0", etc. and have the algorithm try to do smart
trimming based on this. But perhaps this getting too involved.
In the end, there is no general solution because any solution
depends on the specific nature of the integrand. Clearer messages,
warnings in suspicious cases like subdivisions==1, and a simple
example explaining what the issue is in the help page would help
some people.
John
...........................................................................
John P. Nolan
Math/Stat Department
227 Gray Hall
American University
4400 Massachusetts Avenue, NW
Washington, DC 20016-8050
jpnolan at american.edu
202.885.3140 voice
202.885.3155 fax
http://academic2.american.edu/~jpnolan
...........................................................................
-----r-devel-bounces at r-project.org wrote: -----
To: r-devel at r-project.org, Prof Brian Ripley<ripley at stats.ox.ac.uk>
From: Martin Maechler
Sent by: r-devel-bounces at r-project.org
Date: 12/07/2010 03:29AM
Subject: Re: [Rd] 0.5 != integrate(dnorm,0,20000) = 0
Prof Brian Ripley<ripley at stats.ox.ac.uk>
on Tue, 7 Dec 2010 07:41:16 +0000 (GMT) writes:
> On Mon, 6 Dec 2010, Spencer Graves wrote:
>> Hello:
>>
>>
>> The example "integrate(dnorm,0,20000)" says it "fails on many systems".
>> I just got 0 from it, when I should have gotten either an error or something
>> close to 0.5. I got this with R 2.12.0 under both Windows Vista_x64 and
>> Linux (Fedora 13); see the results from Windows below. I thought you might
>> want to know.
> Well, isn't that exactly what the help page says happens? That
> example is part of a section entitled
> ## integrate can fail if misused
> and is part of the illustration of
> If the function is
> approximately constant (in particular, zero) over nearly all its
> range it is possible that the result and error estimate may be
> seriously wrong.
yes, of course, and the issue has been known for ``ages'' .. .......... .......... but it seems that too many useRs are not reading the help page carefully, but only browse it quickly. I think we (R developers) have to live with this fact and should consider adapting to it a bit more, particularly in this case (see below)
>>
>> Thanks for all your work in creating and maintaining R.
>>
>>
>> Best Wishes,
>> Spencer Graves
>> ###############################
>>
>> integrate(dnorm,0,20000) ## fails on many systems
>> 0 with absolute error< 0
and this is particularly unsatisfactory for another reason:
"absolute error< 0"
is *always* incorrect, so I think we should change *some*thing.
We could just use "<=" (and probably should in any case, or
"< ~= x" which would convey ``is less than about x'' which I
think is all we can say),
but could consider giving a different message when the integral
evaluates to 0 or, rather actually,
only when the error bound ('abs.error') is 0 *and* 'subdivisions == 1'
as the latter indicates that the algorithm treated the integrand
f(.) as if f() was a linear function.
But in my quick experiments, even for linear (incl. constant)
functions, the 'abs.error' returned is never 0.
If we want to be cautious,
such a warning could be made explicitly suppressable by an argument
.warn.if.doubtful = TRUE
An additional possibility I'd like to try, is a new argument
'min.subdivisions = 3' which specifies the *minimal* number
of subdivisions to be used in addition to the already present
'subdivisions = 100' (= the maximum number of subintervals.)
Martin Maechler,
ETH Zurich
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel ______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Hi,
My honest and (not so) humble opinion is that no amount of clear and
explicit warning can totally prevent the inappropriate use of any tool.
Users will continue to use the tools, without doing the necessary background
work to figure out whether the that tool is the appropriate one for their
particular problem. If things can go so horribly wrong in such a simple
case, imagine all the snares and traps present in complex, high-dimensional
integration. Even the best cubature rules or the MCMC methods can give
wrong results. Even worse, how in heaven's name can we be sure that the
answer is any good? The simple and best solution is to understand your
integrand as best as you can. I realize that this may be viewed as being
too pedantic, but unfortunately, it is also the best advice.
Best,
Ravi.
-------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology School of Medicine Johns
Hopkins University
Ph. (410) 502-2619
email: rvaradhan at jhmi.edu
-----Original Message-----
From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org]
On Behalf Of John Nolan
Sent: Tuesday, December 07, 2010 11:09 PM
To: spencer.graves at structuremonitoring.com
Cc: r-devel at r-project.org
Subject: Re: [Rd] Suggested change to integrate.Rd (was: Re: 0.5 !=
integrate(dnorm, 0, 20000) = 0)
R developers understand intimately how things work, and terse
descriptions are sufficient. However, most typical R users
would benefit from clearer documentation. In multiple places
I've found the R documentation to be correct and understandable
AFTER I've figured a function out.
And to be fair, this problem with integrate( ) isn't really R's
fault: the QUADPACK routines that R uses are very good algorithms,
but neither they nor any other package can handle all cases.
I would support reasonable changes in the documentation for
integrate( ). Just saying it "gives wrong answer without
warning on many systems" seems misleading (it works fine in
many cases) and it doesn't help a user understand how to use
integrate( ) correctly/carefully. IMO a simple example like
this one w/ dnorm would catch peoples attention and a couple
lines of explanation/warning would then make more sense.
John Nolan, American U
-----Spencer Graves <spencer.graves at structuremonitoring.com> wrote: -----
To: John Nolan <jpnolan at american.edu>
From: Spencer Graves <spencer.graves at structuremonitoring.com>
Date: 12/07/2010 07:58PM
Cc: pchausse at uwaterloo.ca, r-devel at r-project.org
Subject: Suggested change to integrate.Rd (was: Re: [Rd] 0.5 !=
integrate(dnorm,0,20000) = 0)
What do you think about changing the verbiage with that example
in "integrate.Rd" from "fails on many systems" to something like
"gives wrong answer without warning on many systems"?
If I had write access to the core R code, I'd change this
myself: I'm probably not the only user who might think that saying
something "fails" suggest it gives an error message. Many contributions
on this thread make it clear that it will never be possible to write an
integrate function that won't give a "wrong answer without warning" in
some cases.
Thanks,
Spencer
#############################
On 12/7/2010 7:02 AM, John Nolan wrote:
Putting in Inf for the upper bound does not work in general: all 3 of the following should give 0.5
integrate( dnorm, 0, Inf )
0.5 with absolute error< 4.7e-05
integrate( dnorm, 0, Inf, sd=100000 )
Error in integrate(dnorm, 0, Inf, sd = 1e+05) : the integral is probably divergent
integrate( dnorm, 0, Inf, sd=10000000 )
5.570087e-05 with absolute error< 0.00010 Numerical quadrature methods look at a finite number of points, and you can find examples that will confuse any algorithm. Rather than hope a general method will solve all problems, users should look at their integrand and pick an appropriate region of integration. John Nolan, American U. -----r-devel-bounces at r-project.org wrote: ----- To: r-devel at r-project.org From: Pierre Chausse Sent by: r-devel-bounces at r-project.org Date: 12/07/2010 09:46AM Subject: Re: [Rd] 0.5 != integrate(dnorm,0,20000) = 0 The warning about "absolute error == 0" would not be sufficient because if you do
> integrate(dnorm, 0, 5000)
2.326323e-06 with absolute error< 4.6e-06 We get reasonable absolute error and wrong answer. For very high upper bound, it seems more stable to use "Inf". In that case, another .External is used which seems to be optimized for high or low bounds:
> integrate(dnorm, 0,Inf)
0.5 with absolute error< 4.7e-05 On 10-12-07 8:38 AM, John Nolan wrote:
I have wrestled with this problem before. I think correcting the warning to "absolute error ~<= 0" is a good idea, and printing a warning if subdivisions==1 is also helpful. Also, including a simple example like the one that started this thread on the help page for integrate might make the issue more clear to users. But min.subdivisions is probably not. On the example with dnorm( ), I doubt 3 subdivisions would work. The problem isn't that we aren't sudividing enough, the problem is that the integrand is 0 (in double precision) on most of the region and the algorithm isn't designed to handle this. There is no way to determine how many subdivisions are necessary to get a reasonable answer without a detailed analysis of the integrand. I've gotten useful results with integrands that are monotonic on the tail with a "self-triming integration" routine like the following:
right.trimmed.integrate<- function( f, lower, upper, epsilon=1e-100,
min.width=1e-10, ... ) {
+ # trim the region of integration on the right until f(x)> epsilon
+
+ a<- lower; b<- upper
+ while ( (b-a>min.width)&& (f(b)<epsilon) ) { b<- (a+b)/2 }
+
+ return( integrate(f,a,b,...) ) }
right.trimmed.integrate( dnorm, 0, 20000 ) # test
0.5 with absolute error< 9.2e-05 This can be adapted to left trim or (left and right) trim,
abs(f(x)-c)>epsilon,
etc. Setting the tolerances epsilon and min.width is an issue, but an explicit discussion of these values could encourage people to think about the problem in their specific case. And of course, none of this guarantees a correct answer, especially if someone tries this on non-monotonic integrands with complicated 0 sets. One could write a somewhat more user-friendly version where the user has to specify some property (or set of properties) of the integrand, e.g. "right-tail decreasing to 0", etc. and have the algorithm try to do smart trimming based on this. But perhaps this getting too involved. In the end, there is no general solution because any solution depends on the specific nature of the integrand. Clearer messages, warnings in suspicious cases like subdivisions==1, and a simple example explaining what the issue is in the help page would help some people. John
...........................................................................
John P. Nolan Math/Stat Department 227 Gray Hall American University 4400 Massachusetts Avenue, NW Washington, DC 20016-8050 jpnolan at american.edu 202.885.3140 voice 202.885.3155 fax http://academic2.american.edu/~jpnolan
...........................................................................
-----r-devel-bounces at r-project.org wrote: ----- To: r-devel at r-project.org, Prof Brian Ripley<ripley at stats.ox.ac.uk> From: Martin Maechler Sent by: r-devel-bounces at r-project.org Date: 12/07/2010 03:29AM Subject: Re: [Rd] 0.5 != integrate(dnorm,0,20000) = 0
Prof Brian Ripley<ripley at stats.ox.ac.uk>
on Tue, 7 Dec 2010 07:41:16 +0000 (GMT) writes:
> On Mon, 6 Dec 2010, Spencer Graves wrote:
>> Hello:
>>
>>
>> The example "integrate(dnorm,0,20000)" says it "fails on many
systems".
>> I just got 0 from it, when I should have gotten either an
error or something
>> close to 0.5. I got this with R 2.12.0 under both Windows
Vista_x64 and
>> Linux (Fedora 13); see the results from Windows below. I
thought you might
>> want to know.
> Well, isn't that exactly what the help page says happens? That
> example is part of a section entitled
> ## integrate can fail if misused
> and is part of the illustration of
> If the function is
> approximately constant (in particular, zero) over nearly all
its
> range it is possible that the result and error estimate may be
> seriously wrong.
yes, of course, and the issue has been known for ``ages'' .. .......... .......... but it seems that too many useRs are not reading the help page carefully, but only browse it quickly. I think we (R developers) have to live with this fact and should consider adapting to it a bit more, particularly in this case (see below)
>>
>> Thanks for all your work in creating and maintaining R.
>>
>>
>> Best Wishes,
>> Spencer Graves
>> ###############################
>>
>> integrate(dnorm,0,20000) ## fails on many systems
>> 0 with absolute error< 0
and this is particularly unsatisfactory for another reason:
"absolute error< 0"
is *always* incorrect, so I think we should change *some*thing.
We could just use "<=" (and probably should in any case, or
"< ~= x" which would convey ``is less than about x'' which I
think is all we can say),
but could consider giving a different message when the integral
evaluates to 0 or, rather actually,
only when the error bound ('abs.error') is 0 *and* 'subdivisions == 1'
as the latter indicates that the algorithm treated the integrand
f(.) as if f() was a linear function.
But in my quick experiments, even for linear (incl. constant)
functions, the 'abs.error' returned is never 0.
If we want to be cautious,
such a warning could be made explicitly suppressable by an argument
.warn.if.doubtful = TRUE
An additional possibility I'd like to try, is a new argument
'min.subdivisions = 3' which specifies the *minimal* number
of subdivisions to be used in addition to the already present
'subdivisions = 100' (= the maximum number of subintervals.)
Martin Maechler,
ETH Zurich
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel ______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
5 days later
I'd suggest that the original sin here is calling some particular numerical integration routine 'integrate', which gives the user an illusory sense of power.... Functions have to be well-behaved in various ways for quadrature to work well, and you've got to expect things like
integrate(function(x)tan(x),0,pi)
Error in integrate(function(x) tan(x), 0, pi) : roundoff error is detected in the extrapolation table <<< a 'good' error -- tells the user something's wrong
integrate(function(x)tan(x)^2,0,pi)
1751.054 with absolute error < 0 <<< oops
integrate(function(x)1/(x-pi/2)^2,0,pi) <<< the same pole (analytically)
Error in integrate(function(x) 1/(x - pi/2)^2, 0, pi) : <<<
gets a useful error in this form
non-finite function value
But by that argument, I suppose you shouldn't call floating-point
addition "+" :-)
-s