An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20121017/ea9c66ed/attachment.pl>
how R implement qnorm()
7 messages · Brian Ripley, Thomas Lumley, Peter Dalgaard +1 more
On 18/10/2012 00:16, Sheng Liu wrote:
how R implement qnorm() I wonder anyone knows the mathematical process that R calculated the quantile?
It's on the help page! 'For qnorm, the code is a C translation of Wichura, M. J. (1988) Algorithm AS 241: The Percentage Points of the Normal Distribution. Applied Statistics, 37, 477?484.' and the exact code is in the R sources. E.g. online at https://svn.r-project.org/R/trunk/src/nmath/qnorm.c
The reason I asked is soly by curiosity. I know the probability of a normal distribution is calculated through integrate the Gaussian function, which can be implemented easily (see code), while the calculation of quantile (or Z??) in R is a bit confusing as it requires inverse error function (X = - sqrt(2)* erf-1 (2*P)), while R doesn't have a build in one. The InvErf function most people use is through qnorm( InvErf=function(x)
I think you are wrong about 'most people': this is the notation used by a small group of non-statisticians (mainly physicists, I think).
qnorm((1+x)/2)/sqrt(2) ). When you type qnorm in the console, it doesn't show it as it is an internal function, I searched around can't found too much information, my hunch is R might be using some internal library that's in the chipset which can calculate erf-1(x), but it is not accessible to user.
No, FPUs do not commonly have inverse erf functions.
Any information is welcomed. thanks.
Sheng
code for implementation of pnorm()
--------------------------------------------------
p.Gaussian=function (z, mean=0,sd=1) {
Gaussian=function(x) {1/(sqrt(2*pi)*sd)*exp(-(x-mean)^2/(2*sd^2))}
per=integrate(Gaussian,lower=-Inf,upper=z)
return (per$value)
}
code for implementation of qnorm()
--------------------------------------------------
# I've figured out one that uses the uniroot function to get x, it
approximate qnorm() well but not exactly. I would be very happy to see the
implementation through a mathematical formula such as using the X = -
sqrt(2)* erf-1 (2*P), P is the probability).
q.Gaussian=function(p,mean=0,sd=1) {
variable = function(x) p.Gaussian(x)-p
z = uniroot(variable, interval=c(-4,4))
v = z$root*sd+mean
return(v)
}
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
On Oct 18, 2012, at 09:55 , Prof Brian Ripley wrote:
R is a bit confusing as it requires inverse error function (X = - sqrt(2)* erf-1 (2*P)), while R doesn't have a build in one. The InvErf function most people use is through qnorm( InvErf=function(x)
I think you are wrong about 'most people': this is the notation used by a small group of non-statisticians (mainly physicists, I think).
Well, he's right in the sense that it is erf/erfc that are commonly documented in collections of special functions (like Abramowitz & Stegun), and also those that appear in common C math libraries. In both cases of course because physicists have dominated in their development. Of course "most people" use Excel these days (which has the inverse normal distribution but AFAICS not the inverse error function).
Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20121018/f96f63c0/attachment.pl>
On Fri, Oct 19, 2012 at 12:21 PM, Sheng Liu <sheng.liu at live.ca> wrote:
Thanks a lot. It's very helpful. I've read through the c code. Just FYI and for my completion of the question, I post some of my thought on it: To me it looks like the algorithm is actually inquiring through an approximation table (the approximations, at least for pnom, is "derived from those in "Rational Chebyshev approximations for the error function" by W. J. Cody, Math. Comp., 1969, 631-637."), it is not using any function such as intergration or inverse erf to compute the value based on an equation as I thought, though lack a bit subtlety but the up side is the speed.
It's not an approximation table, it's a rational Chebyshev approximation, a ratio of polynomials. qnorm also uses a rational polynomial approximation. At some level this *has to be* what is going on: computers don't implement pnorm/qnorm or erfc in hardware, and there is no closed-form expression for them in terms of quantities that are implemented in hardware, so the functions must be some sort of approximate expression using arithmetic and other hardware computations.
From the point of view of R, the only relevant issues are precision,
speed, and portability, and these approximations are portable,
accurate, and fast.
-thomas
Hope this can help others who had similar questions as well. Sheng On Thu, Oct 18, 2012 at 2:32 AM, peter dalgaard <pdalgd at gmail.com> wrote:
On Oct 18, 2012, at 09:55 , Prof Brian Ripley wrote:
R is a bit confusing as it requires inverse error function (X = - sqrt(2)* erf-1 (2*P)), while R doesn't have a build in one. The InvErf function most people use is through qnorm( InvErf=function(x)
I think you are wrong about 'most people': this is the notation used by
a small group of non-statisticians (mainly physicists, I think). Well, he's right in the sense that it is erf/erfc that are commonly documented in collections of special functions (like Abramowitz & Stegun), and also those that appear in common C math libraries. In both cases of course because physicists have dominated in their development. Of course "most people" use Excel these days (which has the inverse normal distribution but AFAICS not the inverse error function). -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Thomas Lumley Professor of Biostatistics University of Auckland
On Oct 19, 2012, at 06:05 , Thomas Lumley wrote:
On Fri, Oct 19, 2012 at 12:21 PM, Sheng Liu <sheng.liu at live.ca> wrote:
Thanks a lot. It's very helpful. I've read through the c code. Just FYI and for my completion of the question, I post some of my thought on it: To me it looks like the algorithm is actually inquiring through an approximation table (the approximations, at least for pnom, is "derived from those in "Rational Chebyshev approximations for the error function" by W. J. Cody, Math. Comp., 1969, 631-637."), it is not using any function such as intergration or inverse erf to compute the value based on an equation as I thought, though lack a bit subtlety but the up side is the speed.
It's not an approximation table, it's a rational Chebyshev approximation, a ratio of polynomials. qnorm also uses a rational polynomial approximation.
I inferred that Sheng probably knew that and meant that there's a table of coefficients of a rational polynomial approximation.
At some level this *has to be* what is going on: computers don't implement pnorm/qnorm or erfc in hardware, and there is no closed-form expression for them in terms of quantities that are implemented in hardware, so the functions must be some sort of approximate expression using arithmetic and other hardware computations.
Up to a few years ago, or maybe a decade, that was also the way special functions and even square roots and division were implemented in hardware using lookup tables and add/multiply circuits. All hell broke loose when one of the coefficients got entered incorrectly -- some may remember the FDIV Pentium bug of 1994. I used to have a really nice paper from Byte magazine c. 1990 which detailed the process, but I think it got lost in space. (L. Brett Glass: "Math Coprocessors" could be the one. http://dl.acm.org/citation.cfm?id=86680). Nowadays, we have single CPU cycle transcendentals, so I suppose that has all been reduced to hard-core optimized electronic circuitry. Fringe-market customers like statisticians still need to implement their functions in software, of course.
From the point of view of R, the only relevant issues are precision,
speed, and portability, and these approximations are portable, accurate, and fast. -thomas
Hope this can help others who had similar questions as well. Sheng On Thu, Oct 18, 2012 at 2:32 AM, peter dalgaard <pdalgd at gmail.com> wrote:
On Oct 18, 2012, at 09:55 , Prof Brian Ripley wrote:
R is a bit confusing as it requires inverse error function (X = - sqrt(2)* erf-1 (2*P)), while R doesn't have a build in one. The InvErf function most people use is through qnorm( InvErf=function(x)
I think you are wrong about 'most people': this is the notation used by
a small group of non-statisticians (mainly physicists, I think). Well, he's right in the sense that it is erf/erfc that are commonly documented in collections of special functions (like Abramowitz & Stegun), and also those that appear in common C math libraries. In both cases of course because physicists have dominated in their development. Of course "most people" use Excel these days (which has the inverse normal distribution but AFAICS not the inverse error function). -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- Thomas Lumley Professor of Biostatistics University of Auckland
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20121019/6fe40126/attachment.pl>