Skip to content
Prev 172579 / 398525 Next

Diff btw percentile and quantile

Excel 2003's help for percentile just says it interpolates
between the quantiles in the data:  
   Array   is the array or range of data that defines relative standing.
   K   is the percentile value in the range 0..1, inclusive.
     If array is empty or contains more than 8,191 data points,
        PERCENTILE returns the #NUM! error value. 
     If k is nonnumeric, PERCENTILE returns the #VALUE! error value. 
     If k is < 0 or if k > 1, PERCENTILE returns the #NUM! error value. 
     If k is not a multiple of 1/(n - 1), PERCENTILE interpolates
        to determine the value at the k-th percentile. 
so some experimenation is on order.

I found that the call to R's quantile gives a different result
for each of the 9 documented values of the type argument:
   x<-c(1,1,2,3,3,5,8,8,9,10)
   quantile(x, probs=(0:8)/8, type=types[i])
E.g.,
sapply(1:9,function(type)quantile(x=x,probs=(0:8)/8,type=type))
      type=1 type=2 type=3 type=4 type=5 type=6 type=7    type=8
type=9
0%         1      1      1   1.00   1.00  1.000  1.000  1.000000
1.00000
12.5%      1      1      1   1.00   1.00  1.000  1.125  1.000000
1.00000
25%        2      2      1   1.50   2.00  1.750  2.250  1.916667
1.93750
37.5%      3      3      3   2.75   3.00  3.000  3.000  3.000000
3.00000
50%        3      4      3   3.00   4.00  4.000  4.000  4.000000
4.00000
62.5%      8      8      5   5.75   7.25  7.625  6.875  7.375000
7.34375
75%        8      8      8   8.00   8.00  8.250  8.000  8.083333
8.06250
87.5%      9      9      9   8.75   9.25  9.625  8.875  9.375000
9.34375
100%      10     10     10  10.00  10.00 10.000 10.000 10.000000
10.00000

I entered the same x into Excel 2003 and used the formulae
=percentile(A1:10,0),
=percentile(A1:A10,.125), ..., =percentile(A1:A10,1) and got the results
   1, 1.125, 2.25, 3, 4, 6.875, 8, 8.875, 10
This matches only R's type 7, the default.

They also match S+'s default quantile calculation.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com 

----------------------------
Ted Harding wrote:
I have now googled around a bit. All references to the Excel
percentile() function say that you feed it the fractional value
corresponding to the percentage. So, for example, to get the
80-th percentile you would give it 0.8. Hence Excel should call
it "quantile"!

As to the algorithm, Wikipedia states the following (translated
into R syntax):

  Many software packages, such as Microsoft Excel, use the
  following method recommended by NIST[4] to estimate the
  value, vp, of the pth percentile of an ascending ordered
  dataset containing N elements with values v[1],v[2],...,v[N]:

    n = (p/100)*(N-1) + 1

  n is then split into its integer component, k and decimal
  component, d, such that n = k + d.
  If k = 1, then the value for that percentile, vp, is the
  first member of the ordered dataset, v[1].
  If k = N, then the value for that percentile, vp, is the
  Nth member of the ordered dataset, v[N].
  Otherwise, 1 < k < N and vp = v[k] + d*(v[k + 1] - v[k]).

Note that the Wikipedia article uses the "%" interpretation of
"p-th percentile", i.e. the point which is (p/100) of the way
along the distribution.

It looks as though R's quantile with type=4 might be the same,
since it is explained as "linear interpolation of the empirical
cdf", which is what the above description of Excel's method does.
However, R's default type is 7, which is different.

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 04-Mar-09                                       Time: 17:29:50
------------------------------ XFMail ------------------------------