
the quantile function and problems.

7 messages · Uwe Brauer, Ivan Krylov, Ebert,Timothy Aaron +1 more

#
Hi 

I am not very well acquainted with R; I use it occasionally via the org-babel library of GNU Emacs.

I wanted to check the first, second and third quartiles as defined by the Journal Citation Reports (JCR):
https://support.clarivate.com/ScientificandAcademicResearch/s/article/Journal-Citation-Reports-Quartile-rankings-and-other-metrics?language=en_US

Its criterion is
#+begin_src 
| Quartile | range             |                                       |
|----------+-------------------+---------------------------------------|
| Q1       | 0.0 < Z \leq 0.25 | Highest ranked journals in a category |
| Q2       | 0.25 < Z \leq 0.5 |                                       |
| Q3       | 0.5 < Z \leq 0.75 |                                       |
| Q4       | 0.75 < Z          | Lowest ranked journals in a category  |
#+end_src

Z=(X/Y)

Where X is the journal rank in category and Y is the number of journals in the category.
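
The rule above can be sketched in R as follows. This is only an illustration under stated assumptions: ranks run from 1 (best) to Y (worst) without ties, the helper name `jcr_quartile` is made up for this example, and Y = 267 is the category size used in this thread.

```r
# Hypothetical helper: assign a JCR-style quartile from Z = X / Y.
jcr_quartile <- function(rank, n_journals) {
  z <- rank / n_journals
  # right = TRUE gives the intervals (0, 0.25], (0.25, 0.5], (0.5, 0.75], (0.75, 1]
  cut(z, breaks = c(0, 0.25, 0.5, 0.75, 1),
      labels = c("Q1", "Q2", "Q3", "Q4"), right = TRUE)
}

ranks <- 1:267
q <- jcr_quartile(ranks, length(ranks))

# Last rank that still falls into each quartile
last_rank <- tapply(ranks, q, max)
print(last_rank)
```

With 267 journals the last ranks come out as 66, 133, 200 and 267.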

Now I have a list of 267 journals.

What drives me crazy is that R, MATLAB and the JCR each calculate the quartiles differently and give different results.

Here is a table 
#+begin_matlab :exports both :eval never-export :results output latex
#+RESULTS:
| quartil-limit (last member) |    | floor_Rlang | jcr | jcr_check | floor_check |
|-----------------------------+----+-------------+-----+-----------+-------------|
|                        67.5 | Q1 |          67 |  66 |    0.2472 |      0.2509 |
|                         134 | Q2 |         134 | 133 |    0.4981 |      0.5019 |
|                       200.5 | Q3 |         200 | 200 |    0.7491 |      0.7491 |
|                         267 |    |         267 | 267 |         1 |           1 |
#+TBLFM: $5=$4/267::$6=$3/267
#+end_matlab

I calculated the quartiles in R (the data, which I do not reproduce here, is just the vector from 1 to 267):

#+begin_src R :colnames t :var t1=jcr22
  quantile(t1$Data,c(1/4,1/2,3/4,1))
#+end_src
#+begin_src 
#+RESULTS:
|     x |
|-------|
|  67.5 |
|   134 |
| 200.5 |
|   267 |
#+end_src


So you see the problem with Q1 and Q2: rounding R's values down gives 67 and 134, whereas the JCR boundaries are 66 and 133.
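
A small side-by-side sketch may make the mismatch easier to see. It assumes the JCR boundary for a fraction p is the largest rank X with X/Y <= p, that is floor(p * Y), and compares it with R's default `quantile()` result rounded down. The variable names are illustrative only.

```r
n <- 267
p <- c(0.25, 0.5, 0.75)

r_default <- unname(quantile(1:n, p))  # R's default is type = 7
jcr_last  <- floor(p * n)              # largest X with X / n <= p (assumed JCR rule)

comparison <- data.frame(
  p         = p,
  r_default = r_default,
  floor_r   = floor(r_default),
  jcr_last  = jcr_last,
  floor_r_z = round(floor(r_default) / n, 4),
  jcr_z     = round(jcr_last / n, 4)
)
print(comparison)
```

Ranks 67 and 134 give Z = 0.2509 and Z = 0.5019, just above the cut-offs, so under the JCR criterion they already belong to the next quartile.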

On top of that matlab gives

#+begin_src matlab :exports results :eval never-export :results output latex
format short
x=1:267;
q1 = quantile(x,1/4);
q2 = quantile(x,1/2);
q3 = quantile(x,3/4);
Q=[q1; q2; q3];
sprintf('|%g|   \n', Q)
#+end_src

#+RESULTS:
#+begin_export latex
|67.25|   
|134|   
|200.75|   
#+end_export

Which is also slightly different from R.

Can anybody enlighten me please?
Thanks and regards 

Uwe Brauer
#
Read ?quantile carefully, please (and any references therein that you
may wish to consult).

You are estimating a continuous function by a discrete finite step
function, and as the Help page (and further references) explains,
there are many ways to do this.
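
As a rough illustration of "many ways": two of the definitions listed in ?quantile differ only in where they place the interpolation position h among the sorted values. The sketch below is not R's actual implementation; `interp_quantile`, `pos_type5` and `pos_type7` are hypothetical names, and the data vector is arbitrary.

```r
# Linear interpolation between order statistics at a fractional position h.
interp_quantile <- function(x, p, position) {
  x <- sort(x)
  n <- length(x)
  h <- position(n, p)
  h <- pmin(pmax(h, 1), n)   # clamp to the valid index range
  lo <- floor(h)
  hi <- ceiling(h)
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

pos_type7 <- function(n, p) (n - 1) * p + 1   # position used by type 7 (R default)
pos_type5 <- function(n, p) n * p + 0.5       # position used by type 5

x <- c(3, 1, 4, 1, 5, 9, 2, 6)   # arbitrary example data
p <- c(0.25, 0.5, 0.75)

rbind(
  sketch_type7 = interp_quantile(x, p, pos_type7),
  r_type7      = unname(quantile(x, p, type = 7)),
  sketch_type5 = interp_quantile(x, p, pos_type5),
  r_type5      = unname(quantile(x, p, type = 5))
)
```

The same data and the same probabilities give different answers simply because h is shifted.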

Bert
On Thu, Jul 14, 2022 at 2:33 PM Uwe Brauer <oub at mat.ucm.es> wrote:
#
Ok, thanks
#
On Thu, 14 Jul 2022 14:58:17 +0200
Uwe Brauer <oub at mat.ucm.es> wrote:
R by itself can give up to 9 slightly different results:

sapply(1:9, function(type) quantile(1:267, 1:3/4, type = type))
#     [,1] [,2] [,3]   [,4]   [,5] [,6]  [,7]      [,8]     [,9]
# 25%   67   67   67  66.75  67.25   67  67.5  67.16667  67.1875
# 50%  134  134  134 133.50 134.00  134 134.0 134.00000 134.0000
# 75%  201  201  200 200.25 200.75  201 200.5 200.83333 200.8125

Choose the ones that fit your ideas of quantile best. See ?quantile for
more info.
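
To connect the table above with the numbers reported in the first message, the following check compares them directly. The two "reported" vectors are copied from that message; that the MATLAB figures line up with `type = 5` is an observation from these values, not a statement about MATLAB's documentation.

```r
p <- 1:3 / 4

matlab_reported <- c(67.25, 134, 200.75)  # MATLAB output quoted in the first message
r_reported      <- c(67.5, 134, 200.5)    # R output quoted in the first message

matches_type5 <- isTRUE(all.equal(unname(quantile(1:267, p, type = 5)), matlab_reported))
matches_type7 <- isTRUE(all.equal(unname(quantile(1:267, p, type = 7)), r_reported))

print(c(matlab_is_type5 = matches_type5, r_default_is_type7 = matches_type7))
```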
#
Does the choice in how the quantile is calculated influence the validity of a statistical test for differences in the median?
Regards,
Tim

-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Ivan Krylov
Sent: Friday, July 15, 2022 2:21 AM
To: Uwe Brauer <oub at mat.ucm.es>
Cc: r-help at r-project.org
Subject: Re: [R] the quantile function and problems.

#
This is now wandering from R-Help to statistical issues, which are
slightly off topic.

But the answer should be no, as the tests are calculated from the
underlying data, not quantile estimates.
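
A hedged illustration, taking the Wilcoxon rank-sum test as one example of a test commonly used to compare locations: the sample median of an even-sized sample can depend on the quantile type, while the test works on the ranks of the raw data and has no `type` argument at all. The data here are simulated and the seed is arbitrary.

```r
set.seed(1)                 # arbitrary seed, illustrative data only
x <- rnorm(20)
y <- rnorm(20, mean = 0.5)

# The estimated median of x changes with the quantile definition ...
medians <- sapply(c(type1 = 1, type7 = 7),
                  function(type) unname(quantile(x, 0.5, type = type)))
print(medians)

# ... but the test statistic is built from ranks of the raw observations.
p_value <- wilcox.test(x, y)$p.value
print(p_value)
```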


Cheers,
Bert
On Fri, Jul 15, 2022 at 8:55 AM Ebert,Timothy Aaron <tebert at ufl.edu> wrote:
#
Thanks, number 3 is the best for my purpose. 

Very useful
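
For reference, a minimal sketch of the `type = 3` call on the 267 ranks, together with the corresponding Z = X/Y values (the column names are illustrative):

```r
n <- 267
type3 <- unname(quantile(1:n, 1:3 / 4, type = 3))

result <- data.frame(
  quartile  = c("Q1", "Q2", "Q3"),
  last_rank = type3,
  z         = round(type3 / n, 4)
)
print(result)
```

This yields 67, 134 and 200, which coincide with the floor_Rlang column of the table in the first message; the jcr column there lists 66 and 133 for Q1 and Q2.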