
standard error for quantile

8 messages · Bert Gunter, (Ted Harding), Jim Lemon +1 more

#
Dear all

I have a question about the standard error of quantiles, partly practical
and partly theoretical. I know that

x<-rlnorm(100000, log(200), log(2))
quantile(x, c(.10,.5,.99))

computes quantiles, but I would like to know if there is any function to
find the standard error (or any other dispersion measure) of these
estimated values.

And here is a theoretical one. I feel that when I compute the median from
a given set of values, it will have a lower standard error than the 0.1
quantile computed from the same set of values.

Is that true? If so, can you point me to some reasoning?

Thanks for all answers.
Regards
Petr

PS.
I found the mcmcse package, which should compute the standard error, but
I could not make it work, probably because I do not have a recent R-devel
version installed:

Error in eval(expr, envir, enclos) :
  could not find function ".getNamespace"
Error : unable to load R code in package 'mcmcse'
Error: package/namespace load failed for 'mcmcse'

Maybe I will also find something in the quantreg package, but I have not
gone through it yet.
#
Petr:

1. Not an R question.

2. You want the distribution of order statistics. Search on that. It's
basically binomial/beta.
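For illustration of that pointer: the k-th order statistic of n Uniform(0,1) draws follows a Beta(k, n-k+1) distribution, and a short simulation (a quick sketch of my own) confirms its variance:

```r
# The k-th order statistic of n Uniform(0,1) draws is Beta(k, n-k+1),
# so its variance is k*(n-k+1) / ((n+1)^2 * (n+2)).
set.seed(42)
n <- 100; k <- 50
sims <- replicate(5000, sort(runif(n))[k])   # simulate the order statistic
c(simulated   = var(sims),
  theoretical = k * (n - k + 1) / ((n + 1)^2 * (n + 2)))
```

For non-uniform parents, the same Beta result applies after transforming through the parent CDF, which is where the binomial/beta connection comes from.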

-- Bert
On Tue, Oct 30, 2012 at 6:46 AM, PIKAL Petr <petr.pikal at precheza.cz> wrote:

#
On 30-Oct-2012 13:46:17 PIKAL Petr wrote:
The general asymptotic result for the pth sample quantile X.p (0 < p < 1)
from a sample of size n is that it is asymptotically Normally distributed
with mean equal to the pth quantile Q.p of the parent distribution and
variance

  var(X.p) = p*(1-p)/(n*f(Q.p)^2)

where f(x) is the probability density function of the parent distribution.

This is not necessarily very helpful for small sample sizes (depending
on the parent distribution).
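As an illustrative sketch (mine, using the lognormal parameters from the original post), the formula can be evaluated directly in R:

```r
# Asymptotic SE of the sample p-th quantile:
#   se(X.p) = sqrt( p*(1-p)/n ) / f(Q.p)
# applied to the lognormal example from the original post.
se.q <- function(p, n, meanlog, sdlog) {
  Q.p <- qlnorm(p, meanlog, sdlog)    # true p-th quantile of the parent
  sqrt(p * (1 - p) / n) / dlnorm(Q.p, meanlog, sdlog)
}
n <- 100000
sapply(c(0.10, 0.50, 0.99), se.q, n = n,
       meanlog = log(200), sdlog = log(2))
```

Note how the 0.99 quantile comes out with a much larger SE than the median here: the density f(Q.p) in the denominator is tiny in the upper tail.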

However, it is possible to obtain a general result giving an exact
confidence interval for Q.p given the entire ordered sample, though
there is only a restricted set of confidence levels to which it applies.
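The idea behind such an exact interval can be sketched as follows (my illustration, not the promised write-up): the number of observations falling below Q.p is Binomial(n, p), so a pair of order statistics brackets Q.p with a coverage probability given by a binomial difference, which explains why only certain confidence levels are attainable:

```r
# P( X_(j) <= Q.p <= X_(k) ) = pbinom(k - 1, n, p) - pbinom(j - 1, n, p)
set.seed(1)
n <- 200; p <- 0.5
x <- sort(rlnorm(n, log(200), log(2)))
j <- qbinom(0.025, n, p)        # index of the lower order statistic
k <- qbinom(0.975, n, p) + 1    # index of the upper order statistic
c(lower = x[j], upper = x[k])   # exact interval for the parent median
pbinom(k - 1, n, p) - pbinom(j - 1, n, p)  # achieved confidence level
```

The achieved level is the smallest attainable level at or above 95%; it jumps discretely as j and k change, which is the "restricted set of confidence levels" mentioned above.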

If you'd like more detail about the above, I could write up derivations
and make the write-up available.

Hoping this helps,
Ted.

-------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
Date: 30-Oct-2012  Time: 17:40:55
This message was sent by XFMail
#
On 10/31/2012 12:46 AM, PIKAL Petr wrote:
Hi Petr,
Using a resampling method, it depends upon the distribution of the 
values. If you have a "love-hate" distribution (bimodal and heavily 
weighted toward extreme values), the median standard error can be 
larger. Try this:

library(mcmcse)  # mcse.q() comes from the mcmcse package
x<-sample(-5:5,1000,TRUE,
  prob=c(0.2,0.1,0.05,0.04,0.03,0.02,0.03,0.04,0.05,0.1,0.2))
x<-ifelse(x<0,x+runif(1000),x-runif(1000))
hist(x)
mcse.q(x, 0.1)
$est
[1] -3.481419

$se
[1] 0.06887319

mcse.q(x, 0.5)
$est
[1] 1.088475

$se
[1] 0.3440115

Jim
#
Thanks Jim.

After reinstalling a new R version, all the mentioned packages work. I tested various functions, which revealed that on my lognormal data there is no big difference between the errors of the median and the 10% quantile. I also found a function for computing quantile SEs in the Hmisc package.
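The Hmisc function in question is presumably hdquantile(), which returns Harrell-Davis quantile estimates and, with se = TRUE, their standard errors (a sketch, assuming Hmisc is installed):

```r
library(Hmisc)   # assumes the Hmisc package is installed
set.seed(1)
x <- rlnorm(1000, log(200), log(2))
q <- hdquantile(x, probs = c(0.10, 0.50, 0.99), se = TRUE)
q                # Harrell-Davis quantile estimates
attr(q, "se")    # their standard errors
```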

Petr
#
Hi Bert
Partly. I also asked to be pointed to some R functions which can compute such a standard error, which is, in my humble opinion, a valid R question.
Hm. I looked at some web info, which is quite good for a trained statistician but at the edge of my understanding as a chemist (and sometimes beyond :-).

Thanks anyway.
Regards
Petr
#
Hi Ted
<snip>
So if I understand correctly, p*(1-p) is biggest when p=0.5 and decreases for smaller or bigger p. The var(X.p) then also depends on the density of the parent distribution at this probability p. For a lognormal distribution and 200 values the resulting var is
[1] 3.125e-08
[1] 6.648497e-08

so the 0.1 var is slightly bigger than the 0.5 var. For different distributions this can be reversed, as Jim pointed out.

Did I manage to understand it correctly?

Thank you very much.
Regards
Petr
#
[see in-line below]
On 31-Oct-2012 10:26:14 PIKAL Petr wrote:
Yes, it looks as though you understand! As a further illustration,
here is some R code applied to examples where the parent distribution
is uniform or Normal. For each case, the results are stated as
first: simulated; second: by the formula. It can be seen that for
n=200 the formula and the simulations are close.
Ted.

###################################################################
## Test of formula for var(quantile)
varQ <- function(p,n,f.p) {
  p*(1-p)/(n*(f.p^2))
}

## Test 1: Uniform (0,1), n = 200
n <- 200
## Pick one of (a), (b), (c):
## (a)# p <- 0.50 ; q <- 100 ; f.p <- 1
## (b)# p <- 0.25 ; q <-  50 ; f.p <- 1
## (c)# p <- 0.10 ; q <-  20 ; f.p <- 1  # q = n*p
Nsim <- 1000
Qs   <- numeric(Nsim)
for( i in (1:Nsim) ){
  Qs[i] <- sort(runif(n))[q]
}
var(Qs)
varQ(p,n,f.p)
## (a) 0.001239982
##     0.00125
## (b) 0.0008877879
##     0.0009375
## (c) 0.0005619348
##     0.00045

## Test 2: N(0,1), n = 200
n <- 200
## Pick one of (a), (b), (c):
## (a)# p <- 0.50 ; q <- 100 ; f.p <- dnorm(qnorm(0.50))
## (b)# p <- 0.25 ; q <-  50 ; f.p <- dnorm(qnorm(0.25))
## (c)# p <- 0.10 ; q <-  20 ; f.p <- dnorm(qnorm(0.10))
Nsim <- 1000
Qs   <- numeric(Nsim)
for( i in (1:Nsim) ){
  Qs[i] <- sort(rnorm(n))[q]
}
var(Qs)
varQ(p,n,f.p)
## (a) 0.007633568
##     0.007853982
## (b) 0.009370099
##     0.009283837
## (c) 0.01420517
##     0.01461055
###################################################################
-------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
Date: 31-Oct-2012  Time: 18:10:16