Percentiles/Quantiles with Weighting
Here is one kind of weighted quantile function.
The basic idea is very simple:
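wquantile <- function( v, w, p )
{
  ord <- order(v)   # compute the ordering once, before v is overwritten
  v <- v[ord]
  w <- w[ord]       # weights must follow the same permutation as the values
  v [ which.max( cumsum(w) / sum(w) >= p ) ]
}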
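To make the cumulative-weight step concrete, here is a small worked example. The data are made up for illustration. It computes the ordering once, applies it to both vectors, and shows the cumulative weight shares that get compared against p.

```r
# Made-up data: four observations with unequal weights
v <- c(30, 10, 40, 20)
w <- c(3, 1, 4, 2)

ord <- order(v)                  # one ordering, used for both vectors
vs <- v[ord]                     # values sorted ascending: 10 20 30 40
ws <- w[ord]                     # weights carried along:    1  2  3  4
share <- cumsum(ws) / sum(ws)    # cumulative weight share: 0.1 0.3 0.6 1.0

# Weighted median: the first value whose cumulative share reaches 0.5
wmedian <- vs[which.max(share >= 0.5)]
print(wmedian)                   # 30
```

Note that which.max() applied to a logical vector returns the index of the first TRUE, which is what picks out the smallest qualifying value.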
With some more error-checking and general clean-up, it looks like this:
# Simple weighted quantile
#
# v A numeric vector of observations
# w A numeric vector of non-negative weights
# p The probability 0<=p<=1
#
# Nothing fancy: no interpolation etc.
wquantile <- function(v, w = rep(1, length(v)), p = .5)
{
  if (!is.numeric(v) || !is.numeric(w) || length(v) != length(w))
    stop("Values and weights must be equal-length numeric vectors")
  if (!is.numeric(p) || any(p < 0 | p > 1))
    stop("Quantiles must be 0<=p<=1")
  # Check every weight (not just the first) before accumulating
  if (anyNA(w) || any(w < 0) || sum(w) <= 0)
    stop("Weights must be non-negative numbers with a positive sum")
  ranking <- order(v)
  sumw <- cumsum(w[ranking])
  plist <- sumw / sumw[length(sumw)]
  # For each p, take the smallest value whose cumulative weight share reaches p
  sapply(p, function(p) v[ranking[which.max(plist >= p)]])
}
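One way to sanity-check a function like this, sketched here under the assumption of whole-number weights: giving an observation weight k should produce the same result as an unweighted quantile of data in which that observation is repeated k times, provided quantile() is called with type = 1 (inverse empirical CDF, no interpolation). The helper wq below is a compact stand-in written only for this example so that the block runs on its own; it has no input checks. The weights seq_along(x) are an assumption chosen to mimic the "recent data counts more" scheme in the quoted question.

```r
# Compact stand-in for a weighted quantile (illustrative helper, no input checks)
wq <- function(v, w, p) {
  ord <- order(v)
  share <- cumsum(w[ord]) / sum(w)
  sapply(p, function(pp) v[ord[which.max(share >= pp)]])
}

x <- c(30, 10, 40, 20)   # made-up, time-ordered observations
w <- seq_along(x)        # weight i for the i-th observation: 1 2 3 4
p <- c(0.25, 0.5, 0.95)

# Weighted quantiles computed directly from the weights
weighted <- wq(x, w, p)

# Same quantiles computed by physically replicating each observation
replicated <- unname(quantile(rep(x, times = w), p, type = 1))

print(weighted)
print(replicated)
```

The two results should agree for these inputs. The weighted form avoids building the replicated vector, and it also accepts fractional weights, for which no replication exists.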
I would appreciate any comments people have on this -- whether
correctness, efficiency, style, ....
-s
On Tue, Feb 17, 2009 at 11:57 AM, Brigid Mooney <bkmooney at gmail.com> wrote:
Hi All,
I am looking at applications of percentiles to time-sequenced data. I had
just been using the quantile function to get percentiles over various
periods, but I would like to know whether there is an accepted (and/or
R-implemented) method to apply weighting to the data so as to weight recent
data more heavily.
I wrote the following function, but it seems quite inefficient, and not
really very flexible in its applications - so if anyone has any suggestions
on how to look at quantiles/percentiles within R while also using a
weighting schema, I would be very interested.
Note - this function supposes the data in X is time-sequenced, with the most
recent (and thus most heavily weighted) data at the end of the vector:
WtPercentile <- function(X=rnorm(100), pctile=seq(.1,1,.1))
{
  Xprime <- NA
  for(i in 1:length(X))
  {
    Xprime <- c(Xprime, rep(X[i], times=i))
  }
  print("Percentiles:")
  print(quantile(X, pctile))
  print("Weighted:")
  print(Xprime)
  print("Weighted Percentiles:")
  print(quantile(Xprime, pctile, na.rm=TRUE))
}
WtPercentile(1:10)
WtPercentile(rnorm(10))