
create new variable: percentile value of variable in data frame

4 messages · Jonathan Beard, Stephan Kolassa, David Winsemius

#
Hello all,

Thanks in advance for your attention.
I would like to generate a third variable that represents, for each
observation, the quantile value of a variable in a data frame.


# generating data

dat <- data.frame(id = 1:30,                  # row identifier
                  score = rnorm(30, 20, 7))   # 30 draws, mean 20, sd 7
dat

#  finding deciles of "score" (type 3 quantiles at 0%, 10%, ..., 100%)

qs <- as.matrix(quantile(dat$score, type=3, probs = seq(0,1,.1)))
colnames(qs) <- c( "score")
qs

#  is there a way to put the quantile value for a value of 'score' into a new variable,
#  such that the new data frame would have three variables: id, score and q.score?

##  running R version 2.8.1 (2008-12-22) on Vista


Thanks so much!

-Jon
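A minimal sketch of what `quantile(type = 3)` returns in Jon's setup, assuming simulated data like his with an arbitrary seed (`set.seed(1)`) and a hypothetical name `deciles`. Per `?quantile`, types 1 to 3 are the discontinuous definitions and type 3 is the SAS one (nearest even order statistic), so every cut point is one of the observed scores, and the 0% and 100% points are the sample minimum and maximum. Note that `probs = seq(0, 1, .1)` yields deciles (11 cut points), not percentiles.

```r
set.seed(1)  # assumption: arbitrary seed, only for reproducibility
dat <- data.frame(id = 1:30, score = rnorm(30, 20, 7))

# Type 3 deciles: 11 cut points at probabilities 0, 0.1, ..., 1
deciles <- quantile(dat$score, probs = seq(0, 1, 0.1), type = 3)
deciles

# Discontinuous types return observed values, never interpolations
all(deciles %in% dat$score)

# The 0% and 100% cut points are the sample minimum and maximum
c(deciles[[1]] == min(dat$score), deciles[[11]] == max(dat$score))
```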
#
Hi Jon,

does the empirical cumulative distribution function do what you want?

dat$q.score <- ecdf(dat$score)(dat$score)
?ecdf
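A sketch of what this one-liner computes, assuming the same kind of simulated data (seed chosen arbitrarily; `Fn` and `by_rank` are hypothetical names). `ecdf(dat$score)` returns a step function F, and F(x) is the proportion of scores less than or equal to x. Evaluated at the data themselves, that is each score's rank (ties counted at their highest rank) divided by the number of scores.

```r
set.seed(1)  # assumption: arbitrary seed
dat <- data.frame(id = 1:30, score = rnorm(30, 20, 7))

Fn <- ecdf(dat$score)          # empirical CDF, returned as an R function
dat$q.score <- Fn(dat$score)   # proportion of scores <= each score

# Same numbers from ranks: count of scores <= x, divided by n
by_rank <- rank(dat$score, ties.method = "max") / nrow(dat)
all.equal(dat$q.score, by_rank)

# Ties share the higher value because F counts "less than or equal"
ecdf(c(1, 2, 2, 3))(c(1, 2, 2, 3))   # 0.25 0.75 0.75 1.00
```

The smallest score therefore gets 1/30 and the largest gets exactly 1, so `q.score` never equals 0.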

HTH
Stephan


Jonathan Beard wrote:
1 day later
#
Hi Stephan, thanks for your response.

It looks like ecdf() works as it should.

I have a quick follow-up:

I didn't see anything in the help pages saying that the methods behind
ecdf() and quantile(type=3) are equivalent.

Still, the results produced by the two methods look consistent.

Any thoughts?
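One way to see why the results look consistent, sketched under the same simulated-data assumption (`bin` and `ord` are hypothetical names). The two functions answer inverse questions: `ecdf()` maps a score to the proportion of scores at or below it, while `quantile()` maps a proportion to a score. According to `?quantile`, type 1 is defined as the inverse of the empirical distribution function, whereas type 3 picks the nearest even order statistic, so the two are related but not identical. Both are non-decreasing in the score, so ordering rows by their ecdf value never reverses their decile bins.

```r
set.seed(1)  # assumption: arbitrary seed
dat <- data.frame(id = 1:30, score = rnorm(30, 20, 7))

q.score <- ecdf(dat$score)(dat$score)       # proportion in (0, 1]
qs <- quantile(dat$score, probs = seq(0, 1, 0.1), type = 3)
bin <- findInterval(dat$score, qs)          # decile bin index

# Sorting by the ecdf value gives non-decreasing bins: the two
# scales agree on order but report different quantities
ord <- order(q.score)
!is.unsorted(bin[ord])
head(data.frame(score = dat$score, q.score, bin)[ord, ])
```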

Again, thanks so much,

-Jon
On Fri, May 28, 2010 at 4:06 PM, Stephan Kolassa <Stephan.Kolassa at gmx.de> wrote:
#
On May 30, 2010, at 9:03 AM, Jonathan Beard wrote:

If you want a method that is based on what you know to be type 3
quantiles, consider:

 > dat$q.score <- findInterval(dat$score, qs)
 > dat

You can adjust the arguments of findInterval (e.g., rightmost.closed
and all.inside) to match your requirements about which end of each
interval is open.
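A sketch of the endpoint issue, assuming simulated data as in Jon's post (`decile` and `count_le` are hypothetical names). By default `findInterval()` treats each interval as closed on the left and open on the right, so only the largest score, which equals the 100% type 3 quantile, lands in an eleventh bin; `rightmost.closed = TRUE` folds it into bin 10. Using every sorted score as breakpoints instead of the deciles reproduces the `ecdf()` answer.

```r
set.seed(1)  # assumption: arbitrary seed
dat <- data.frame(id = 1:30, score = rnorm(30, 20, 7))
qs <- quantile(dat$score, probs = seq(0, 1, 0.1), type = 3)

# Default: intervals [qs[i], qs[i+1]); the maximum gets index 11
dat$q.score <- findInterval(dat$score, qs)
table(dat$q.score)

# Close the last interval so the bins run from 1 to 10
dat$decile <- findInterval(dat$score, qs, rightmost.closed = TRUE)
range(dat$decile)

# With all sorted scores as breakpoints, the index is the count of
# scores <= x, i.e. n times the ecdf value
count_le <- findInterval(dat$score, sort(dat$score))
all.equal(count_le / nrow(dat), ecdf(dat$score)(dat$score))
```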