
median of binned values

5 messages · Martin Tomko, Chuck Cleland, Moshe Olshansky

#
Dear list,
I have a vector (array, table row, whatever is best) of frequency values 
for categories (or bins), and I need to find the median category. 
Trivial to do by hand, but I was wondering if there is an elegant way
to do it in R.

The obvious median(vector) returns the median frequency of the bins,
and that is not what I want, e.g.:
        freq
cat1       1
cat2      10
cat3     100
cat4    1000
cat5   10000

I want it to return cat5, instead of cat3.

Thanks a lot
Martin
#
Martin Tomko wrote:
df <- data.frame(binname = as.factor(paste("cat", 1:5, sep="")),
                 freq = c(1,10,100,1000,10000))

df
  binname  freq
1    cat1     1
2    cat2    10
3    cat3   100
4    cat4  1000
5    cat5 10000

with(df, levels(binname)[median(rep(as.numeric(binname), freq))])
[1] "cat5"
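
For readers following along, the one-liner above can be unpacked step by step. This is only a sketch of what each call contributes; the intermediate names (codes, expanded, mid) are illustrative and do not appear in the original post.

```r
# Rebuild the example data from the post.
df <- data.frame(binname = factor(paste("cat", 1:5, sep = "")),
                 freq = c(1, 10, 100, 1000, 10000))

# as.numeric() on a factor gives the integer code of each level: 1, 2, 3, 4, 5.
codes <- as.numeric(df$binname)

# rep(x, times) repeats each element of x as often as the matching count,
# so code 1 appears once, code 2 ten times, ..., code 5 ten thousand times.
expanded <- rep(codes, df$freq)
length(expanded)   # total number of observations, i.e. sum(df$freq)

# The median of the expanded codes is the code of the median bin.
mid <- median(expanded)

# levels() returns the category labels in code order, so indexing by the
# median code recovers the label of the median bin.
levels(df$binname)[mid]
```

Note that factor levels are sorted alphabetically by default, so this relies on the labels sorting in bin order (a label such as "cat10" would sort before "cat2").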

  
    
#
Thank you, Chuck,
would you mind commenting a bit on the code? It is not all clear to me. How
would I retrieve only the numeric value (not the category name)?
I am just starting with R, and the functionality of rep and levels
is not quite clear. I tried the documentation, but am none the wiser.
What if I had a vector v <- c(1,10,100,1000,10000) and wanted to
perform it on that?

Thanks a lot
Martin

  
    
#
Martin Tomko wrote:
Retrieve the numeric value rather than the category name as follows:

with(df, freq[median(rep(as.numeric(binname), freq))])
[1] 10000

  To do essentially the same thing with a vector:

myvec <- c(1,10,100,1000,10000)

myvec[median(rep(seq_along(myvec), myvec))]
[1] 10000

  I'm sure I cannot explain levels() and rep() any better than the help
pages for those functions.
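
One caveat with the median(rep(...)) approach: when the total count is even and the two middle observations fall in different bins, median() returns a value halfway between two bin indices, and R silently truncates it when it is used as an index. A minimal sketch of a variant that always returns a whole bin index, assuming the lower of the two middle observations is the one wanted; median_bin is a hypothetical helper name, not something from this thread:

```r
# Hypothetical helper (not from the thread): index of the median bin,
# taking the lower of the two middle observations when the total is even.
median_bin <- function(counts) {
  expanded <- rep(seq_along(counts), counts)  # one bin index per observation
  expanded[ceiling(length(expanded) / 2)]     # already sorted by construction
}

myvec <- c(1, 10, 100, 1000, 10000)
median_bin(myvec)          # index of the median bin
myvec[median_bin(myvec)]   # count stored in that bin

# Even total split across two bins: median() lands between the indices.
median(rep(1:2, c(1, 1)))  # 1.5, not a valid bin index
median_bin(c(1, 1))        # 1
```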

  
    
#
Alternatively, take the first bin whose cumulative count reaches half the total:
levels(df$binname)[which(cumsum(df$freq) >= 0.5*sum(df$freq))[1]]
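
A sketch of the cumulative-count idea, spelled out: the median bin is the first bin whose running total (rather than its individual count) reaches half of all observations. This avoids expanding the data with rep(). The names running, half, first_bin and flat are illustrative only.

```r
df <- data.frame(binname = factor(paste("cat", 1:5, sep = "")),
                 freq = c(1, 10, 100, 1000, 10000))

running <- cumsum(df$freq)          # running totals: 1, 11, 111, 1111, 11111
half <- 0.5 * sum(df$freq)          # half of all observations: 5555.5
first_bin <- which(running >= half)[1]
levels(df$binname)[first_bin]

# With equal counts no single bin holds half the data, so the comparison
# has to be made against the running total to find the middle bin.
flat <- c(10, 10, 10, 10, 10)
which(cumsum(flat) >= 0.5 * sum(flat))[1]
```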