Dear list,
I have a vector (array, table row, whatever is best) of frequency values
for categories (or bins), and I need to find the median category.
Trivial to do by hand, but I was wondering if there is a means to do it
in R in an elegant way.
The obvious median(vector) returns the median frequency for the bins,
and that is not what I want. For example:
      freq
cat1     1
cat2    10
cat3   100
cat4  1000
cat5 10000
I want it to return cat5, instead of cat3.
Thanks a lot
Martin
median of binned values
5 messages · Martin Tomko, Chuck Cleland, Moshe Olshansky
df <- data.frame(binname = as.factor(paste("cat", 1:5, sep="")),
                 freq = c(1,10,100,1000,10000))
df
  binname  freq
1    cat1     1
2    cat2    10
3    cat3   100
4    cat4  1000
5    cat5 10000
with(df, levels(binname)[median(rep(as.numeric(binname), freq))])
[1] "cat5"
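For readers who want the one-liner unpacked, the same steps can be written out one at a time. This is only a commented sketch of the expression above, using the same df; the intermediate names (codes, expanded, mid) are made up here for illustration.

```r
df <- data.frame(binname = as.factor(paste("cat", 1:5, sep = "")),
                 freq = c(1, 10, 100, 1000, 10000))

# Integer codes of the factor levels: cat1 -> 1, ..., cat5 -> 5
codes <- as.numeric(df$binname)

# Repeat each code as many times as its frequency (11111 values in total)
expanded <- rep(codes, df$freq)

# Median of the expanded codes; 5 for this data
mid <- median(expanded)

# Map the code back to the category name
levels(df$binname)[mid]
```

Note that the factor levels are sorted alphabetically, so with ten or more bins named this way "cat10" would sort before "cat2"; the sketch assumes the level order matches the bin order, as it does here.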
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Chuck Cleland, Ph.D. NDRI, Inc. 71 West 23rd Street, 8th floor New York, NY 10010 tel: (212) 845-4495 (Tu, Th) tel: (732) 512-0171 (M, W, F) fax: (917) 438-0894
Thank you, Chuck. Would you mind commenting a bit on the code? It is not all clear to me. How would you go about retrieving only the numeric value (not the category name)? I am just starting with R, and the functionality of rep and levels is not quite clear. I tried the documentation, but am none the wiser. What if I had a vector v <- c(1,10,100,1000,10000) and wanted to perform it on that?
Thanks a lot
Martin
Martin Tomko Postdoctoral Research Assistant Geographic Information Systems Division Department of Geography University of Zurich - Irchel Winterthurerstr. 190 CH-8057 Zurich, Switzerland email: martin.tomko at geo.uzh.ch site: http://www.geo.uzh.ch/~mtomko mob: +41-788 629 558 tel: +41-44-6355256 fax: +41-44-6356848
Retrieve the numeric value rather than the category name as follows:

with(df, freq[median(rep(as.numeric(binname), freq))])
[1] 10000

To do essentially the same thing with a vector:

myvec <- c(1,10,100,1000,10000)
myvec[median(rep(1:length(myvec), myvec))]
[1] 10000

I'm sure I cannot explain levels() and rep() any better than the help pages for those functions.
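One caveat, offered as a hedged aside: when the total count is even, median() can return a value halfway between two bins (1.5, say), and R silently truncates a fractional index. One possible way around this, assuming the lower of the two middle bins is the one wanted, is quantile() with type = 1, which always returns one of the observed values. The counts below are invented for illustration.

```r
# Hypothetical counts with an even total (4 observations in 2 bins)
counts <- c(2, 2)
expanded <- rep(seq_along(counts), counts)   # 1 1 2 2

median(expanded)                             # 1.5, not a valid bin index

# type = 1 is the inverse of the empirical CDF, so the result is an observed value
idx <- unname(quantile(expanded, probs = 0.5, type = 1))
idx                                          # 1
counts[idx]
```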
Alternatively:

levels(df$binname)[which(cumsum(df$freq) >= 0.5 * sum(df$freq))[1]]
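The cumulative-count idea can be wrapped in a small helper that avoids building the expanded vector, which may matter for very large counts. This is a sketch only: median_bin is a hypothetical name, and it assumes the bins are already in their natural order and that the first bin whose cumulative count reaches half the total is the one wanted.

```r
# Hypothetical helper: index of the first bin whose cumulative count
# reaches half of the total count
median_bin <- function(freq) {
  which(cumsum(freq) >= sum(freq) / 2)[1]
}

freq <- c(cat1 = 1, cat2 = 10, cat3 = 100, cat4 = 1000, cat5 = 10000)
idx <- median_bin(freq)
names(freq)[idx]   # "cat5"
```

Because it works on the counts directly, the same function applies to a plain numeric vector or to a data frame column such as df$freq.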