Calculated mean value based on another column bin from dataframe.
Thanks. Before I never used findInterval function. It seems very nice.
On Wed, Apr 6, 2011 at 11:20 PM, David Winsemius <dwinsemius at comcast.net> wrote:
On Apr 6, 2011, at 9:46 AM, Fabrice Tourre wrote:
Dear Henrique Dallazuanna, Thank you very much for your suggestion. It is obvious that your method is better than me. Is it possible to use cut, table,by etc? Whether there is some aggregate function in R can do this? Thanks. On Wed, Apr 6, 2011 at 2:16 PM, Henrique Dallazuanna <wwwhsd at gmail.com> wrote:
Try this: fil <- sapply(ran, '<', e1 = dat[,1]) & sapply(ran[2:(length(ran) + 1)], '>=', e1 = dat[,1]) mm <- apply(fil, 2, function(idx)mean(dat[idx, 2])) On Wed, Apr 6, 2011 at 5:48 AM, Fabrice Tourre <fabrice.ciup at gmail.com> wrote:
Dear list, I have a dataframe with two column as fellow.
head(dat)
? ? ?V1 ? ? ?V2
?0.15624 0.94567
?0.26039 0.66442
?0.16629 0.97822
?0.23474 0.72079
?0.11037 0.83760
?0.14969 0.91312
I want to get the column V2 mean value based on the bin of column of
V1. I write the code as fellow. It works, but I think this is not the
elegant way. Any suggestions?
dat<-read.table("dat.txt",head=F)
ran<-seq(0,0.5,0.05)
mm<-NULL
for (i in c(1:(length(ran)-1)))
{
? fil<- dat[,1] > ran[i] & dat[,1]<=ran[i+1]
? m<-mean(dat[fil,2])
? mm<-c(mm,m)
}
mm
Here is the first 20 lines of my data.
dput(head(dat,20))
structure(list(V1 = c(0.15624, 0.26039, 0.16629, 0.23474, 0.11037,
0.14969, 0.16166, 0.09785, 0.36417, 0.08005, 0.29597, 0.14856,
0.17307, 0.36718, 0.11621, 0.23281, 0.10415, 0.1025, 0.04238,
0.13525), V2 = c(0.94567, 0.66442, 0.97822, 0.72079, 0.8376,
0.91312, 0.88463, 0.82432, 0.55582, 0.9429, 0.78956, 0.93424,
0.87692, 0.83996, 0.74552, 0.9779, 0.9958, 0.9783, 0.92523, 0.99022
)), .Names = c("V1", "V2"), row.names = c(NA, 20L), class =
"data.frame")
______________________________________________
Here is how I would have done it with findInterval and tapply which is very similar to using a `cut` and `table` approach:
dat$grp <- findInterval(dat$V1, seq(0,0.5,0.05) ) tapply(dat$V2, dat$grp, mean)
? ? ? ?1 ? ? ? ? 2 ? ? ? ? 3 ? ? ? ? 4 ? ? ? ? 5 ? ? ? ? 6 ? ? ? ? 8 0.9252300 0.8836100 0.9135429 0.9213600 0.8493450 0.7269900 0.6978900 #####--------------- You do not get exactly the same form of the result as with Henrique's method. His yields:
mm
?[1] 0.9252300 0.8836100 0.9135429 0.9213600 0.8493450 0.7269900 ? ? ? NaN ?[8] 0.6978900 ? ? ? NaN ? ? ? NaN ? ? ? NaN ####---------------- The cut approach would yield this, which is more informatively labeled. (I'm wasn't completely sure the second to last word in the prior sentence was a real word, but several dictionaries seem to think so.):
dat$grp2 <- cut(dat$V1 , breaks=ran) tapply(dat$V2, dat$grp2, mean)
?(0,0.05] (0.05,0.1] (0.1,0.15] (0.15,0.2] (0.2,0.25] (0.25,0.3] ?0.9252300 ?0.8836100 ?0.9135429 ?0.9213600 ?0.8493450 ?0.7269900 (0.3,0.35] (0.35,0.4] (0.4,0.45] (0.45,0.5] ? ? ? ?NA ?0.6978900 ? ? ? ? NA ? ? ? ? NA
David Winsemius, MD West Hartford, CT
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.