
Calculated mean value based on another column bin from dataframe.

7 messages · Fabrice Tourre, Henrique Dallazuanna, PIKAL Petr +1 more

#
Dear list,

I have a data frame with two columns, as follows.
V1      V2
 0.15624 0.94567
 0.26039 0.66442
 0.16629 0.97822
 0.23474 0.72079
 0.11037 0.83760
 0.14969 0.91312

I want to get the mean of column V2 within each bin of column V1.
I wrote the code as follows. It works, but I don't think it is an
elegant way to do it. Any suggestions?

dat<-read.table("dat.txt",head=F)
ran<-seq(0,0.5,0.05)
mm<-NULL
for (i in c(1:(length(ran)-1)))
{
    fil<- dat[,1] > ran[i] & dat[,1]<=ran[i+1]
    m<-mean(dat[fil,2])
    mm<-c(mm,m)
}
mm

Here are the first 20 lines of my data.
structure(list(V1 = c(0.15624, 0.26039, 0.16629, 0.23474, 0.11037,
0.14969, 0.16166, 0.09785, 0.36417, 0.08005, 0.29597, 0.14856,
0.17307, 0.36718, 0.11621, 0.23281, 0.10415, 0.1025, 0.04238,
0.13525), V2 = c(0.94567, 0.66442, 0.97822, 0.72079, 0.8376,
0.91312, 0.88463, 0.82432, 0.55582, 0.9429, 0.78956, 0.93424,
0.87692, 0.83996, 0.74552, 0.9779, 0.9958, 0.9783, 0.92523, 0.99022
)), .Names = c("V1", "V2"), row.names = c(NA, 20L), class = "data.frame")
#
Try this:

fil <- sapply(ran, '<', e1 = dat[,1]) & sapply(ran[2:(length(ran) +
1)], '>=', e1 = dat[,1])
mm <- apply(fil, 2, function(idx)mean(dat[idx, 2]))
On Wed, Apr 6, 2011 at 5:48 AM, Fabrice Tourre <fabrice.ciup at gmail.com> wrote:

  
    
#
Dear Henrique Dallazuanna,

Thank you very much for your suggestion.

It is obvious that your method is better than mine.

Is it possible to use cut, table, by, etc.? Is there some
aggregate function in R that can do this?

Thanks.
On Wed, Apr 6, 2011 at 2:16 PM, Henrique Dallazuanna <wwwhsd at gmail.com> wrote:
#
Hi


r-help-bounces at r-project.org wrote on 06.04.2011 10:48:04:
Do you want something like this?

#make data
x<-runif(100) 
y<-runif(100)

#cut first column to bins (in your case dat[,1] and ran)
x.c<-cut(x, seq(0,1,.1))

#aggregate column 2 according to bins (in your case dat[,2])
aggregate(y,list(x.c), mean)
     Group.1         x
1    (0,0.1] 0.5868734
2  (0.1,0.2] 0.5436263
3  (0.2,0.3] 0.5099366
4  (0.3,0.4] 0.4815855
5  (0.4,0.5] 0.4137687
6  (0.5,0.6] 0.4698156
7  (0.6,0.7] 0.4687639
8  (0.7,0.8] 0.5661048
9  (0.8,0.9] 0.5489297
10   (0.9,1] 0.4812521
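
Applied to the 20 rows posted in the original question, the same cut/aggregate idea might look like the sketch below. The column name `bin` and the object name `res` are chosen here for illustration only; the breaks reuse the `ran` vector from the question. Note that the formula interface of aggregate leaves out bins that contain no rows.

```r
# First 20 rows of the data from the original post
dat <- data.frame(
  V1 = c(0.15624, 0.26039, 0.16629, 0.23474, 0.11037, 0.14969, 0.16166,
         0.09785, 0.36417, 0.08005, 0.29597, 0.14856, 0.17307, 0.36718,
         0.11621, 0.23281, 0.10415, 0.1025, 0.04238, 0.13525),
  V2 = c(0.94567, 0.66442, 0.97822, 0.72079, 0.8376, 0.91312, 0.88463,
         0.82432, 0.55582, 0.9429, 0.78956, 0.93424, 0.87692, 0.83996,
         0.74552, 0.9779, 0.9958, 0.9783, 0.92523, 0.99022)
)
ran <- seq(0, 0.5, 0.05)

# Bin V1 into (0,0.05], (0.05,0.1], ...; cut() closes intervals on the right by default
dat$bin <- cut(dat$V1, breaks = ran)

# Mean of V2 within each bin; empty bins do not appear in the result
res <- aggregate(V2 ~ bin, data = dat, FUN = mean)
res
```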

Regards
Petr
#
On Apr 6, 2011, at 9:46 AM, Fabrice Tourre wrote:

            
Here is how I would have done it with findInterval and tapply which is  
very similar to using a `cut` and `table` approach:

 > dat$grp <- findInterval(dat$V1, seq(0,0.5,0.05) )
 > tapply(dat$V2, dat$grp, mean)
         1         2         3         4         5         6         8
0.9252300 0.8836100 0.9135429 0.9213600 0.8493450 0.7269900 0.6978900
#####---------------

You do not get exactly the same form of the result as with Henrique's  
method. His yields:
 > mm
  [1] 0.9252300 0.8836100 0.9135429 0.9213600 0.8493450 0.7269900       NaN
  [8] 0.6978900       NaN       NaN       NaN

####----------------

The cut approach would yield this, which is more informatively
labeled. (I wasn't completely sure the second-to-last word in the
prior sentence was a real word, but several dictionaries seem to think
so.):

 > dat$grp2 <- cut(dat$V1 , breaks=ran)
 > tapply(dat$V2, dat$grp2, mean)
   (0,0.05] (0.05,0.1] (0.1,0.15] (0.15,0.2] (0.2,0.25] (0.25,0.3]
  0.9252300  0.8836100  0.9135429  0.9213600  0.8493450  0.7269900
(0.3,0.35] (0.35,0.4] (0.4,0.45] (0.45,0.5]
         NA  0.6978900         NA         NA
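
One detail that may matter when comparing the two approaches: by default findInterval treats bins as closed on the left, [a, b), while cut (like the loop in the original question) treats them as closed on the right, (a, b]. The results agree for the posted data because no V1 value falls exactly on a break. The sketch below uses made-up values and breaks (not taken from the thread) to show the difference; the `left.open` argument is assumed to be available, which requires R 3.3.0 or later.

```r
# Hypothetical breaks and values; 0.25 lies exactly on a break
breaks <- c(0, 0.25, 0.5)
x <- c(0.2, 0.25, 0.3)

# Default findInterval: bins are [0, 0.25) and [0.25, 0.5), so 0.25 goes to bin 2
fi_default <- findInterval(x, breaks)

# cut: bins are (0, 0.25] and (0.25, 0.5], so 0.25 goes to bin 1
cut_codes <- as.integer(cut(x, breaks = breaks))

# findInterval with left.open = TRUE uses (a, b] bins, matching cut
fi_left_open <- findInterval(x, breaks, left.open = TRUE)

fi_default
cut_codes
fi_left_open
```

If values can fall exactly on a break, choosing one convention and using it consistently keeps the group means comparable across methods.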
David Winsemius, MD
West Hartford, CT
#
This is exactly what I want. Thank you very much.
On Wed, Apr 6, 2011 at 5:22 PM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
#
Thanks. I had never used the findInterval function before. It seems very nice.
On Wed, Apr 6, 2011 at 11:20 PM, David Winsemius <dwinsemius at comcast.net> wrote: