
Calculated mean value based on another column bin from dataframe.

On Apr 6, 2011, at 9:46 AM, Fabrice Tourre wrote:

            
Here is how I would have done it with findInterval and tapply, which is
very similar to using a `cut` and `table` approach:

 > dat$grp <- findInterval(dat$V1, seq(0,0.5,0.05) )
 > tapply(dat$V2, dat$grp, mean)
         1         2         3         4         5         6         8
0.9252300 0.8836100 0.9135429 0.9213600 0.8493450 0.7269900 0.6978900
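
For anyone who wants to try this without the original data, here is a minimal sketch of the same findInterval/tapply idea. The data frame below is made up for illustration (it is not the poster's data), and the name `brks` is my own:

```r
# Made-up data for illustration; not the data from the thread.
dat <- data.frame(
  V1 = c(0.01, 0.03, 0.07, 0.12, 0.14, 0.31),
  V2 = c(0.90, 0.94, 0.88, 0.91, 0.93, 0.70)
)
brks <- seq(0, 0.5, 0.05)

# findInterval() returns, for each V1, the index i of the
# left-closed interval [brks[i], brks[i+1]) that contains it.
dat$grp <- findInterval(dat$V1, brks)

# tapply() computes mean(V2) within each group that actually occurs.
res <- tapply(dat$V2, dat$grp, mean)
print(res)
```

With these toy values only groups 1, 2, 3 and 7 occur, so the result has four entries and the empty bins simply do not appear.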
#####---------------

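You do not get exactly the same form of the result as with Henrique's
method: tapply reports only the groups that actually occur (note that
there is no group 7 above), whereas his keeps the empty bins as NaN.
His yields: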
 > mm
  [1] 0.9252300 0.8836100 0.9135429 0.9213600 0.8493450 0.7269900       NaN
  [8] 0.6978900       NaN       NaN       NaN
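
If a full-length result with placeholders for the empty bins is wanted from the findInterval route, one possible way (a sketch, not something from the earlier messages) is to turn the bin index into a factor with an explicit level for every bin. The data and the names `brks`, `grp` and `res_full` are made up for illustration:

```r
# Made-up data for illustration; not the data from the thread.
V1 <- c(0.01, 0.03, 0.07, 0.12, 0.14, 0.31)
V2 <- c(0.90, 0.94, 0.88, 0.91, 0.93, 0.70)
brks <- seq(0, 0.5, 0.05)

# One factor level per possible findInterval() index. The last
# level (11 here) would hold values at or above the largest break.
grp <- factor(findInterval(V1, brks), levels = seq_along(brks))

# tapply() keeps the unused levels and reports them as NA.
res_full <- tapply(V2, grp, mean)
print(res_full)
```

The empty bins come back as NA rather than NaN, but the vector has one slot per bin in the same order.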

####----------------

The cut approach would yield this, which is more informatively
labeled. (I wasn't completely sure the second-to-last word in the
prior sentence was a real word, but several dictionaries seem to think
so.):

 > dat$grp2 <- cut(dat$V1 , breaks=ran)
 > tapply(dat$V2, dat$grp2, mean)
   (0,0.05] (0.05,0.1] (0.1,0.15] (0.15,0.2] (0.2,0.25] (0.25,0.3]
  0.9252300  0.8836100  0.9135429  0.9213600  0.8493450  0.7269900
(0.3,0.35] (0.35,0.4] (0.4,0.45] (0.45,0.5]
         NA  0.6978900         NA         NA
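
A self-contained sketch of the cut/tapply version follows. `ran` is not defined in this message; judging by the labels above it holds the same breaks as seq(0, 0.5, 0.05), and that is what I assume here under the name `brks`. The data are again made up for illustration:

```r
# Made-up data for illustration; not the data from the thread.
V1 <- c(0.01, 0.03, 0.07, 0.12, 0.14, 0.31)
V2 <- c(0.90, 0.94, 0.88, 0.91, 0.93, 0.70)
brks <- seq(0, 0.5, 0.05)   # assumed stand-in for `ran`

# cut() returns a factor with one labeled level per interval.
# Intervals are right-closed, (a, b], by default.
grp2 <- cut(V1, breaks = brks)

# Every level appears in the result; empty intervals are NA.
res_cut <- tapply(V2, grp2, mean)
print(res_cut)
```

One difference to keep in mind: findInterval uses left-closed intervals [a, b) by default, while cut uses right-closed intervals (a, b], so a value sitting exactly on a break lands in different bins under the two approaches. Passing right = FALSE to cut makes its intervals left-closed as well.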
David Winsemius, MD
West Hartford, CT