
group means

6 messages · Jeremy Z Butler, Sundar Dorai-Raj, J.R. Lockwood +3 more

#
Hi,
Any hints on how I would generate the mean of each group of five numbers in a
column of a data frame? That is, get the mean of the first five values in the
column, then the mean of the second five, and so on.

1   3.4
2   6.0
3   2.5
4   7.5
5   1.8
6   4.2
7   6.4
8   5.7
9   17.2
10  13.5

Grateful for any suggestions
Jeremy
#
Jeremy Z Butler wrote:
See ?running in package:gregmisc.

Sundar
#
One way to do what you want is to create a grouping variable, add it
to your data frame as a factor, and then use tapply. For example,
assuming your data frame "d" has a column "x" and a number of rows
divisible by 5, use

d$grp <- gl(nrow(d) / 5, 5)
tapply(d$x, d$grp, mean)
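
If the number of rows is not divisible by 5, the same tapply idea can still be used with a grouping index built from the row numbers. The following is only a sketch: the two extra values (2.0, 4.0) are made up to produce an incomplete last group, and the name grp_means is just for illustration.

```r
# Values from the original question, plus two made-up extras (2.0, 4.0)
# so that the last group is incomplete.
x <- c(3.4, 6.0, 2.5, 7.5, 1.8, 4.2, 6.4, 5.7, 17.2, 13.5, 2.0, 4.0)
d <- data.frame(x = x)

# Rows 1-5 get index 1, rows 6-10 get index 2, rows 11-12 get index 3.
d$grp <- ceiling(seq_len(nrow(d)) / 5)

# Mean of x within each group; the last group averages only two values.
grp_means <- tapply(d$x, d$grp, mean)
print(grp_means)
```

Here the first two means are those of the posted data (4.24 and 9.4), and the third is the mean of the two made-up values.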

J.R. Lockwood
412-683-2300 x4941
lockwood at rand.org
http://www.rand.org/methodology/stat/members/lockwood/
#
You could try using `aggregate', e.g.

df <- data.frame(a = rnorm(10, 1), b = rnorm(10, 2))
grps <- rep(1:2, each = 5)
aggregate(df, list(grps), mean)
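
Applied to the column from the original question, the aggregate approach might look like the sketch below. It uses the formula interface of aggregate; the names d, grp and res are assumptions chosen for illustration, and the grouping assumes exactly two groups of five.

```r
# The ten values from the original question
x <- c(3.4, 6.0, 2.5, 7.5, 1.8, 4.2, 6.4, 5.7, 17.2, 13.5)

# Attach a grouping column: rows 1-5 are group 1, rows 6-10 are group 2.
d <- data.frame(x = x, grp = rep(1:2, each = 5))

# One row per group, with the group label and the mean of x.
res <- aggregate(x ~ grp, data = d, FUN = mean)
print(res)
```

The result is a data frame rather than a vector, which can be convenient if the group means are to be merged back with other per-group data.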

-roger
_______________________________
UCLA Department of Statistics
rpeng at stat.ucla.edu
http://www.stat.ucla.edu/~rpeng
On Thu, 20 Feb 2003, Sundar Dorai-Raj wrote:

            
#
When all groups have the same number of elements and the groups are
consecutive I normally transform the vector into a matrix where each
column contains data from one group. Then I perform whatever on each
group using apply():

x <- c(3.4, 6.0, 2.5, 7.5, 1.8, 4.2, 6.4, 5.7, 17.2, 13.5)
xm <- matrix(x, nrow=5)   # matrix() "fills by column" by default
print(xm)
#      [,1] [,2]
# [1,]  3.4  4.2
# [2,]  6.0  6.4
# [3,]  2.5  5.7
# [4,]  7.5 17.2
# [5,]  1.8 13.5
# MARGIN=2 means "along columns" or "columnwise"
m <- apply(xm, MARGIN=2, FUN=mean, na.rm=TRUE)
print(m)
# [1] 4.24 9.40
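
As a possible variation on the matrix approach, colMeans() gives the same column means without apply(), and padding the vector with NA lets it cope with a length that is not a multiple of the group size. This is a sketch only: the two extra values in x2 are made up, and the names k, m_fast and m_ragged are illustrative.

```r
x <- c(3.4, 6.0, 2.5, 7.5, 1.8, 4.2, 6.4, 5.7, 17.2, 13.5)
k <- 5  # group size

# Same result as apply(xm, MARGIN=2, FUN=mean), using built-in column means.
m_fast <- colMeans(matrix(x, nrow = k))
print(m_fast)

# Hypothetical ragged case: two made-up extra values (2.0, 4.0).
x2 <- c(x, 2.0, 4.0)

# Growing the length pads the vector with NA up to a multiple of k.
length(x2) <- k * ceiling(length(x2) / k)

# na.rm=TRUE makes the last column average only the values that exist.
m_ragged <- colMeans(matrix(x2, nrow = k), na.rm = TRUE)
print(m_ragged)
```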

Hope this helps

Henrik Bengtsson

Home: 201/445 Royal Parade, 3052 Parkville
Office: Bioinformatics, WEHI, Parkville
+61 (0)412 269 734 (cell), +61 (0)3 9345 2324 (lab),
+1 (508) 464 6644 (global fax)
hb at wehi.edu.au, http://www.maths.lth.se/~hb/
Time zone: +11h UTC (Sweden +1h UTC, Calif. -8h UTC)
#
How about:

 > group.means <- function(x, k=5){
+  n.gps <- floor(length(x)/k)
+  # weights of 1/k turn each column sum into a column mean
+  rep(1/k, k) %*% array(x[1:(k*n.gps)], dim=c(k, n.gps))
+ }
 > group.means(c(3.4, 6.0, 2.5, 7.5, 1.8, 4.2, 6.4, 5.7, 17.2, 13.5))
      [,1] [,2]
[1,] 4.24  9.4

Best Wishes,
Spencer Graves
Henrik Bengtsson wrote: