Just for the fun of it, here are two more: by and ave.
with(basicSub, by(score, student, mean))
student: 1
[1] 55
------------------------------------------------------------
student: 2
[1] 60
------------------------------------------------------------
student: 3
[1] 67.5
Not my favorite print method; to get a plain vector instead, use
as.vector(with(basicSub, by(score, student, mean)))
[1] 55.0 60.0 67.5
You can cbind the unique student IDs to get a matrix result.
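A minimal sketch of the cbind idea, assuming the basicSub data frame from the original question. by() orders its groups by the sorted grouping values, so the IDs are sorted before binding to keep each ID next to its own mean. The names means, ids and res are only illustrative.

```r
# basicSub as constructed in the original question
basicSub <- data.frame(student = c(1, 2, 3, 3, 1),
                       gender  = factor(c("m", "m", "f", "f", "m")),
                       score   = c(50, 60, 70, 65, 60))

# Group means from by(), stripped down to a plain numeric vector
means <- as.vector(with(basicSub, by(score, student, mean)))

# by() lays its groups out in sorted order of the grouping values,
# so sort the IDs too before binding them alongside the means
ids <- sort(unique(basicSub$student))

res <- cbind(student = ids, mean = means)
res
```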
ave() maps the group average (or another summary, supplied through its
FUN argument) onto each observation.
By itself, it returns a vector with one element per observation:
with(basicSub, ave(score, student))
[1] 55.0 60.0 67.5 67.5 55.0
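To map a different summary, pass it as FUN. A small sketch on the same basicSub data, using max as an arbitrary example; note that FUN has to be given by name, because ave() treats any unnamed extra arguments as additional grouping variables.

```r
basicSub <- data.frame(student = c(1, 2, 3, 3, 1),
                       gender  = factor(c("m", "m", "f", "f", "m")),
                       score   = c(50, 60, 70, 65, 60))

# Each observation gets the highest score recorded for its student
best <- with(basicSub, ave(score, student, FUN = max))
best
# [1] 60 60 70 70 60
```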
It's more useful if you want to add the means to the data frame:
transform(basicSub, avg = ave(score, student))
student gender score avg
1 1 m 50 55.0
2 2 m 60 60.0
3 3 f 70 67.5
4 3 f 65 67.5
5 1 m 60 55.0
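Because ave() lines each group's mean up with the original rows, it also makes within-student comparisons a one-liner. A sketch, assuming the same basicSub; the column name dev is just for illustration.

```r
basicSub <- data.frame(student = c(1, 2, 3, 3, 1),
                       gender  = factor(c("m", "m", "f", "f", "m")),
                       score   = c(50, 60, 70, 65, 60))

# Deviation of each score from that student's own average
centered <- transform(basicSub, dev = score - ave(score, student))
centered$dev
# [1] -5.0  0.0  2.5 -2.5  5.0
```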
That makes eight solutions. Any others? :)
Dennis
On Sun, Jan 3, 2010 at 8:14 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
Here are 6 ways:
1. aggregate
aggregate(basicSub["score"], basicSub["student"], mean)
student score
1 1 55.0
2 2 60.0
3 3 67.5
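Since the question also wants gender kept (assumed constant within each student), one option is to add it to the grouping so it is carried along rather than averaged. A sketch with the data-frame interface, plus the formula interface that later R releases provide; the object names are hypothetical.

```r
basicSub <- data.frame(student = c(1, 2, 3, 3, 1),
                       gender  = factor(c("m", "m", "f", "f", "m")),
                       score   = c(50, 60, 70, 65, 60))

# Average only the "score" column; grouping by gender as well just
# carries it along, assuming it never varies within a student
byStudent <- aggregate(basicSub["score"],
                       by = basicSub[c("student", "gender")],
                       FUN = mean)

# Same result via the formula interface (not available in the R of 2010)
byStudent2 <- aggregate(score ~ student + gender, data = basicSub, FUN = mean)

# Row order follows the grouping variables, so sort by student to display
byStudent[order(byStudent$student), ]
```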
2. tapply
with(basicSub, tapply(score, student, mean))
1 2 3
55.0 60.0 67.5
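tapply() returns a 1-d array whose names are the group labels (stored as character strings). If a data frame like the aggregate() output is wanted, a sketch along these lines rebuilds one; meanDF is a hypothetical name.

```r
basicSub <- data.frame(student = c(1, 2, 3, 3, 1),
                       gender  = factor(c("m", "m", "f", "f", "m")),
                       score   = c(50, 60, 70, 65, 60))

m <- with(basicSub, tapply(score, student, mean))

# Convert the character labels back to numbers and drop the array structure
meanDF <- data.frame(student = as.numeric(names(m)), score = as.vector(m))
meanDF
```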
3. summaryBy in doBy package
library(doBy)
summaryBy(. ~ student, basicSub)
student score.mean
1 1 55.0
2 2 60.0
3 3 67.5
4. sqldf in sqldf package. Uses SQL:
library(sqldf)
sqldf("select student, avg(score) from basicSub group by student")
student avg(score)
1 1 55.0
2 2 60.0
3 3 67.5
5. summary.formula in Hmisc
library(Hmisc)
summary(score ~ student, basicSub)
score N=5
+-------+-+-+-----+
| | |N|score|
+-------+-+-+-----+
|student|1|2|55.0 |
| |2|1|60.0 |
| |3|2|67.5 |
+-------+-+-+-----+
|Overall| |5|61.0 |
+-------+-+-+-----+
6. plyr (see Dennis Murphy's solution in this thread)
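Dennis's plyr code is not quoted here, so the following is only a hypothetical sketch of what a plyr version could look like, using ddply() with summarise(). It is guarded with requireNamespace() so it still runs where plyr is not installed.

```r
basicSub <- data.frame(student = c(1, 2, 3, 3, 1),
                       gender  = factor(c("m", "m", "f", "f", "m")),
                       score   = c(50, 60, 70, 65, 60))

# Split basicSub by student, compute the mean score per piece, recombine
if (requireNamespace("plyr", quietly = TRUE)) {
  plyrMeans <- plyr::ddply(basicSub, "student", plyr::summarise,
                           score = mean(score))
  print(plyrMeans)
}
```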
On Sun, Jan 3, 2010 at 10:46 PM, david hilton shanabrook
<dhshanab at acad.umass.edu> wrote:
I want to use aggregate with the mean function on specific columns:
gender <- factor(c("m", "m", "f", "f", "m"))
student <- c(0001, 0002, 0003, 0003, 0001)
score <- c(50, 60, 70, 65, 60)
basicSub <- data.frame(student, gender, score)
basicSubMean <- aggregate(basicSub, by=list(basicSub$student),
FUN=mean)
This doesn't work: one cannot take the mean of a factor (gender). Is
there any way of specifying which columns to use for the mean? I want to
aggregate by student, obtaining mean scores, and assume any other factors
are unchanging within a given student, i.e., gender.
Thanks