function in aggregate applied to specific columns only

Here are 6 ways:

1. aggregate
aggregate(basicSub["score"], basicSub["student"], mean)
  student score
1       1  55.0
2       2  60.0
3       3  67.5

2. tapply
with(basicSub, tapply(score, student, mean))
   1    2    3
55.0 60.0 67.5

3. summaryBy in doBy package
library(doBy)
summaryBy(. ~ student, basicSub)
  student score.mean
1       1       55.0
2       2       60.0
3       3       67.5

4. sqldf in sqldf package.  Uses SQL:
library(sqldf)
sqldf("select student, avg(score) from basicSub group by student")
  student avg(score)
1       1       55.0
2       2       60.0
3       3       67.5

5. summary.formula in Hmisc
library(Hmisc)
summary(score ~ student, basicSub)
score    N=5

+-------+-+-+-----+
|       | |N|score|
+-------+-+-+-----+
|student|1|2|55.0 |
|       |2|1|60.0 |
|       |3|2|67.5 |
+-------+-+-+-----+
|Overall| |5|61.0 |
+-------+-+-+-----+

6. plyr (see Dennis Murphy's solution in this thread)
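
As a rough sketch of two further variations on way 1, the formula interface of aggregate (available in reasonably recent versions of R) names the columns directly, so only those columns are touched. Adding gender to the right-hand side carries it into the result, on the assumption stated in the question that gender never changes within a student. The names meansByStudent and meansWithGender are made up for this illustration; the data are copied from the quoted message.

```r
# Data copied from the quoted message (0001 etc. are just the numbers 1, 2, 3).
gender  <- factor(c("m", "m", "f", "f", "m"))
student <- c(1, 2, 3, 3, 1)
score   <- c(50, 60, 70, 65, 60)
basicSub <- data.frame(student, gender, score)

# Formula interface: only the columns named in the formula are used,
# so the factor column gender is never passed to mean().
meansByStudent <- aggregate(score ~ student, data = basicSub, FUN = mean)

# Grouping by gender as well keeps it in the output. This assumes gender
# is constant within each student; otherwise a student would get one row
# per gender value.
meansWithGender <- aggregate(score ~ student + gender, data = basicSub, FUN = mean)

print(meansByStudent)
print(meansWithGender)
```

The second call returns one row per student/gender combination, so its row order follows the grouping variables rather than student alone.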

On Sun, Jan 3, 2010 at 10:46 PM, david hilton shanabrook wrote:
I want to use aggregate with the mean function on specific columns

gender <- factor(c("m", "m", "f", "f", "m"))
student <- c(0001, 0002, 0003, 0003, 0001)
score <- c(50, 60, 70, 65, 60)
basicSub <- data.frame(student, gender, score)
basicSubMean <- aggregate(basicSub, by=list(basicSub$student), FUN=mean, na.rm=TRUE)

This doesn't work; one cannot take the mean of a factor (gender). Is there any way of specifying which columns to use for the mean? I want to aggregate by student, obtaining mean scores, and assume any other factors are unchanging within a specific student, i.e. gender.

Thanks

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help