function in aggregate applied to specific columns only
Here are 6 ways:

1. aggregate
aggregate(basicSub["score"], basicSub["student"], mean)
  student score
1       1  55.0
2       2  60.0
3       3  67.5

2. tapply
with(basicSub, tapply(score, student, mean))
   1    2    3
55.0 60.0 67.5

3. summaryBy in doBy package
library(doBy)
summaryBy(. ~ student, basicSub)
  student score.mean
1       1       55.0
2       2       60.0
3       3       67.5

4. sqldf in sqldf package. Uses SQL:
library(sqldf)
sqldf("select student, avg(score) from basicSub group by student")
  student avg(score)
1       1       55.0
2       2       60.0
3       3       67.5

5. summary.formula in Hmisc
library(Hmisc)
summary(score ~ student, basicSub)
score    N=5

+-------+-+-+-----+
|       | |N|score|
+-------+-+-+-----+
|student|1|2|55.0 |
|       |2|1|60.0 |
|       |3|2|67.5 |
+-------+-+-+-----+
|Overall| |5|61.0 |
+-------+-+-+-----+

6. plyr (see Dennis Murphy's solution in this thread)

On Sun, Jan 3, 2010 at 10:46 PM, david hilton shanabrook <dhshanab at acad.umass.edu> wrote:
I want to use aggregate with the mean function on specific columns
gender <- factor(c("m", "m", "f", "f", "m"))
student <- c(0001, 0002, 0003, 0003, 0001)
score <- c(50, 60, 70, 65, 60)
basicSub <- data.frame(student, gender, score)
basicSubMean <- aggregate(basicSub, by=list(basicSub$student), FUN=mean, na.rm=TRUE)
This doesn't work, because one cannot take the mean of a factor (gender). Is there any way of specifying which columns to use for the mean? I want to aggregate by student, obtaining mean scores, and assume any other factors are unchanging in a specific student, i.e. gender.
Thanks
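
For the specific case described in the question, where gender is assumed constant within each student and should be carried along, one possible approach is the formula interface of aggregate in base R. This is only a sketch under that assumption: gender is added as a second grouping variable, so the mean is computed on score alone. If gender did vary within a student, that student would appear on more than one row.

```r
# Data from the original question
gender <- factor(c("m", "m", "f", "f", "m"))
student <- c(1, 2, 3, 3, 1)
score <- c(50, 60, 70, 65, 60)
basicSub <- data.frame(student, gender, score)

# Mean of score only; student and gender both act as grouping variables,
# so gender is kept in the result instead of being averaged
basicSubMean <- aggregate(score ~ student + gender, data = basicSub, FUN = mean)
print(basicSubMean)
```

The result has one row per student/gender combination, with columns student, gender and score.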