function in aggregate applied to specific columns only
7 messages · david hilton shanabrook, David Winsemius, milton ruser +3 more
On Jan 3, 2010, at 10:46 PM, david hilton shanabrook wrote:
I want to use aggregate with the mean function on specific columns
gender <- factor(c("m", "m", "f", "f", "m"))
student <- c(0001, 0002, 0003, 0003, 0001)
score <- c(50, 60, 70, 65, 60)
basicSub <- data.frame(student, gender, score)
basicSubMean <- aggregate(basicSub, by=list(basicSub$student),
FUN=mean, na.rm=TRUE)
> basicSubMean <- aggregate(basicSub$score, by=list(basicSub$student), FUN=mean, na.rm=TRUE)
> basicSubMean
  Group.1    x
1       1 55.0
2       2 60.0
3       3 67.5
This doesn't work: one cannot take the mean of a factor (gender). Is there any way of specifying which columns to use for the mean? I want to aggregate by student, obtaining mean scores, and assume any other factors are unchanging within a specific student, i.e. gender.

Thanks
David Winsemius, MD Heritage Laboratories West Hartford, CT
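The aggregate() call on basicSub$score averages only the score column, but it drops gender and labels the result Group.1 and x. As a minimal sketch, and assuming (as the question does) that gender never changes within a student, one way to keep gender is to group on both columns. Passing data-frame subsets for both arguments also carries the column names through. The name res is only illustrative.

```r
# Rebuild the example data from the question.
gender  <- factor(c("m", "m", "f", "f", "m"))
student <- c(1, 2, 3, 3, 1)
score   <- c(50, 60, 70, 65, 60)
basicSub <- data.frame(student, gender, score)

# Average only the score column. Grouping on student AND gender keeps
# gender in the result; this relies on the assumption that gender is
# constant within each student (otherwise a student would appear twice).
res <- aggregate(basicSub["score"],
                 by = basicSub[c("student", "gender")],
                 FUN = mean, na.rm = TRUE)
res
```

The result has one row per student/gender pair, with columns student, gender and score.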
Here are 6 ways:

1. aggregate
aggregate(basicSub["score"], basicSub["student"], mean)
  student score
1       1  55.0
2       2  60.0
3       3  67.5

2. tapply
with(basicSub, tapply(score, student, mean))
   1    2    3
55.0 60.0 67.5

3. summaryBy in doBy package
library(doBy)
summaryBy(. ~ student, basicSub)
  student score.mean
1       1       55.0
2       2       60.0
3       3       67.5

4. sqldf in sqldf package. Uses SQL:
library(sqldf)
sqldf("select student, avg(score) from basicSub group by student")
  student avg(score)
1       1       55.0
2       2       60.0
3       3       67.5

5. summary.formula in Hmisc
library(Hmisc)
summary(score ~ student, basicSub)
score    N=5

+-------+-+-+-----+
|       | |N|score|
+-------+-+-+-----+
|student|1|2|55.0 |
|       |2|1|60.0 |
|       |3|2|67.5 |
+-------+-+-+-----+
|Overall| |5|61.0 |
+-------+-+-+-----+

6. plyr (see Dennis Murphy's solution in this thread)
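Most of the calls in this message name the score column explicitly. When a data frame has many columns, a base-R sketch along the following lines may help: it picks out the numeric columns programmatically and aggregates only those, so factors such as gender are never passed to mean(). The helper names is_num, val_cols and res are illustrative, and treating "every numeric column except the key" as the columns of interest is an assumption.

```r
# Rebuild the example data from the question.
gender  <- factor(c("m", "m", "f", "f", "m"))
student <- c(1, 2, 3, 3, 1)
score   <- c(50, 60, 70, 65, 60)
basicSub <- data.frame(student, gender, score)

# Flag the numeric columns, then drop the grouping key from that set.
is_num   <- sapply(basicSub, is.numeric)
val_cols <- setdiff(names(basicSub)[is_num], "student")

# Aggregate only the selected columns by student.
res <- aggregate(basicSub[val_cols], basicSub["student"], mean)
res
```

For this data val_cols is just "score", so the result matches the table under item 1.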
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
That makes eight solutions. Any others? :)
A ninth was detailed in two other threads last month. The first link compares to ave().
http://tolstoy.newcastle.edu.au/R/e8/help/09/12/9014.html
http://tolstoy.newcastle.edu.au/R/e8/help/09/12/8830.html

"Dennis Murphy" <djmuser at gmail.com> wrote in message news:9a8a6c631001032057qc5cd68j9ec3882043dec0bc at mail.gmail.com...
Just for the fun of it, here are two more: by and ave.
with(basicSub, by(score, student, mean))
student: 1
[1] 55
------------------------------------------------------------
student: 2
[1] 60
------------------------------------------------------------
student: 3
[1] 67.5

Not my favorite print method; to return a vector, do instead
as.vector(with(basicSub, by(score, student, mean)))
[1] 55.0 60.0 67.5

You can cbind the unique student IDs to get a matrix result.

ave() is used to map the average (or comparable summary) to each observation. By itself, it returns a vector of the same length as the number of observations:
with(basicSub, ave(score, student))
[1] 55.0 60.0 67.5 67.5 55.0

It's more useful if you want to add the means to the data frame:
transform(basicSub, avg = ave(score, student))
  student gender score  avg
1       1      m    50 55.0
2       2      m    60 60.0
3       3      f    70 67.5
4       3      f    65 67.5
5       1      m    60 55.0

That makes eight solutions. Any others? :)

Dennis
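Two remarks in the by/ave message come without code: binding the student IDs to the by() result, and using ave() with a summary other than the mean. The following is a rough sketch of both, not the poster's own code. The names means, mat, out, best and dev are made up for illustration. It assumes by() orders its groups by sorted student ID, which holds for a numeric grouping vector like this one.

```r
# Rebuild the example data from the question.
gender  <- factor(c("m", "m", "f", "f", "m"))
student <- c(1, 2, 3, 3, 1)
score   <- c(50, 60, 70, 65, 60)
basicSub <- data.frame(student, gender, score)

# Turn the by() result into a two-column matrix of IDs and means.
# sort(unique()) is used so the IDs line up with by()'s group order.
means <- as.vector(with(basicSub, by(score, student, mean)))
mat   <- cbind(student = sort(unique(basicSub$student)), mean = means)
mat

# ave() takes any summary through the named FUN argument. Here it maps
# each student's best score onto every row, and the default (mean) is
# used to compute each row's deviation from that student's average.
out <- transform(basicSub,
                 best = ave(score, student, FUN = max),
                 dev  = score - ave(score, student))
out
```

Note that FUN has to be passed by name in ave(), because unnamed arguments after the first are treated as grouping variables.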