function in aggregate applied to specific columns only


I want to use aggregate with the mean function on specific columns

gender <- factor(c("m", "m", "f", "f", "m"))
student <- c(0001, 0002, 0003, 0003, 0001)
score <- c(50, 60, 70, 65, 60)
basicSub <- data.frame(student, gender, score)
basicSubMean <- aggregate(basicSub, by=list(basicSub$student), FUN=mean, na.rm=TRUE)

This doesn't work: one cannot take the mean of a factor (gender).
Is there any way of specifying which columns to use for the mean? I
want to aggregate by student, obtaining mean scores, and assume any
other factors are unchanging within a specific student, i.e. gender.

Thanks

> basicSubMean <- aggregate(basicSub$score, by=list(basicSub$student), FUN=mean, na.rm=TRUE)
> basicSubMean
  Group.1    x
1       1 55.0
2       2 60.0
3       3 67.5

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
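Building on the reply above, here is a minimal sketch of one way to keep gender in the result as well: group by both student and gender. It relies on the original poster's assumption that gender never varies within a student; if it did, that student would appear once per gender. The name `byStudent` is illustrative only.

```r
# Data from the original question
gender   <- factor(c("m", "m", "f", "f", "m"))
student  <- c(0001, 0002, 0003, 0003, 0001)
score    <- c(50, 60, 70, 65, 60)
basicSub <- data.frame(student, gender, score)

# Average only the score column, grouping by student AND gender.
# Assumes gender is constant within each student, so the extra grouping
# variable adds no new groups; it just carries gender into the result.
# Passing named data frame columns (rather than an unnamed list) keeps
# the column names instead of the default Group.1 / x.
byStudent <- aggregate(basicSub["score"],
                       by = basicSub[c("student", "gender")],
                       FUN = mean, na.rm = TRUE)
byStudent
```

Rows come back ordered by the grouping variables (here gender first), not necessarily by student ID; `byStudent[order(byStudent$student), ]` restores ID order if that matters.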
Here are 6 ways:

1. aggregate
aggregate(basicSub["score"], basicSub["student"], mean)
  student score
1       1  55.0
2       2  60.0
3       3  67.5

2. tapply
with(basicSub, tapply(score, student, mean))
   1    2    3
55.0 60.0 67.5

3. summaryBy in doBy package
library(doBy)
summaryBy(. ~ student, basicSub)
  student score.mean
1       1       55.0
2       2       60.0
3       3       67.5

4. sqldf in sqldf package.  Uses SQL:
library(sqldf)
sqldf("select student, avg(score) from basicSub group by student")
  student avg(score)
1       1       55.0
2       2       60.0
3       3       67.5

5. summary.formula in Hmisc
library(Hmisc)
summary(score ~ student, basicSub)
score    N=5

+-------+-+-+-----+
|       | |N|score|
+-------+-+-+-----+
|student|1|2|55.0 |
|       |2|1|60.0 |
|       |3|2|67.5 |
+-------+-+-+-----+
|Overall| |5|61.0 |
+-------+-+-+-----+

6. plyr (see Dennis Murphy's solution in this thread)
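plyr is organized around the split-apply-combine pattern. As a rough base-R illustration of that pattern (not plyr itself, and not Dennis Murphy's code), the sketch below splits the data by student, summarises each piece, and stacks the results. It keeps gender on the assumption from the question that gender is constant within a student; `pieces`, `summaries` and `result` are illustrative names.

```r
basicSub <- data.frame(
  student = c(1, 2, 3, 3, 1),
  gender  = factor(c("m", "m", "f", "f", "m")),
  score   = c(50, 60, 70, 65, 60)
)

# Split: one data frame per student (pieces are ordered by sorted student ID)
pieces <- split(basicSub, basicSub$student)

# Apply: summarise each piece; gender is assumed constant within a student,
# so its first value stands for the whole group
summaries <- lapply(pieces, function(d) {
  data.frame(student = d$student[1], gender = d$gender[1], score = mean(d$score))
})

# Combine: stack the one-row summaries back into a single data frame
result <- do.call(rbind, summaries)
rownames(result) <- NULL
result
```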

In reply to david hilton shanabrook <dhshanab at acad.umass.edu>, Sun, Jan 3, 2010 at 10:46 PM.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

That makes eight solutions. Any others?  :)
A ninth was detailed in two other threads last month; the first link
compares it with ave().
http://tolstoy.newcastle.edu.au/R/e8/help/09/12/9014.html
http://tolstoy.newcastle.edu.au/R/e8/help/09/12/8830.html

"Dennis Murphy" <djmuser at gmail.com> wrote in message 
news:9a8a6c631001032057qc5cd68j9ec3882043dec0bc at mail.gmail.com...
Just for the fun of it, here are two more: by and ave.

with(basicSub, by(score, student, mean))
student: 1
[1] 55
------------------------------------------------------------
student: 2
[1] 60
------------------------------------------------------------
student: 3
[1] 67.5

Not my favorite print method; to return a vector, do this instead:
as.vector(with(basicSub, by(score, student, mean)))
[1] 55.0 60.0 67.5
You can cbind the unique student IDs to get a matrix result.
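A short sketch of the cbind() step just mentioned. One detail worth noting: by() returns its results in the sorted order of the grouping levels, so the IDs should be sorted too; unique() alone keeps the order of first appearance, which only happens to match in this data. The names `means`, `ids` and `meanMat` are illustrative.

```r
basicSub <- data.frame(
  student = c(1, 2, 3, 3, 1),
  gender  = factor(c("m", "m", "f", "f", "m")),
  score   = c(50, 60, 70, 65, 60)
)

# One mean per student, in sorted order of the student levels
means <- as.vector(with(basicSub, by(score, student, mean)))

# Pair each mean with its ID; sort() matches by()'s level ordering
ids     <- sort(unique(basicSub$student))
meanMat <- cbind(student = ids, score = means)
meanMat
```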

ave() is used to map the average (or a comparable summary) to each
observation. By itself, it returns a vector of the same length as the
number of observations:
with(basicSub, ave(score, student))
[1] 55.0 60.0 67.5 67.5 55.0

It's more useful if you want to add the means to the data frame:
transform(basicSub, avg = ave(score, student))
  student gender score  avg
1       1      m    50 55.0
2       2      m    60 60.0
3       3      f    70 67.5
4       3      f    65 67.5
5       1      m    60 55.0
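To make the ave() behaviour described above concrete, here is a sketch that rebuilds its result by hand (tapply() once per group, then look up each row's group summary by its student ID) and uses ave()'s FUN argument for a different per-group summary. The names `viaAve`, `byHand` and `maxPerRow` are illustrative.

```r
basicSub <- data.frame(
  student = c(1, 2, 3, 3, 1),
  gender  = factor(c("m", "m", "f", "f", "m")),
  score   = c(50, 60, 70, 65, 60)
)

# ave() with its default FUN = mean: each row gets its student's mean
viaAve <- with(basicSub, ave(score, student))

# The same thing by hand: compute each student's mean once with tapply(),
# then index that summary by every row's student ID (used as a name)
groupMeans <- with(basicSub, tapply(score, student, mean))
byHand     <- as.vector(groupMeans[as.character(basicSub$student)])

# FUN swaps in another per-group summary, here each student's best score
maxPerRow <- with(basicSub, ave(score, student, FUN = max))
```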

That makes eight solutions. Any others?  :)

Dennis

In reply to Gabor Grothendieck <ggrothendieck at gmail.com>, Sun, Jan 3, 2010 at 8:14 PM.
