Message-ID: <alpine.DEB.2.00.1001052006330.7609@paninaro.stat-math.wu-wien.ac.at>
Date: 2010-01-05T19:09:42Z
From: Achim Zeileis
Subject: mean for subset
In-Reply-To: <4b5c27441001051029l298e7438q5d31dc1b37a21b29@mail.gmail.com>
On Tue, 5 Jan 2010, Geoffrey Smith wrote:
> Hello, does anyone know how to take the mean for a subset of observations?
> For example, suppose my data looks like this:
>
> OBS NAME SCORE
> 1 Tom 92
> 2 Tom 88
> 3 Tom 56
> 4 James 85
> 5 James 75
> 6 James 32
> 7 Dawn 56
> 8 Dawn 91
> 9 Clara 95
> 10 Clara 84
>
> Is there a way to get the mean of the SCORE variable by NAME but only when
> the number of observations is equal to 3? In other words, is there a way to
> get the mean of the SCORE variable for Tom and James, but not for Dawn and
> Clara? Thank you.
You can use tapply() together with a custom function that returns NA if
the condition is not satisfied, e.g.
## read data
dat <- read.table(textConnection("
OBS NAME SCORE
1 Tom 92
2 Tom 88
3 Tom 56
4 James 85
5 James 75
6 James 32
7 Dawn 56
8 Dawn 91
9 Clara 95
10 Clara 84
"), header = TRUE)
## use tapply() with custom function
with(dat,
  tapply(SCORE, NAME, function(x) if (length(x) == 3) mean(x) else NA)
)
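If the required group size may change, the same idea can be wrapped in a small helper. This is only a sketch: mean_if_n() and its argument n are names made up for illustration and are not part of base R, and the data frame is rebuilt by hand so that the snippet runs on its own.

```r
## same data as above, built directly so the snippet is self-contained
dat <- data.frame(
  NAME  = c("Tom", "Tom", "Tom", "James", "James", "James",
            "Dawn", "Dawn", "Clara", "Clara"),
  SCORE = c(92, 88, 56, 85, 75, 32, 56, 91, 95, 84)
)

## hypothetical helper: mean of x if it has exactly n values, else NA
mean_if_n <- function(x, n = 3) if (length(x) == n) mean(x) else NA_real_

## extra arguments to tapply() are passed on to the function
res <- with(dat, tapply(SCORE, NAME, mean_if_n, n = 3))
res

## keep only the groups that satisfied the condition
res[!is.na(res)]
```

Groups that do not qualify (Dawn and Clara here) show up as NA in res, so the last line is needed if they should be dropped entirely.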
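Alternatively, you could compute the group means and group sizes separately
and then select the groups with exactly 3 observations: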
mymean <- with(dat, tapply(SCORE, NAME, mean))
mylength <- with(dat, tapply(SCORE, NAME, length))
mymean[mylength == 3]
etc.
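A further variant of the two-step selection above is to filter the rows first and take the means afterwards. The sketch below uses ave() to attach the group size to every row; the object names n_obs, sub and res2 are arbitrary, and stringsAsFactors = FALSE is set explicitly on the assumption that unused factor levels should not reappear as NA groups.

```r
## same data as above, built directly so the snippet is self-contained
dat <- data.frame(
  NAME  = c("Tom", "Tom", "Tom", "James", "James", "James",
            "Dawn", "Dawn", "Clara", "Clara"),
  SCORE = c(92, 88, 56, 85, 75, 32, 56, 91, 95, 84),
  stringsAsFactors = FALSE
)

## number of observations in each row's group, repeated for every row
n_obs <- ave(dat$SCORE, dat$NAME, FUN = length)

## keep only rows that belong to groups of size 3
sub <- dat[n_obs == 3, ]

## means for the remaining groups only
res2 <- tapply(sub$SCORE, sub$NAME, mean)
res2
```

This keeps the row-level subset around, which can be handy if other summaries are needed for the same groups.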
hth,
Z
> --
> Geoffrey Smith
> Visiting Assistant Professor
> Department of Finance
> W. P. Carey School of Business
> Arizona State University
>