Dear R people, I have a simple question to ask. Suppose I have a data.frame with two variables: one factor (x) and one numeric (y). I want to calculate the mean of y for each value of x. Although it's easy to do within a for loop, I believe there may be a more concise way using some kind of "apply" function. Could anyone tell me how to do that? Thank you. Frank
How to calculate the stratified means in a data frame?
4 messages · Frank Duan, Brian Ripley, Peter Dalgaard +1 more
On Thu, 18 Nov 2004, Frank Duan wrote:
[...] I want to calculate the mean of y for each value of x. [...] Could anyone tell me how to do that?
tapply(y, x, mean)  # which _is_ in `An Introduction to R', BTW

?by
?aggregate

for more sophisticated packaging of such ideas.
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
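For readers following along, here is a minimal sketch of what the tapply() call above does, checked against the explicit for loop mentioned in the question. The data frame `df` and its values are made up for illustration; only `x` and `y` come from the original post. Since the columns live inside a data frame, they are referenced as `df$x` and `df$y` here.

```r
# Hypothetical example data: a factor x and a numeric y, as in the question.
df <- data.frame(
  x = factor(c("a", "a", "b", "b", "b", "c")),
  y = c(1, 3, 2, 4, 6, 10)
)

# Explicit loop version: one mean per level of x.
loop_means <- numeric(nlevels(df$x))
names(loop_means) <- levels(df$x)
for (lev in levels(df$x)) {
  loop_means[lev] <- mean(df$y[df$x == lev])
}

# tapply() version: split y by the levels of x and apply mean() to each piece.
group_means <- tapply(df$y, df$x, mean)

print(loop_means)
print(group_means)
```

Both objects should hold the same three means, named by the levels of `x`. The tapply() result is a one-dimensional array, so it can be indexed by level name, for example `group_means["b"]`.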
Frank Duan <fhduan at gmail.com> writes:
[...] I want to calculate the mean of y for each value of x. [...] Could anyone tell me how to do that?
tapply() will do that. (help(tapply), look at the "presidents" example).
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
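The exact example on the tapply help page may differ between R versions, so the following is only a sketch in the same spirit, using the built-in `presidents` quarterly time series. It also shows how extra arguments such as `na.rm = TRUE` are passed through tapply() to the summary function. The variable names `approval`, `quarter` and `quarter_means` are illustrative.

```r
# Built-in quarterly approval ratings; the series contains missing values.
approval <- as.numeric(presidents)

# cycle() gives the position within the year (1 to 4) of each observation.
quarter <- as.integer(cycle(presidents))

# Mean rating per quarter; na.rm = TRUE is handed on to mean().
quarter_means <- tapply(approval, quarter, mean, na.rm = TRUE)
print(quarter_means)
```

The grouping variable does not have to be a factor already: tapply() converts it with as.factor() internally.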
On Thu, 2004-11-18 at 15:34 -0500, Frank Duan wrote:
[...] I want to calculate the mean of y for each value of x. [...] Could anyone tell me how to do that?
One way is to use by(). Using the 'iris' dataset to get the means for Sepal.Length by Species:
with(iris, by(Sepal.Length, Species, mean))
INDICES: setosa
[1] 5.006
------------------------------------------------------
INDICES: versicolor
[1] 5.936
------------------------------------------------------
INDICES: virginica
[1] 6.588

See ?by, also ?tapply and ?aggregate.

Note also the use of with() as a wrapper, in lieu of attach() here.

HTH,

Marc Schwartz
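As a rough comparison of the three functions named in this thread, the sketch below computes the same per-species means on `iris` with tapply() and aggregate(). The object names `tapply_means` and `agg_means`, and the choice of a second column (`Sepal.Width`), are illustrative rather than anything from the messages above.

```r
# tapply() returns a named array with one mean per species.
tapply_means <- with(iris, tapply(Sepal.Length, Species, mean))
print(tapply_means)

# aggregate() with a formula returns a data frame and can summarise
# several columns at once (here two, bound together with cbind()).
agg_means <- aggregate(cbind(Sepal.Length, Sepal.Width) ~ Species,
                       data = iris, FUN = mean)
print(agg_means)
```

Which one to prefer probably depends on what happens next: the array from tapply() is handy for lookup by name, while the data frame from aggregate() is easier to merge or plot. Both calls pass the data explicitly (via with() or the `data` argument), so nothing needs to be attached to the search path.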