Any way to apply TWO functions with tapply()?

8 messages · Phil Wieland, Tal Galili, Fredrik Karlsson +3 more

#
I need to compute the mean and the standard deviation of a data set and would
like to have the results in one table/data frame. At the moment I call tapply()
twice and then merge the resulting tables into one. Is there any way to tell
tapply() to apply both mean and sd within one function call? Something like
tapply(data$response, list(data$targets, data$conditions), c(mean, sd)).

Thanks in advance.
#
Hi,

What you can do is define your own function which takes a vector of
values, computes the statistics you want and then returns a string
which displays the output the way you want it. Then use this function
in your tapply call.

Like this (untested):

mySummary <- function(x) {
  paste(mean(x), sd(x), sep = ",")
}

tapply(data$response, list(data$targets, data$conditions), mySummary)

Of course, if you need a different output format, then you'll have to
adapt the paste call.
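
To see what this approach returns, here is a small sketch run on the built-in CO2 data set rather than the poster's data. Using CO2's Type and Treatment columns as the two grouping factors is an assumption made only for illustration. Each cell of the result is a single character string, so the numbers would have to be parsed back out before any further computation.

```r
# Helper from the message above: collapse both statistics into one string.
mySummary <- function(x) {
  paste(mean(x), sd(x), sep = ",")
}

# Assumption: CO2$Type and CO2$Treatment stand in for
# data$targets and data$conditions from the original question.
res <- tapply(CO2$uptake, list(CO2$Type, CO2$Treatment), mySummary)

print(res)  # character matrix: one "mean,sd" string per Type x Treatment cell
```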

/Fredrik
On Fri, May 7, 2010 at 11:39 AM, Phil Wieland <phwiel at gmx.de> wrote:

  
    
#
tapply does handle functions with vector outputs. For example, with the
built-in CO2 data set, the following data frame is returned:

            mean   sd
Quebec      33.5 9.67
Mississippi 20.9 7.82

Note that if you replace data.frame with c in f then you get a matrix
out instead of a data.frame.
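
The call that produced the table above did not survive in this copy of the message. The sketch below is one reconstruction consistent with the description: it assumes f is a small function returning a one-row data frame, and that the per-group results are stacked with rbind. The names res_df, res_mat and g are hypothetical and used only here.

```r
# Assumption: f returns a one-row data frame holding both statistics.
f <- function(x) data.frame(mean = mean(x), sd = sd(x))

# tapply() returns a list with one element per level of Type;
# rbind() stacks those one-row data frames into a single data frame.
res_df <- do.call("rbind", tapply(CO2$uptake, CO2$Type, f))
print(res_df)

# Same idea with c() in place of data.frame(): the result is a numeric matrix.
g <- function(x) c(mean = mean(x), sd = sd(x))
res_mat <- do.call("rbind", tapply(CO2$uptake, CO2$Type, g))
print(res_mat)
```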

There is also somewhat similar functionality in summaryBy (doBy
package), summary.formula (Hmisc package), ddply (plyr package), remix
(remix package), melt and cast (reshape package) and sqldf (sqldf
package).

Of these, summaryBy in the doBy package is particularly easy to specify
and produces a data frame:

library(doBy)
summaryBy(uptake ~ Type, data = CO2, FUN = c(mean, sd))

remix and summary.formula in Hmisc have particularly attractive output
but do not produce data frames.  Hmisc even has a plot method.  The
specification to remix is also simple here and, in fact, is identical
to the summaryBy line above except it uses lower case fun.  sqldf uses
SQL for the specification which may be an advantage if you know SQL
better than R.
On Fri, May 7, 2010 at 5:39 AM, Phil Wieland <phwiel at gmx.de> wrote:
#
That was a superb answer to a question that has already appeared in  
various forms on r-help at least 4 times that I can remember just this  
week. I think your text could be appended without much editing into  
the FAQ.  It could then have its own hyperlink, and you wouldn't need  
to type it again the next umpteen times that it will re-appear.
#
Actually I have never mentioned the majority of the items in this post
in that form.  remix only appeared yesterday on CRAN and as pointed
out to me offline the binaries are still not there (but should be
automatically built shortly).
On Fri, May 7, 2010 at 9:49 AM, David Winsemius <dwinsemius at comcast.net> wrote:
#
As pointed out to me offline, data.table should be added to the list
of relevant packages as well.  Its primary advantage is for large data
sets as it is very fast.  Its interface does take some getting used to
but its most recent version on CRAN does have several vignettes which
should ease learning.
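
For comparison, a rough sketch of the same mean/sd summary with data.table, assuming the package is installed. The variable names DT and res_dt are made up for this example, and the list() form of the j argument is used here as one common way to return several columns per group.

```r
# Runs only if data.table is available; otherwise the block is skipped.
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)

  # Plain copy of the two CO2 columns needed for the summary.
  DT <- as.data.table(data.frame(Type = CO2$Type, uptake = CO2$uptake))

  # One row per Type, with both statistics computed in a single call.
  res_dt <- DT[, list(mean = mean(uptake), sd = sd(uptake)), by = Type]
  print(res_dt)
}
```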

On Fri, May 7, 2010 at 8:50 AM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
#
Note that the development version of plyr is as fast as data.table for
many tasks.  But I don't want to release it until I'm sure that I
haven't introduced any new bugs.

Hadley