
Help with aggregate syntax for a multi-column function please.

3 messages · Michael Karol, Jean V Adams, Dennis Murphy

Hi:

Another way to do this is to use one of the summarization packages.
The following uses the plyr package.

The first step is to create a function that takes a data frame as
input and returns either a data frame or a scalar. In this case, the
function returns a scalar, but if you want to carry along additional
variables in the output, you can have it return a data frame
containing the set of variables you want instead. You don't need to
return the grouping variables, but no harm is done if you do.

# This assumes the existence of a function AUC with the arguments
#  you stated in your post. I presume it returns a scalar value; if not,
# you should modify it to return a data frame instead. It would probably
# be better to modify AUC and call it in ddply() directly, but without the
# function code there's not much one can do...
myAUC <- function(df)
   AUC(df, 'TimeBestEstimate', 'Pt','ConcentrationBQLzero')

library('plyr')
ddply(PKdata, .(Cycle, DoseDayNominal, Drug), myAUC)
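
For anyone who wants to try the pattern without the original data, here
is a minimal self-contained sketch. The AUC() below is a hypothetical
stand-in (a plain trapezoidal rule), not the poster's function, and the
toy PKdata values are invented; only the column names come from the post.

```r
library(plyr)

# Hypothetical stand-in for the poster's AUC(): trapezoidal rule.
# 'id' is accepted only to mirror the call in the post; it is unused here.
AUC <- function(df, time, id, conc) {
  o  <- order(df[[time]])
  tm <- df[[time]][o]
  y  <- df[[conc]][o]
  sum(diff(tm) * (head(y, -1) + tail(y, -1)) / 2)
}

# Toy data with invented values; one patient, two drugs, three time points each
PKdata <- data.frame(
  Cycle                = 1,
  DoseDayNominal       = 1,
  Drug                 = rep(c("A", "B"), each = 3),
  Pt                   = 1,
  TimeBestEstimate     = rep(c(0, 1, 2), times = 2),
  ConcentrationBQLzero = c(0, 2, 0, 0, 4, 4)
)

# Wrapper that takes one (sub)data frame and returns a scalar
myAUC <- function(df)
  AUC(df, 'TimeBestEstimate', 'Pt', 'ConcentrationBQLzero')

# One row per Cycle/DoseDayNominal/Drug combination
res <- ddply(PKdata, .(Cycle, DoseDayNominal, Drug), myAUC)
print(res)
```

Because myAUC() returns a bare scalar, ddply() has to pick a default name
for the result column; returning a one-row data frame from the wrapper
lets you name it yourself.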

This is obviously untested, so caveat emptor. Both plyr and data.table
can accept functions with multiple arguments and do the right thing.
The trick in plyr is to write a function that takes a generic input
object (e.g., a (sub)data frame) and then uses (the variables within)
it to do the necessary calculations. Generally, you want the output of
the function to be compatible with the type of output you want from
the **ply() function. In this case, ddply() means data frame input,
data frame output; alply() would mean array input and list output,
etc.
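
To illustrate the naming convention, here is a small sketch on made-up
data (the data frame d and the helper summ() are invented for this
example). The same function is passed to ddply() and dlply(); only the
output container changes.

```r
library(plyr)

# Made-up data: two groups
d <- data.frame(g = c("a", "a", "b"), x = c(1, 3, 10))

# Returns a one-row data frame, so the result column has a name we chose
summ <- function(df) data.frame(mean_x = mean(df$x))

as_df   <- ddply(d, .(g), summ)   # data frame in, data frame out
as_list <- dlply(d, .(g), summ)   # data frame in, list out

print(as_df)
str(as_list, max.level = 1)
```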

If this doesn't work, please provide a reproducible example.

HTH,
Dennis
On Tue, Aug 2, 2011 at 7:32 AM, Michael Karol <MKarol at syntapharma.com> wrote: