Repeated analysis over groups / Splitting by group variable
I would change the first dataOnly in the by(...) or lapply(...) call to dataOnly[,-3], so that the group column itself is not passed to KLdiv(). In fact, if the data frame mydata is suitably subset, then, because of the as.matrix() in function(x), both the by() and lapply() methods will work fine with mydata.
-Peter Ehlers
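A minimal sketch of this suggestion, under stated assumptions: group_stat() is a hypothetical stand-in for KLdiv() so the code runs without flexmix, and the groups are assigned with a fixed rep() pattern instead of sample() so that every group is guaranteed to occur.

```r
# Sketch only. group_stat() is a hypothetical stand-in for KLdiv(), chosen so
# the example runs without flexmix; any function of a numeric matrix fits here.
group_stat <- function(m) colMeans(m)

set.seed(1)
n <- 23
mydata <- data.frame(
  sequence = seq_len(n),
  data1    = rnorm(n),
  data2    = rnorm(n),
  group    = rep(c(1, 2, 3), length.out = n)  # fixed pattern (assumption), not sample()
)
dataOnly <- cbind(mydata$data1, mydata$data2, mydata$group)

# Matrix version: column 3 defines the groups, dataOnly[, -3] is the data.
res_by <- by(dataOnly[, -3], dataOnly[, 3],
             function(x) group_stat(as.matrix(x)))
res_list <- lapply(split(as.data.frame(dataOnly[, -3]), dataOnly[, 3]),
                   function(x) group_stat(as.matrix(x)))

# Data frame version: subset mydata to the data columns, group on mydata$group.
res_df <- lapply(split(mydata[, c("data1", "data2")], mydata$group),
                 function(x) group_stat(as.matrix(x)))

names(res_list)   # one list entry per group
res_list[["2"]]   # result for group 2 only
```

Because split() returns a plain named list of data frames, the same pieces can be handed to any other function with a further lapply() call, which is what the original question asks for.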
On 2010-07-15 15:42, Phil Spector wrote:
Ralf -
If you want to use by(), I think it should look like this:

    by(dataOnly,dataOnly[,3],function(x)KLdiv(as.matrix(x)))

But you might find the following more useful:

    lapply(split(as.data.frame(dataOnly),dataOnly[,3]),
           function(x)KLdiv(as.matrix(x)))

since it returns its results in a list.

- Phil Spector
  Statistical Computing Facility
  Department of Statistics
  UC Berkeley
  spector at stat.berkeley.edu

On Thu, 15 Jul 2010, Ralf B wrote:
I am performing some analysis over a large data frame and would like
to conduct repeated analysis over grouped-up subsets. How can I do
that?
Here is some example code for clarification:
require("flexmix") # for Kullback-Leibler divergence
n <- 23
groups <- c(1,2,3)
mydata <- data.frame(
    sequence=c(1:n),
    data1=c(rnorm(n)),
    data2=c(rnorm(n)),
    group=rep(sample(groups, n, replace=TRUE))
)
# Part 1: full stats (works fine)
dataOnly <- cbind(mydata$data1, mydata$data2, mydata$group)
KLdiv(dataOnly)
#
# Part 2: again - but once for each group (error)
#
by(dataOnly, groups, KLdiv(dataOnly))
The error I am getting is: Error in tapply(1L:23L, list(INDICES = c(1,
2, 3)), function (x) :
arguments must have same length
Are there better ways than 'by'? I would like to use different stats
and functions and therefore I am looking for a splitter whose output I
can hand to any statistical function I want.
Any ideas?
Ralf
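
For reference, the error in Part 2 can be reproduced and avoided along the following lines. This is a minimal sketch with a made-up data frame df and mean() in place of KLdiv(). It illustrates two points about by(): INDICES needs one entry per row of the data (here 23, not the 3 distinct group labels), and the third argument has to be a function, not the result of calling one.

```r
# Sketch only: df and the use of mean() are illustrative assumptions.
set.seed(1)
df <- data.frame(x = rnorm(23), g = rep(c(1, 2, 3), length.out = 23))

# INDICES of length 3 against 23 rows: by() fails inside tapply().
msg <- tryCatch(
  by(df, c(1, 2, 3), function(d) mean(d$x)),
  error = function(e) conditionMessage(e)
)
msg   # the captured error message

# One INDICES entry per row, and FUN passed as a function: works.
ok <- by(df, df$g, function(d) mean(d$x))
ok
```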