
Averaging over data sets

3 messages · Felipe Nunes, Joshua Wiley, MacQueen, Don

#
Hi,

I might write a little function that does different things depending
on the class of the variable. Something along these lines, where i is
a column index:

yourfun <- function(i) {
  x <- imputeddata[, i]
  if (is.numeric(x)) {
    # something
  } else if (is.factor(x)) {
    # something else
  }  # etc. for other classes
}

Then you can just do the following (note that it loops over columns,
not rows, since i is a column index):

combined <- lapply(seq_len(ncol(imputeddata)), yourfun)
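
As a minimal sketch of the idea, here is a complete version run on a small made-up imputeddata. The column names and values are hypothetical, not taken from the original question. The mean for numeric columns and the most frequent level for factors are only stand-ins for "something" and "something else".

```r
# Hypothetical stand-in for the stacked imputed data
imputeddata <- data.frame(
  eat1 = c(1, 3, 5, 7),
  trt  = factor(c("a", "b", "a", "a"))
)

# Summarise one column, chosen by column index i, according to its class
yourfun <- function(i) {
  x <- imputeddata[, i]
  if (is.numeric(x)) {
    mean(x)                        # numeric: average
  } else if (is.factor(x)) {
    names(which.max(table(x)))     # factor: most frequent level
  } else {
    NA                             # other classes: not handled in this sketch
  }
}

# Apply the function to every column and label the results
combined <- lapply(seq_len(ncol(imputeddata)), yourfun)
names(combined) <- names(imputeddata)
```

The result is a list with one element per column, so numeric and factor summaries can sit side by side without being coerced to a common type.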

Alternatively, you could consider some single imputation approaches,
since that is essentially what you end up doing.

Cheers,

Josh
On Thu, Jan 12, 2012 at 10:16 PM, Felipe Nunes <felipnunes at gmail.com> wrote:

  
    
#
Here is a solution that works for your small example.
It might be difficult to prepare your larger data sets to use the same
method.

db <- rbind(d1, d2)
aggregate(subset(db, select = -c(subject, trt)),
          by = list(subject = db$subject), mean)
## or, for example,
aggregate(subset(db, select = -c(subject, trt)),
          by = list(subject = db$subject, trt = db$trt), mean)

For aggregate() to compute the means, its first argument must have only
numeric columns. subset(db,select=-c(subject,trt)) takes care of that by
dropping the subject and trt columns.
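
To make the steps concrete, here is a small self-contained sketch. The contents of d1 and d2 below are made up for illustration (only the column names subject, trt, eat1 and eat3 and the subjects Felipe and John come from this thread), so treat the numbers as placeholders.

```r
# Hypothetical data: two data sets covering the same subjects
d1 <- data.frame(subject = c("Felipe", "John"), trt = c("A", "B"),
                 eat1 = c(2, 4), eat3 = c(10, 20))
d2 <- data.frame(subject = c("Felipe", "John"), trt = c("A", "B"),
                 eat1 = c(4, 8), eat3 = c(30, 40))

# Stack the two data sets on top of each other
db <- rbind(d1, d2)

# Keep only the numeric columns for averaging
num <- subset(db, select = -c(subject, trt))

# Average each numeric column within each subject
avg <- aggregate(num, by = list(subject = db$subject), FUN = mean)
```

With these placeholder values, avg has one row per subject, and each eat column holds the mean of that subject's values across d1 and d2.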

(d1 + d2)/2 did not work because d1 and d2 are data frames that include
non-numeric columns such as subject, so the arithmetic is not meaningful
for every column.
A much more laborious alternative would be to compute the averages one
at a time,
  (d1$eat1[d1$subject=='Felipe'] + d2$eat1[d2$subject=='Felipe'])/2
and similarly for eat3 and John. But that is of course not practical for
larger data sets.

-Don