Averaging over data sets

Hi,

I might write a little function that does different things depending
on the class of the variable.  Something along these lines, where i is
a column index:

yourfun <- function(i) {
  if (is.numeric(imputeddata[, i])) {
    ## something
  } else if (is.factor(imputeddata[, i])) {
    ## something else
  } ## etc.
}

then you can just do:

combined <- lapply(seq_len(ncol(imputeddata)), yourfun)
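
A minimal sketch of that idea, assuming the imputed data sets are held in a
list (called `imputed` here, a made-up name; with Amelia this would presumably
be the `imputations` element of the amelia() output) and that every data set
has the same rows in the same order. Numeric columns are averaged across the
data sets; any other column is taken unchanged from the first one:

```r
## Toy stand-ins for the imputed data sets (same layout, same row order).
d1 <- data.frame(subject = c("Felipe", "John"), eat1 = 1:2, eat3 = 5:6,
                 trt = c("t1", "t2"), stringsAsFactors = TRUE)
d2 <- data.frame(subject = c("Felipe", "John"), eat1 = 3:4, eat3 = 6:7,
                 trt = c("t1", "t2"), stringsAsFactors = TRUE)
imputed <- list(d1, d2)

## Average column i across all data sets if it is numeric;
## otherwise return it as found in the first data set.
average_column <- function(i) {
  first <- imputed[[1]][[i]]
  if (is.numeric(first)) {
    Reduce(`+`, lapply(imputed, function(d) d[[i]])) / length(imputed)
  } else {
    first
  }
}

combined <- lapply(seq_len(ncol(imputed[[1]])), average_column)
names(combined) <- names(imputed[[1]])
combined <- as.data.frame(combined)
combined
```

Averaging a factor makes no sense, so this sketch simply keeps the first
copy; if factors were themselves imputed, some other rule (e.g. the most
frequent level) would have to go in the else branch.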

Alternatively, you could consider some single imputation approaches,
since averaging the imputed data sets essentially amounts to that.

Cheers,

Josh
Hi all,

after using Amelia II to create 10 imputed data sets, I need to average them
into a single data set that contains the average of each cell for the
imputed variables, plus the original values for the variables that were not
imputed. The data have many variables (some numeric, others factors) and more
than 20000 observations. I do not know how to average them. Any help?

Below I provide a small example:

Suppose Amelia provided two datasets:

d1 <- data.frame(subject = c("Felipe", "John"), eat1 = 1:2, eat3 = 5:6,
                 trt = c("t1", "t2"))

d2 <- data.frame(subject = c("Felipe", "John"), eat1 = 3:4, eat3 = 6:7,
                 trt = c("t1", "t2"))

I tried

(d1 + d2)/2

but I lose my factors. mean() did not work either.

The result I'd like is:

  subject eat1 eat3 trt
1  Felipe    2  5.5  t1
2    John    3  6.5  t2

thanks,

*Felipe Nunes*
CAPES/Fulbright Fellow
PhD Student Political Science - UCLA
Web: felipenunes.bol.ucla.edu

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
Here is a solution that works for your small example.
It might be difficult to prepare your larger data sets to use the same
method.

db <- rbind(d1, d2)
aggregate(subset(db, select = -c(subject, trt)),
          by = list(subject = db$subject), mean)
## or, for example,
aggregate(subset(db, select = -c(subject, trt)),
          by = list(subject = db$subject, trt = db$trt), mean)

aggregate() applies mean() to every column of its first argument, so
that argument should have only numeric columns. That is what
subset(db, select = -c(subject, trt)) does for you.
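
For a data set with many variables, naming each non-numeric column by hand
gets tedious. A rough sketch of one way around that, assuming every
non-numeric column is identical across the imputed data sets and that together
they identify a row uniquely (otherwise rows would be merged); is_num and avg
are just illustrative names:

```r
d1 <- data.frame(subject = c("Felipe", "John"), eat1 = 1:2, eat3 = 5:6,
                 trt = c("t1", "t2"))
d2 <- data.frame(subject = c("Felipe", "John"), eat1 = 3:4, eat3 = 6:7,
                 trt = c("t1", "t2"))

db <- rbind(d1, d2)

## TRUE for the columns that can be averaged, FALSE for the rest.
is_num <- sapply(db, is.numeric)

## Average the numeric columns, grouping by all the other columns.
avg <- aggregate(db[is_num], by = db[!is_num], FUN = mean)

## aggregate() puts the grouping columns first; restore the original order.
avg <- avg[names(d1)]
avg
```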

(d1 + d2)/2 did not work because d1 and d2 contain non-numeric columns
(subject and trt), and arithmetic is not defined for those.
It would be much more complicated, but you could have done your averages
one at a time,
  (d1$eat1[d1$subject=='Felipe'] + d2$eat1[d2$subject=='Felipe'])/2
and similarly for eat3 and John. But that is of course not practical for
larger data sets.
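
If the rows are in the same order in both data sets (an assumption that
should be checked first), the arithmetic can be limited to the numeric
columns and the remaining columns left as they are in d1. A small sketch,
with num and avg as made-up names:

```r
d1 <- data.frame(subject = c("Felipe", "John"), eat1 = 1:2, eat3 = 5:6,
                 trt = c("t1", "t2"))
d2 <- data.frame(subject = c("Felipe", "John"), eat1 = 3:4, eat3 = 6:7,
                 trt = c("t1", "t2"))

## The cell-by-cell average only makes sense if the rows line up.
stopifnot(identical(d1$subject, d2$subject))

## Logical vector marking the numeric columns.
num <- sapply(d1, is.numeric)

## Start from d1 and overwrite only the numeric columns with the average.
avg <- d1
avg[num] <- (d1[num] + d2[num]) / 2
avg
```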

-Don
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

On 1/12/12 10:16 PM, "Felipe Nunes" <felipnunes at gmail.com> wrote:

>Hi all,
>
>after using Amelia II to create 10 imputed data sets, I need to average them
>into a single data set that contains the average of each cell for the
>imputed variables, plus the original values for the variables that were not
>imputed. The data have many variables (some numeric, others factors) and more
>than 20000 observations. I do not know how to average them. Any help?
>
>Below I provide a small example:
>
>Suppose Amelia provided two datasets:
>
>d1 <- data.frame(subject = c("Felipe", "John"), eat1 = 1:2, eat3 = 5:6,
>                 trt = c("t1", "t2"))
>
>d2 <- data.frame(subject = c("Felipe", "John"), eat1 = 3:4, eat3 = 6:7,
>                 trt = c("t1", "t2"))
>
>I tried
>
>(d1 + d2)/2
>
>but I lose my factors. mean() did not work either.
>
>The result I'd like is:
>
>     subject  eat1  eat3   trt
>1   Felipe     2      5.5     t1
>2     John      3      6.5     t2
>
>thanks,
>
>*Felipe Nunes*
>CAPES/Fulbright Fellow
>PhD Student Political Science - UCLA
>Web: felipenunes.bol.ucla.edu