Best way to handle missing data?

Tue, Mar 3, 2015 2:16 AM

With MI, you do indeed average parameter estimates across the imputed datasets. And the way the SE for such an average is computed takes into consideration not only the variance of the estimate conditional on a particular dataset but also the variability across datasets. That's in fact the entire point of doing the imputation multiple times.

See, for example: http://sites.stat.psu.edu/~jls/mifaq.html#howto
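
In symbols, the combining rules usually credited to Rubin can be written as below. The notation is chosen here for illustration and is not taken from the thread: m is the number of imputations, and the i-th imputed dataset yields an estimate \hat{Q}_i with squared standard error U_i.

```latex
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i, \qquad
\bar{U} = \frac{1}{m}\sum_{i=1}^{m}U_i, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m}\left(\hat{Q}_i-\bar{Q}\right)^2,
\qquad
T = \bar{U} + \left(1+\frac{1}{m}\right)B, \qquad
\mathrm{SE}(\bar{Q}) = \sqrt{T}.
```

Here \bar{U} is the within-imputation variance and B the between-imputation variance, so the pooled SE can never be smaller than the SE implied by the average within-dataset variance alone.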

One can apply that principle to any parameter estimate, even if this computation is not automated for particular models via a package.
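
As a minimal sketch of doing this by hand for a single scalar parameter, assuming you have already fitted the same model to each imputed dataset and extracted one estimate and one standard error per fit: the function name `pool_rubin` and the example numbers are hypothetical, and Python is used only for illustration (the same arithmetic is a few lines in R).

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one scalar parameter across m imputed datasets.

    estimates  : point estimate from each imputed dataset
    std_errors : the corresponding standard errors
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching estimates and SEs from >= 2 imputations")

    q_bar = mean(estimates)                      # pooled point estimate
    u_bar = mean([se ** 2 for se in std_errors])  # within-imputation variance
    b = variance(estimates)                      # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b              # total variance of the pooled estimate

    return {"estimate": q_bar, "se": math.sqrt(total),
            "within": u_bar, "between": b, "total": total}


# Hypothetical estimates and SEs from m = 5 imputed datasets.
pooled = pool_rubin([0.50, 0.55, 0.45, 0.52, 0.48], [0.10] * 5)
print(pooled)
```

The pooled SE exceeds the per-dataset SE of 0.10 because the between-imputation term is added on top of the average within-dataset variance.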

Best,
Wolfgang

-----Original Message-----
From: R-sig-mixed-models [mailto:r-sig-mixed-models-bounces at r-project.org]
On Behalf Of Joseph Bulbulia
Sent: Monday, March 02, 2015 13:04
To: David Duffy
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Best way to handle missing data?

RELATED QUESTION
I have a related and probably naive question, but raising it might be
helpful to Bonnie and others (myself included) who are struggling with
multiple imputation in a mixed-effects modeling setting.

FIRST, MY DISCOMFORT
The question arises from (1) my discomfort with averaging across multiply
imputed datasets, which seems to lose the uncertainty from the
data-generating imputation process, and (2) my need to use a wider class
of models than is made available by Zelig, such as MCMCglmm.

NOTE
I realise that MCMCglmm can handle variables with missing values (MAR) by
treating them as outcome variables, but where many columns have missing
values, the resulting multivariate outcome model will often become overly
complex.

THE QUESTION
To avoid averaging, if multiple datasets were generated (assume
sensibly) through a multiple imputation algorithm (say using the Amelia
package), would it make any sense to combine the datasets (e.g. using
rbind) with an indicator for each of the imputed datasets, and then to
model each specific imputed dataset as a random effect in, say,
MCMCglmm?

REASONING
If the observations from the datasets were conceived as repeated
measurements on individuals (with individual also modelled as a random
effect), then conceptually it seems you would be adjusting your
expectation for the variation of multiple observations within
individuals across the multiply imputed datasets. Where there is no
imputation, the observed values remain constant, and part of me thinks
this constancy of observations within individuals shouldn't affect the
estimates... I think?

SNAG
On the other hand, just combining datasets with an indicator for each
dataset would artificially (and often dramatically) increase the number
of observations, which might not be handled adequately by the G/R
structures.
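
A minimal sketch of this snag, under simplifying assumptions that are not part of the proposal above: the data are simulated and fully observed, so every "imputed" copy is identical, and the stacked rows are analysed naively as if independent (no random effect for dataset or individual). It only shows how row-count inflation alone shrinks a naive standard error; it does not settle whether a random effect for the imputed dataset would repair this. All names (`naive_se_of_mean`, `stacked`, and so on) are hypothetical.

```python
import math
import random
from statistics import stdev


def naive_se_of_mean(values):
    """Standard error of the mean, treating all rows as independent."""
    return stdev(values) / math.sqrt(len(values))


random.seed(1)
n, m = 50, 5  # n individuals, m imputed datasets (both assumed for illustration)
observed = [random.gauss(0, 1) for _ in range(n)]

# Stack m copies, tagging each row with an imputation indicator
# (the analogue of rbind-ing the imputed datasets with an index column).
stacked = [(imp, y) for imp in range(1, m + 1) for y in observed]

se_single = naive_se_of_mean(observed)
se_stacked = naive_se_of_mean([y for _, y in stacked])
ratio = se_stacked / se_single

print(f"rows: {n} -> {len(stacked)}")
print(f"naive SE ratio (stacked / single): {ratio:.3f}")
print(f"1 / sqrt(m): {1 / math.sqrt(m):.3f}")
```

With no new information added, the naive SE drops by roughly a factor of 1/sqrt(m), which is the artificial precision the stacking worry is about.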

APOLOGY
I apologise if this question makes little sense, or if the answer is just
plain obvious.  I'd intended to ask a statistician at work, and to
simulate some data with him,  but the topic came up here, and I figured
others might benefit, in case others had the same (potentially naive)
thought, and the experts have a quick answer, even if the answer is "you
are muddled."

Cheers,

Joseph
