Best way to handle missing data?

mice will impute the complete dataset; it just needs to have an imputation
method set up for each variable. See the example given in the help for
mice.impute.2lonly.norm.

Full information maximum likelihood estimation (FIML) (note for Landon:
this is ML taking the missing data into account) is only feasible if you
can reformulate everything as a structural equation model and use software
that can cope with this. Otherwise working with the integrals is pretty
much impossible. If there is something nonlinear in the model, it probably
isn't an option at all. One of the great things about multiple imputation
is that you can get it running with, say, 20 imputations and then run it
overnight with 200 or more. The results probably won't change, but you
will know that you have enough imputations. So FIML doesn't have an
advantage in that respect.
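
One way to make the "enough imputations" check above concrete is to look at
the Monte Carlo error of a pooled coefficient, i.e. how much it would be
expected to move if the whole imputation were repeated. The sketch below is
plain arithmetic in Python rather than R. The estimates, the function names,
and the 10% tolerance (a rule of thumb, not something stated in this thread)
are all illustrative assumptions.

```python
from math import sqrt
from statistics import variance


def monte_carlo_se(estimates):
    """Monte Carlo standard error of the pooled (averaged) estimate.

    `estimates` holds one coefficient estimate per imputed data set.
    The between-imputation variance B uses the n-1 denominator, and the
    Monte Carlo error of the mean of m estimates is sqrt(B / m).
    """
    m = len(estimates)
    between = variance(estimates)
    return sqrt(between / m)


def looks_stable(estimates, model_se, tolerance=0.1):
    """Assumed rule of thumb: Monte Carlo error <= tolerance * model SE."""
    return monte_carlo_se(estimates) <= tolerance * model_se


# Hypothetical estimates of one fixed effect from m = 5 imputed data sets.
example = [0.50, 0.54, 0.47, 0.52, 0.49]
print(monte_carlo_se(example))
print(looks_stable(example, model_se=0.1))
```

If the Monte Carlo error is still large relative to the model standard error
at 20 imputations, that suggests the overnight run with more imputations is
worth doing.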

I'm not sure that's needed as a distinction. This quote from the r-help
mailing list [0] addresses it:
I'm not sure you are correct on this. Other texts on multilevel models
(e.g., Raudenbush and Bryk, Kreft and de Leeuw, and Singer & Willett) all
use FiML as a synonym for ML. In fact, Kreft and de Leeuw go so far as to
state they are the same thing (see page 131).

When you run a model in HLM selecting "Full Maximum Likelihood" and
method="ML" in lme, the results, including all fixed effects, variance
components, empirical bayes residuals, degrees of freedom are exactly
the same.

So, I think Doug [Bates] is correct in that ML == FiML. 

Harold
So maybe it is a semantic difference. However, with respect to the
handling of the integral: if it's problematic, that should show up as a
non-convergence problem, or as different results in the diagnostics when
he reruns the model.

[0] https://stat.ethz.ch/pipermail/r-help/2004-August/056723.html

On 27 February 2015 at 16:20, Bonnie Dixon <bmdixon at ucdavis.edu> wrote:

I actually did try mice also (method "2l.norm"), but it seemed that Amelia
was preferable for imputation.  Mice seems to be able to impute only one
variable, whereas Amelia can impute as many variables as have missing
data, producing 100% complete data sets as output.

However, most of the missing data in the data set I am working with is in
just one variable, so I could consider using mice, and just imputing the
variable that has the most missing data, while omitting observations that
have missing data in any of the other variables.  But the pooled results
from mice only seem to include the fixed effects of the model, so this
still leaves me wondering how to report the random effects, which are very
important to my research question.

When using Amelia to impute, the packages Zelig and ZeligMultilevel can be
used to combine the results from each of the models.  But again, only the
fixed effects seem to be included in the output, so I am not sure how to
report on the random effects.
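
Rubin's rules are usually stated for quantities that are roughly normally
distributed, which standard deviations and correlations of random effects
are not. One commonly suggested workaround, shown here only as a sketch and
not as a description of what mice or Zelig do internally, is to average on a
transformed scale: log for standard deviations, Fisher's z for correlations.
The function names and inputs are hypothetical, and the code is Python rather
than R purely to show the arithmetic.

```python
from math import atanh, exp, log, tanh
from statistics import mean


def pool_sd(sds):
    """Average random-effect standard deviations on the log scale.

    `sds` holds one estimated standard deviation per imputed data set;
    the result is their geometric mean.
    """
    return exp(mean([log(s) for s in sds]))


def pool_correlation(correlations):
    """Average correlations on Fisher's z scale, then back-transform."""
    return tanh(mean([atanh(r) for r in correlations]))


# Hypothetical values from five fits, one per imputed data set.
intercept_sds = [1.9, 2.1, 2.0, 2.2, 1.8]
intercept_slope_cors = [-0.30, -0.25, -0.35, -0.28, -0.32]
print(pool_sd(intercept_sds))
print(pool_correlation(intercept_slope_cors))
```

This gives pooled point estimates only. Interval estimates for variance
components would need the standard errors on the transformed scale as well.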

Bonnie

On Thu, Feb 26, 2015 at 8:33 PM, Mitchell Maltenfort <mmalten at gmail.com>
wrote:

Mice might be the package you need

On Thursday, February 26, 2015, Bonnie Dixon <bmdixon at ucdavis.edu>
wrote:

Dear list;

I am using nlme to create a repeated measures (i.e. 2 level) model.  There
is missing data in several of the predictor variables.  What is the best
way to handle this situation?  The variable with (by far) the most missing
data is the best predictor in the model, so I would not want to remove it.
I am also trying to avoid omitting the observations with missing data,
because that would require omitting almost 40% of the observations and
would result in a substantial loss of power.

A member of my dissertation committee who uses SAS recommended that I use
full information maximum likelihood estimation (FIML) (described here:
http://www.statisticalhorizons.com/wp-content/uploads/MissingDataByML.pdf),
which is the easiest way to handle missing data in SAS.  Is there an
equivalent procedure in R?

Alternatively, I have tried several approaches to multiple imputation.  For
example, I used the package Amelia, which appears to handle the clustered
structure of the data appropriately, to generate five imputed versions of
the data set, and then used lapply to run my model on each.  But I am not
sure how to combine the resulting five models into one final result.  I
will need a final result that enables me to report not just the fixed
effects of the model, but also the random effects variance components and,
ideally, the distributions across the population of the random intercept
and slopes, and the correlations between them.
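
For the fixed effects, the usual way to combine the fits is Rubin's rules:
average the estimates, then add the average squared standard error to the
between-imputation variance inflated by (1 + 1/m). Below is a minimal
sketch in Python with made-up numbers and a hypothetical function name. It
takes one coefficient's estimate and standard error from each imputed-data
fit and leaves out the degrees-of-freedom adjustment.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed-data fits with Rubin's rules.

    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    pooled = mean(estimates)                      # Q-bar
    within = mean([se ** 2 for se in std_errors])  # U-bar
    between = variance(estimates)                 # B, n-1 denominator
    total = within + (1 + 1 / m) * between        # T
    return pooled, sqrt(total)


# Hypothetical slope estimates and standard errors from five fits.
slopes = [1.0, 1.2, 0.8, 1.1, 0.9]
slope_ses = [0.2, 0.2, 0.2, 0.2, 0.2]
estimate, se = pool_rubin(slopes, slope_ses)
print(estimate, se)
```

The pooled standard error is larger than any single fit's standard error
whenever the estimates differ across imputations, which is how the extra
uncertainty from the missing data enters the final result.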

Many thanks for any suggestions on how to proceed.

Bonnie

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

--
____________________________
Ersatzistician and Chutzpahthologist

I can answer any question.  "I don't know" is an answer. "I don't know
yet" is a better answer.

"I can write better than anybody who can write faster, and I can write
faster than anybody who can write better" AJ Liebling


