Impact of multiple imputation on correlations

Hi Tina,

That is quite a bit of missingness, especially considering the sample
size is not large to begin with.  This would make me treat *any*
result cautiously.  That said, if you have a reasonable idea what
mechanism is causing the missingness, or if you can use additional
variables in your study to model the missing data mechanism well
enough that you are confident (for some definition of confident) that
the missingness is random after accounting for your model (conditional
independence, which Rubin calls MAR, missing at random), you are in a
reasonable place to use MI and draw inferences from the results.

Even if you are uncertain about this, it is *not* any better to just
say, "well there was too much missing data for me to feel safe using
MI so here is the correlation based just on the observed data".  That
_will be biased_ unless the missing data mechanism is completely
random (MCAR in Rubin's terms: random even without conditioning on
anything else in your study; for example, if participants flipped
coins to decide which questions to respond to).
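
To make that concrete, here is a small simulation sketch.  The data
are simulated, not yours, and the missingness rule (y is unobserved
whenever x is above zero) is just an assumption chosen to give
missingness that depends on an observed variable:

```r
set.seed(1)
n <- 1e5

# Simulated x and y with a true correlation of 0.5
x <- rnorm(n)
y <- 0.5 * x + sqrt(1 - 0.5^2) * rnorm(n)

# Missingness in y depends on the observed x (MAR, but not MCAR)
y_mar <- ifelse(x > 0, NA, y)

# Missingness in y decided by a "coin flip" (MCAR)
y_mcar <- ifelse(runif(n) < 0.5, NA, y)

r_full    <- cor(x, y)
r_cc_mar  <- cor(x, y_mar,  use = "complete.obs")  # deletes incomplete cases
r_cc_mcar <- cor(x, y_mcar, use = "complete.obs")

c(full = r_full, deleted_mar = r_cc_mar, deleted_mcar = r_cc_mcar)
```

In this setup the correlation after deleting cases under the MAR rule
comes out noticeably smaller than the full-data correlation, while the
coin-flip version stays close to it.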

When averaging correlations, it is conventional to average the inverse
hyperbolic tangent of the correlations and then use the hyperbolic
tangent to transform the averaged value back to the original units
(also known as Fisher's Z transformation).  The mice package may do
this automatically if there is a function to compute pooled
correlations.
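
If you end up doing it by hand, a minimal sketch in base R looks like
this.  The correlations in r_imputed are made-up numbers standing in
for the correlation computed in each of your imputed datasets:

```r
# Hypothetical correlations, one per imputed dataset (not from real data)
r_imputed <- c(0.12, -0.05, 0.20, 0.08, 0.15)

# Fisher's Z: atanh() is the inverse hyperbolic tangent
z <- atanh(r_imputed)

# Average on the Z scale, then back-transform with tanh()
r_pooled <- tanh(mean(z))

r_pooled         # pooled correlation
mean(r_imputed)  # naive average, for comparison
```

For correlations this close to zero the two averages will be nearly
identical; the transformation matters more as the correlations get
further from zero.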

How much the results differ between simply deleting cases with any
unobserved value and using MI varies.  There may be no difference, a
large difference, or a small difference.

Looking at the scatter plot matrix from the different imputations, I
do not know that I would actually classify that as varying quite a
bit.  I realize the sign of the slope changes some, but that is not
too surprising because all of them are somewhat close to flat.  You
can compare the between imputation variance to the within imputation
variance (I think mice gives you this information).
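
As a rough sketch of that comparison for a correlation, you can apply
Rubin's rules on the Fisher Z scale.  Everything numeric here is
hypothetical: n is a made-up sample size, r_imputed are made-up
per-imputation correlations, and 1 / (n - 3) is the usual large-sample
approximation to the variance of Fisher's Z:

```r
n <- 40                                        # hypothetical sample size
r_imputed <- c(0.12, -0.05, 0.20, 0.08, 0.15)  # hypothetical correlations
m <- length(r_imputed)

z <- atanh(r_imputed)

W <- 1 / (n - 3)              # within-imputation variance (approximation)
B <- var(z)                   # between-imputation variance
T_total <- W + (1 + 1 / m) * B  # total variance, Rubin's rules

# Proportion of the total variance attributable to the missing data
lambda <- (1 + 1 / m) * B / T_total

c(within = W, between = B, total = T_total, lambda = lambda)
```

If lambda is small, the imputations are not disagreeing by much
relative to the ordinary sampling uncertainty.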

I partly addressed your last question at the beginning: I would
certainly not trust the correlation obtained simply by deleting cases
with missing values, but I also would not trust the result obtained
using MI unless it was well set up.  Although you have shown us some
of the data, you have not mentioned how you modelled the missingness.
This can have a substantial impact on your results (and also their
trustworthiness).  mice provides a number of different imputation
models, and you have a choice of which variables to use as predictors
if you collected a lot in your study.
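
For illustration only, this is roughly where those choices are made in
mice.  The nhanes data that ships with the package stands in for your
data, and dropping hyp as a predictor of bmi is an arbitrary example,
not a recommendation:

```r
library(mice)

# nhanes is a small example dataset bundled with mice (age, bmi, hyp, chl)
pred <- make.predictorMatrix(nhanes)  # by default every variable predicts the others
pred["bmi", "hyp"] <- 0               # example: do not use hyp when imputing bmi

# The imputation model per variable is set via the 'method' argument;
# here it is left at the default (predictive mean matching for numeric data).
imp <- mice(nhanes, m = 5, predictorMatrix = pred,
            seed = 123, printFlag = FALSE)

imp$method           # imputation method used for each variable
imp$predictorMatrix  # which variables were used to impute which
```

Reporting these two pieces alongside your results makes it much easier
for others to judge how the missingness was modelled.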

Given all of this, I would suggest finding a local statistician or
consultant to talk with about this.  Your question(s) are more
statistical than they are R related.  Also, in addition to learning
more about MI (there are several good books and articles on it that
you can look up or email me offlist and I can provide references if
you want), someone who is there can be more helpful because they will
have access to your whole dataset and can work with you to find the
best variables/model to model the missing data mechanism.

I hope this helps and good luck,

Josh
