Best way to handle missing data?

4 messages · Bonnie Dixon, Malcolm Fairbrother

Mon, Mar 2, 2015 11:37 AM #

Thanks for this suggestion, Malcolm.  Here is an example in which I use
Amelia/Zelig with the "africa" data set that is available in Amelia.
I extracted the average standard deviation of the random effects from the
result produced by Zelig.  (In this example, I am using the version of the
summary.MI function found here:
http://stackoverflow.com/questions/16571580/multi-level-regression-model-on-multiply-imputed-data-set-in-r-amelia-zelig-l)
 Perhaps this approach will work for my purposes.

# Get packages
require(Amelia)
require(Zelig)
require(ZeligMultilevel)

# Look at the data
data(africa)
head(africa)
summary(africa)
help(africa)

# Impute the missing data
africa.am <-
  amelia(x = africa,
         m = 30,
         cs = "country",
         ts = "year",
         logs = "gdp_pc")
summary(africa.am)
plot(africa.am)
missmap(africa.am)
names(africa.am)

# Create a model:
africa.z <-
  zelig(formula = gdp_pc ~ infl + tag(infl | country),
        data = africa.am$imputations,
        model = "ls.mixed")

# The combined fixed effects:
summary(africa.z)

# The average standard deviation of the random intercepts and slopes:
ran.ints <-
  sapply(africa.z,
         function(x)
           attributes(VarCorr(x$result)$country)$stddev["(Intercept)"])
mean(ran.ints)

ran.slopes <-
  sapply(africa.z,
         function(x)
           attributes(VarCorr(x$result)$country)$stddev["infl"])
mean(ran.slopes)



On Fri, Feb 27, 2015 at 4:47 AM, Malcolm Fairbrother <

M.Fairbrother at bristol.ac.uk> wrote:

Hi Bonnie,

I have not seen a formal treatment of this issue, but from the Amelia
documentation, my understanding is that if you want an estimate of the
random effects variance, you can just take the average of the estimates
from the model fitted to each imputed dataset. This is true for any
parameter, from the sounds of what Honaker, King, and Blackwell have
written.

 "you can combine directly and use as the multiple imputation estimate of
this parameter, q ?, the average of them separate estimates"

Even if Zelig doesn't report the RE variance estimates automatically, they
must be "in there" somewhere... I'm sure you can extract them. Or maybe
skip Zelig, and just use Amelia, and extract the estimated RE variances
from each fitted model (presumably using lme4)?

Cheers,
Malcolm


Date: Thu, 26 Feb 2015 21:20:33 -0800

From: Bonnie Dixon <bmdixon at ucdavis.edu>
To: Mitchell Maltenfort <mmalten at gmail.com>
Cc: "r-sig-mixed-models at r-project.org"
        <r-sig-mixed-models at r-project.org>
Subject: Re: [R-sig-ME] Best way to handle missing data?

I actually did try mice also (method "2l.norm"), but it seemed that Amelia
was preferable for imputation.  Mice seems to only be able to impute one
variable, whereas Amelia can impute as many variables as have missing data
producing 100% complete data sets as output.

However, most of the missing data in the data set I am working with is in
just one variable, so I could consider using mice, and just imputing the
variable that has the most missing data, while omitting observations that
have missing data in any of the other variables.  But the pooled results
from mice only seem to include the fixed effects of the model, so this
still leaves me wondering how to report the random effects, which are very
important to my research question.

When using Amelia to impute, the packages Zelig and ZeligMultilevel can be
used to combine the results from each of the models.  But again, only the
fixed effects seem to be included in the output, so I am not sure how to
report on the random effects.

Bonnie

On Thu, Feb 26, 2015 at 8:33 PM, Mitchell Maltenfort <mmalten at gmail.com>
wrote:

Mice might be the package you need


On Thursday, February 26, 2015, Bonnie Dixon <bmdixon at ucdavis.edu>

wrote:

Dear list;

I am using nlme to create a repeated measures (i.e. 2 level) model.

There

is missing data in several of the predictor variables.  What is the

best

way to handle this situation?  The variable with (by far) the most

missing

data is the best predictor in the model, so I would not want to remove

it.

I am also trying to avoid omitting the observations with missing data,
because that would require omitting almost 40% of the observations and
would result in a substantial loss of power.

A member of my dissertation committee who uses SAS, recommended that I

use

full information maximum likelihood estimation (FIML) (described here:

http://www.statisticalhorizons.com/wp-content/uploads/MissingDataByML.pdf

),
which is the easiest way to handle missing data in SAS.  Is there an
equivalent procedure in R?

Alternatively, I have tried several approaches to multiple imputation.
For
example, I used the package, Amelia, which appears to handle the

clustered

structure of the data appropriately, to generate five imputed versions

of

the data set, and then used lapply to run my model on each.  But I am

not

sure how to combine the resulting five models into one final result.  I
will need a final result that enables me to report, not just the fixed
effects of the model, but also the random effects variance components

and,

ideally, the distributions across the population of the random

intercept

and slopes, and correlations between them.

Many thanks for any suggestions on how to proceed.

Bonnie

Malcolm Fairbrother

Tue, Mar 3, 2015 1:53 AM #

Hi Bonnie,

I was getting an error with that code, and finding Zelig cumbersome. So how
about just:

mods <- lapply(africa.am[[1]], function(x) lmer(gdp_pc ~ infl + (infl |
country), data=x))
rowMeans(sapply(mods, fixef)) # fixed effects
rowMeans(sapply(mods, function(x) as.data.frame(VarCorr(x))$vcov)) # random
effects variances
Reduce("+", sapply(mods, ranef))/length(mods) # random effects (intercepts
and slopes)
sqrt(rowMeans(sapply(mods, function(x) diag(vcov(x)))) +
diag(var(t(sapply(mods, fixef))))*(1+1/length(mods))) # SEs

I think that gets you everything you want?

The last row of code is my interpretation of: "The variance of the point
estimate is the average of the estimated variances from within each
completed data set, plus the sample variance in the point estimates across
the data sets (multiplied by a factor that corrects for the bias because m
< ?)." (from http://r.iq.harvard.edu/docs/amelia/amelia.pdf)

Does that correspond to what you were getting via Zelig? I'd be interested
to know that this worked, actually.

Cheers,
Malcolm

On 2 March 2015 at 20:37, Bonnie Dixon <bmdixon at ucdavis.edu> wrote:

Thanks for this suggestion, Malcolm.  Here is an example in which I use
Amelia/Zelig with the "africa" data set that is available in Amelia.
I extracted the average standard deviation of the random effects from the
result produced by Zelig.  (In this example, I am using the version of the
summary.MI function found here:
http://stackoverflow.com/questions/16571580/multi-level-regression-model-on-multiply-imputed-data-set-in-r-amelia-zelig-l)
 Perhaps this approach will work for my purposes.

# Get packages
require(Amelia)
require(Zelig)
require(ZeligMultilevel)

# Look at the data
data(africa)
head(africa)
summary(africa)
help(africa)

# Impute the missing data
africa.am <-
  amelia(x = africa,
         m = 30,
         cs = "country",
         ts = "year",
         logs = "gdp_pc")
summary(africa.am)
plot(africa.am)
missmap(africa.am)
names(africa.am)

# Create a model:
africa.z <-
  zelig(formula = gdp_pc ~ infl + tag(infl | country),
        data = africa.am$imputations,
        model = "ls.mixed")

# The combined fixed effects:
summary(africa.z)

# The average standard deviation of the random intercepts and slopes:
ran.ints <-
  sapply(africa.z,
         function(x)
           attributes(VarCorr(x$result)$country)$stddev["(Intercept)"])
mean(ran.ints)

ran.slopes <-
  sapply(africa.z,
         function(x)
           attributes(VarCorr(x$result)$country)$stddev["infl"])
mean(ran.slopes)



On Fri, Feb 27, 2015 at 4:47 AM, Malcolm Fairbrother <
M.Fairbrother at bristol.ac.uk> wrote:

Hi Bonnie,

I have not seen a formal treatment of this issue, but from the Amelia
documentation, my understanding is that if you want an estimate of the
random effects variance, you can just take the average of the estimates
from the model fitted to each imputed dataset. This is true for any
parameter, from the sounds of what Honaker, King, and Blackwell have
written.

 "you can combine directly and use as the multiple imputation estimate of
this parameter, q ?, the average of them separate estimates"

Even if Zelig doesn't report the RE variance estimates automatically,
they must be "in there" somewhere... I'm sure you can extract them. Or
maybe skip Zelig, and just use Amelia, and extract the estimated RE
variances from each fitted model (presumably using lme4)?

Cheers,
Malcolm


Date: Thu, 26 Feb 2015 21:20:33 -0800

From: Bonnie Dixon <bmdixon at ucdavis.edu>
To: Mitchell Maltenfort <mmalten at gmail.com>
Cc: "r-sig-mixed-models at r-project.org"
        <r-sig-mixed-models at r-project.org>
Subject: Re: [R-sig-ME] Best way to handle missing data?

I actually did try mice also (method "2l.norm"), but it seemed that
Amelia
was preferable for imputation.  Mice seems to only be able to impute one
variable, whereas Amelia can impute as many variables as have missing
data
producing 100% complete data sets as output.

However, most of the missing data in the data set I am working with is in
just one variable, so I could consider using mice, and just imputing the
variable that has the most missing data, while omitting observations that
have missing data in any of the other variables.  But the pooled results
from mice only seem to include the fixed effects of the model, so this
still leaves me wondering how to report the random effects, which are
very
important to my research question.

When using Amelia to impute, the packages Zelig and ZeligMultilevel can
be
used to combine the results from each of the models.  But again, only the
fixed effects seem to be included in the output, so I am not sure how to
report on the random effects.

Bonnie

On Thu, Feb 26, 2015 at 8:33 PM, Mitchell Maltenfort <mmalten at gmail.com>
wrote:

Mice might be the package you need


On Thursday, February 26, 2015, Bonnie Dixon <bmdixon at ucdavis.edu>

wrote:

Dear list;

I am using nlme to create a repeated measures (i.e. 2 level) model.

There

is missing data in several of the predictor variables.  What is the

best

way to handle this situation?  The variable with (by far) the most

missing

data is the best predictor in the model, so I would not want to

remove it.

I am also trying to avoid omitting the observations with missing data,
because that would require omitting almost 40% of the observations and
would result in a substantial loss of power.

A member of my dissertation committee who uses SAS, recommended that

I use

full information maximum likelihood estimation (FIML) (described here:

http://www.statisticalhorizons.com/wp-content/uploads/MissingDataByML.pdf

),
which is the easiest way to handle missing data in SAS.  Is there an
equivalent procedure in R?

Alternatively, I have tried several approaches to multiple imputation.
For
example, I used the package, Amelia, which appears to handle the

clustered

structure of the data appropriately, to generate five imputed

versions of

the data set, and then used lapply to run my model on each.  But I am

not

sure how to combine the resulting five models into one final result.

will need a final result that enables me to report, not just the fixed
effects of the model, but also the random effects variance components

and,

ideally, the distributions across the population of the random

intercept

and slopes, and correlations between them.

Many thanks for any suggestions on how to proceed.

Bonnie

Bonnie Dixon

Tue, Mar 3, 2015 9:42 PM #

Thanks, Malcolm!  Yes, that extracts all the results I need from out of the
multiple imputation models.  This method also seems preferable over using
Zelig because it provides the flexibility to create the models using lme4,
or using a different package, such as nlme or MCMCglmm.  And, yes, the
results using this method are identical to the results from Zelig.

Bonnie

P.S.  By the way, that error that occurs when using summary() on the object
created by Zelig can be avoided by using the modified version of the
summary.MI function below (from
http://stackoverflow.com/questions/16571580/multi-level-regression-model-on-multiply-imputed-data-set-in-r-amelia-zelig-l
).

summary.MI <- function (object, subset = NULL, ...) {
  if (length(object) == 0) {
    stop('Invalid input for "subset"')
  } else {
    if (length(object) == 1) {
      return(summary(object[[1]]))
    }
  }

    getcoef <- function(obj) {
    # S4
    if (!isS4(obj)) {
      coef(obj)
    } else {
      if ("coef3" %in% slotNames(obj)) {
        obj at coef3
      } else {
        obj at coef
      }
    }
  }

  res <- list()

  # Get indices
  subset <- if (is.null(subset)) {
    1:length(object)
  } else {
    c(subset)
  }

  # Compute the summary of all objects
  for (k in subset) {
    res[[k]] <- summary(object[[k]])
  }


  # Answer
  ans <- list(
    zelig = object[[1]]$name,
    call = object[[1]]$result at call,
    all = res
  )

  coef1 <- se1 <- NULL

  for (k in subset) {
    tmp <- coef(res[[k]])
    coef1 <- cbind(coef1, tmp[, 1])
    se1 <- cbind(se1, tmp[, 2])
  }

  rows <- nrow(coef1)
  Q <- apply(coef1, 1, mean)
  U <- apply(se1^2, 1, mean)
  B <- apply((coef1-Q)^2, 1, sum)/(length(subset)-1)
  var <- U+(1+1/length(subset))*B
  nu <- (length(subset)-1)*(1+U/((1+1/length(subset))*B))^2

  coef.table <- matrix(NA, nrow = rows, ncol = 4)
  dimnames(coef.table) <- list(rownames(coef1),
                               c("Value", "Std. Error", "t-stat",
"p-value"))
  coef.table[,1] <- Q
  coef.table[,2] <- sqrt(var)
  coef.table[,3] <- Q/sqrt(var)
  coef.table[,4] <- pt(abs(Q/sqrt(var)), df=nu, lower.tail=F)*2
  ans$coefficients <- coef.table
  ans$cov.scaled <- ans$cov.unscaled <- NULL

  for (i in 1:length(ans)) {
    if (is.numeric(ans[[i]]) && !names(ans)[i] %in% c("coefficients")) {
      tmp <- NULL
      for (j in subset) {
        r <- res[[j]]
        tmp <- cbind(tmp, r[[pmatch(names(ans)[i], names(res[[j]]))]])
      }
      ans[[i]] <- apply(tmp, 1, mean)
    }
  }

  class(ans) <- "summaryMI"
  ans
}

On Tue, Mar 3, 2015 at 1:53 AM, Malcolm Fairbrother <

M.Fairbrother at bristol.ac.uk> wrote:

Hi Bonnie,

I was getting an error with that code, and finding Zelig cumbersome. So
how about just:

mods <- lapply(africa.am[[1]], function(x) lmer(gdp_pc ~ infl + (infl |
country), data=x))
rowMeans(sapply(mods, fixef)) # fixed effects
rowMeans(sapply(mods, function(x) as.data.frame(VarCorr(x))$vcov)) #
random effects variances
Reduce("+", sapply(mods, ranef))/length(mods) # random effects (intercepts
and slopes)
sqrt(rowMeans(sapply(mods, function(x) diag(vcov(x)))) +
diag(var(t(sapply(mods, fixef))))*(1+1/length(mods))) # SEs

I think that gets you everything you want?

The last row of code is my interpretation of: "The variance of the point
estimate is the average of the estimated variances from within each
completed data set, plus the sample variance in the point estimates across
the data sets (multiplied by a factor that corrects for the bias because m
< ?)." (from http://r.iq.harvard.edu/docs/amelia/amelia.pdf)

Does that correspond to what you were getting via Zelig? I'd be interested
to know that this worked, actually.

Cheers,
Malcolm




On 2 March 2015 at 20:37, Bonnie Dixon <bmdixon at ucdavis.edu> wrote:

Thanks for this suggestion, Malcolm.  Here is an example in which I use
Amelia/Zelig with the "africa" data set that is available in Amelia.
I extracted the average standard deviation of the random effects from the
result produced by Zelig.  (In this example, I am using the version of the
summary.MI function found here:
http://stackoverflow.com/questions/16571580/multi-level-regression-model-on-multiply-imputed-data-set-in-r-amelia-zelig-l)
 Perhaps this approach will work for my purposes.

# Get packages
require(Amelia)
require(Zelig)
require(ZeligMultilevel)

# Look at the data
data(africa)
head(africa)
summary(africa)
help(africa)

# Impute the missing data
africa.am <-
  amelia(x = africa,
         m = 30,
         cs = "country",
         ts = "year",
         logs = "gdp_pc")
summary(africa.am)
plot(africa.am)
missmap(africa.am)
names(africa.am)

# Create a model:
africa.z <-
  zelig(formula = gdp_pc ~ infl + tag(infl | country),
        data = africa.am$imputations,
        model = "ls.mixed")

# The combined fixed effects:
summary(africa.z)

# The average standard deviation of the random intercepts and slopes:
ran.ints <-
  sapply(africa.z,
         function(x)
           attributes(VarCorr(x$result)$country)$stddev["(Intercept)"])
mean(ran.ints)

ran.slopes <-
  sapply(africa.z,
         function(x)
           attributes(VarCorr(x$result)$country)$stddev["infl"])
mean(ran.slopes)



On Fri, Feb 27, 2015 at 4:47 AM, Malcolm Fairbrother <
M.Fairbrother at bristol.ac.uk> wrote:

Hi Bonnie,

I have not seen a formal treatment of this issue, but from the Amelia
documentation, my understanding is that if you want an estimate of the
random effects variance, you can just take the average of the estimates
from the model fitted to each imputed dataset. This is true for any
parameter, from the sounds of what Honaker, King, and Blackwell have
written.

 "you can combine directly and use as the multiple imputation estimate
of this parameter, q ?, the average of them separate estimates"

Even if Zelig doesn't report the RE variance estimates automatically,
they must be "in there" somewhere... I'm sure you can extract them. Or
maybe skip Zelig, and just use Amelia, and extract the estimated RE
variances from each fitted model (presumably using lme4)?

Cheers,
Malcolm


Date: Thu, 26 Feb 2015 21:20:33 -0800

From: Bonnie Dixon <bmdixon at ucdavis.edu>
To: Mitchell Maltenfort <mmalten at gmail.com>
Cc: "r-sig-mixed-models at r-project.org"
        <r-sig-mixed-models at r-project.org>
Subject: Re: [R-sig-ME] Best way to handle missing data?

I actually did try mice also (method "2l.norm"), but it seemed that
Amelia
was preferable for imputation.  Mice seems to only be able to impute one
variable, whereas Amelia can impute as many variables as have missing
data
producing 100% complete data sets as output.

However, most of the missing data in the data set I am working with is
in
just one variable, so I could consider using mice, and just imputing the
variable that has the most missing data, while omitting observations
that
have missing data in any of the other variables.  But the pooled results
from mice only seem to include the fixed effects of the model, so this
still leaves me wondering how to report the random effects, which are
very
important to my research question.

When using Amelia to impute, the packages Zelig and ZeligMultilevel can
be
used to combine the results from each of the models.  But again, only
the
fixed effects seem to be included in the output, so I am not sure how to
report on the random effects.

Bonnie

On Thu, Feb 26, 2015 at 8:33 PM, Mitchell Maltenfort <mmalten at gmail.com

wrote:

Mice might be the package you need


On Thursday, February 26, 2015, Bonnie Dixon <bmdixon at ucdavis.edu>

wrote:

Dear list;

I am using nlme to create a repeated measures (i.e. 2 level) model.

There

is missing data in several of the predictor variables.  What is the

best

way to handle this situation?  The variable with (by far) the most

missing

data is the best predictor in the model, so I would not want to

remove it.

I am also trying to avoid omitting the observations with missing

data,

because that would require omitting almost 40% of the observations

and

would result in a substantial loss of power.

A member of my dissertation committee who uses SAS, recommended that

I use

full information maximum likelihood estimation (FIML) (described

here:

http://www.statisticalhorizons.com/wp-content/uploads/MissingDataByML.pdf

),
which is the easiest way to handle missing data in SAS.  Is there an
equivalent procedure in R?

Alternatively, I have tried several approaches to multiple

imputation.

For
example, I used the package, Amelia, which appears to handle the

clustered

structure of the data appropriately, to generate five imputed

versions of

the data set, and then used lapply to run my model on each.  But I

am not

sure how to combine the resulting five models into one final

result.  I

will need a final result that enables me to report, not just the

fixed

effects of the model, but also the random effects variance

components and,

ideally, the distributions across the population of the random

intercept

and slopes, and correlations between them.

Many thanks for any suggestions on how to proceed.

Bonnie

Malcolm Fairbrother

Tue, Mar 3, 2015 11:42 PM #

Great, glad that worked!

On the topic of MCMC estimation with multiply imputed data, my
understanding is that you can simply merge each (m=5, 30, or whatever) of
the separate chains, and then analyse those using whatever techniques you
would have applied to a single chain. I guess this could also be useful for
assessing convergence. Others are welcome to correct me on this, however.

- Malcolm

On 4 March 2015 at 06:42, Bonnie Dixon <bmdixon at ucdavis.edu> wrote:

Thanks, Malcolm!  Yes, that extracts all the results I need from out of
the multiple imputation models.  This method also seems preferable over
using Zelig because it provides the flexibility to create the models using
lme4, or using a different package, such as nlme or MCMCglmm.  And, yes,
the results using this method are identical to the results from Zelig.

Bonnie

P.S.  By the way, that error that occurs when using summary() on the
object created by Zelig can be avoided by using the modified version of the
summary.MI function below (from
http://stackoverflow.com/questions/16571580/multi-level-regression-model-on-multiply-imputed-data-set-in-r-amelia-zelig-l
).

summary.MI <- function (object, subset = NULL, ...) {
  if (length(object) == 0) {
    stop('Invalid input for "subset"')
  } else {
    if (length(object) == 1) {
      return(summary(object[[1]]))
    }
  }

    getcoef <- function(obj) {
    # S4
    if (!isS4(obj)) {
      coef(obj)
    } else {
      if ("coef3" %in% slotNames(obj)) {
        obj at coef3
      } else {
        obj at coef
      }
    }
  }

  res <- list()

  # Get indices
  subset <- if (is.null(subset)) {
    1:length(object)
  } else {
    c(subset)
  }

  # Compute the summary of all objects
  for (k in subset) {
    res[[k]] <- summary(object[[k]])
  }


  # Answer
  ans <- list(
    zelig = object[[1]]$name,
    call = object[[1]]$result at call,
    all = res
  )

  coef1 <- se1 <- NULL

  for (k in subset) {
    tmp <- coef(res[[k]])
    coef1 <- cbind(coef1, tmp[, 1])
    se1 <- cbind(se1, tmp[, 2])
  }

  rows <- nrow(coef1)
  Q <- apply(coef1, 1, mean)
  U <- apply(se1^2, 1, mean)
  B <- apply((coef1-Q)^2, 1, sum)/(length(subset)-1)
  var <- U+(1+1/length(subset))*B
  nu <- (length(subset)-1)*(1+U/((1+1/length(subset))*B))^2

  coef.table <- matrix(NA, nrow = rows, ncol = 4)
  dimnames(coef.table) <- list(rownames(coef1),
                               c("Value", "Std. Error", "t-stat",
"p-value"))
  coef.table[,1] <- Q
  coef.table[,2] <- sqrt(var)
  coef.table[,3] <- Q/sqrt(var)
  coef.table[,4] <- pt(abs(Q/sqrt(var)), df=nu, lower.tail=F)*2
  ans$coefficients <- coef.table
  ans$cov.scaled <- ans$cov.unscaled <- NULL

  for (i in 1:length(ans)) {
    if (is.numeric(ans[[i]]) && !names(ans)[i] %in% c("coefficients")) {
      tmp <- NULL
      for (j in subset) {
        r <- res[[j]]
        tmp <- cbind(tmp, r[[pmatch(names(ans)[i], names(res[[j]]))]])
      }
      ans[[i]] <- apply(tmp, 1, mean)
    }
  }

  class(ans) <- "summaryMI"
  ans
}

On Tue, Mar 3, 2015 at 1:53 AM, Malcolm Fairbrother <
M.Fairbrother at bristol.ac.uk> wrote:

Hi Bonnie,

I was getting an error with that code, and finding Zelig cumbersome. So
how about just:

mods <- lapply(africa.am[[1]], function(x) lmer(gdp_pc ~ infl + (infl |
country), data=x))
rowMeans(sapply(mods, fixef)) # fixed effects
rowMeans(sapply(mods, function(x) as.data.frame(VarCorr(x))$vcov)) #
random effects variances
Reduce("+", sapply(mods, ranef))/length(mods) # random effects
(intercepts and slopes)
sqrt(rowMeans(sapply(mods, function(x) diag(vcov(x)))) +
diag(var(t(sapply(mods, fixef))))*(1+1/length(mods))) # SEs

I think that gets you everything you want?

The last row of code is my interpretation of: "The variance of the point
estimate is the average of the estimated variances from within each
completed data set, plus the sample variance in the point estimates across
the data sets (multiplied by a factor that corrects for the bias because m
< ?)." (from http://r.iq.harvard.edu/docs/amelia/amelia.pdf)

Does that correspond to what you were getting via Zelig? I'd be
interested to know that this worked, actually.

Cheers,
Malcolm




On 2 March 2015 at 20:37, Bonnie Dixon <bmdixon at ucdavis.edu> wrote:

Thanks for this suggestion, Malcolm.  Here is an example in which I use
Amelia/Zelig with the "africa" data set that is available in Amelia.
I extracted the average standard deviation of the random effects from the
result produced by Zelig.  (In this example, I am using the version of the
summary.MI function found here:
http://stackoverflow.com/questions/16571580/multi-level-regression-model-on-multiply-imputed-data-set-in-r-amelia-zelig-l)
 Perhaps this approach will work for my purposes.

# Get packages
require(Amelia)
require(Zelig)
require(ZeligMultilevel)

# Look at the data
data(africa)
head(africa)
summary(africa)
help(africa)

# Impute the missing data
africa.am <-
  amelia(x = africa,
         m = 30,
         cs = "country",
         ts = "year",
         logs = "gdp_pc")
summary(africa.am)
plot(africa.am)
missmap(africa.am)
names(africa.am)

# Create a model:
africa.z <-
  zelig(formula = gdp_pc ~ infl + tag(infl | country),
        data = africa.am$imputations,
        model = "ls.mixed")

# The combined fixed effects:
summary(africa.z)

# The average standard deviation of the random intercepts and slopes:
ran.ints <-
  sapply(africa.z,
         function(x)
           attributes(VarCorr(x$result)$country)$stddev["(Intercept)"])
mean(ran.ints)

ran.slopes <-
  sapply(africa.z,
         function(x)
           attributes(VarCorr(x$result)$country)$stddev["infl"])
mean(ran.slopes)



On Fri, Feb 27, 2015 at 4:47 AM, Malcolm Fairbrother <
M.Fairbrother at bristol.ac.uk> wrote:

Hi Bonnie,

I have not seen a formal treatment of this issue, but from the Amelia
documentation, my understanding is that if you want an estimate of the
random effects variance, you can just take the average of the estimates
from the model fitted to each imputed dataset. This is true for any
parameter, from the sounds of what Honaker, King, and Blackwell have
written.

 "you can combine directly and use as the multiple imputation estimate
of this parameter, q ?, the average of them separate estimates"

Even if Zelig doesn't report the RE variance estimates automatically,
they must be "in there" somewhere... I'm sure you can extract them. Or
maybe skip Zelig, and just use Amelia, and extract the estimated RE
variances from each fitted model (presumably using lme4)?

Cheers,
Malcolm


Date: Thu, 26 Feb 2015 21:20:33 -0800

From: Bonnie Dixon <bmdixon at ucdavis.edu>
To: Mitchell Maltenfort <mmalten at gmail.com>
Cc: "r-sig-mixed-models at r-project.org"
        <r-sig-mixed-models at r-project.org>
Subject: Re: [R-sig-ME] Best way to handle missing data?

I actually did try mice also (method "2l.norm"), but it seemed that
Amelia
was preferable for imputation.  Mice seems to only be able to impute
one
variable, whereas Amelia can impute as many variables as have missing
data
producing 100% complete data sets as output.

However, most of the missing data in the data set I am working with is
in
just one variable, so I could consider using mice, and just imputing
the
variable that has the most missing data, while omitting observations
that
have missing data in any of the other variables.  But the pooled
results
from mice only seem to include the fixed effects of the model, so this
still leaves me wondering how to report the random effects, which are
very
important to my research question.

When using Amelia to impute, the packages Zelig and ZeligMultilevel
can be
used to combine the results from each of the models.  But again, only
the
fixed effects seem to be included in the output, so I am not sure how
to
report on the random effects.

Bonnie

On Thu, Feb 26, 2015 at 8:33 PM, Mitchell Maltenfort <
mmalten at gmail.com>
wrote:

Mice might be the package you need


On Thursday, February 26, 2015, Bonnie Dixon <bmdixon at ucdavis.edu>

wrote:

Dear list;

I am using nlme to create a repeated measures (i.e. 2 level)

model.  There

is missing data in several of the predictor variables.  What is the

best

way to handle this situation?  The variable with (by far) the most

missing

data is the best predictor in the model, so I would not want to

remove it.

I am also trying to avoid omitting the observations with missing

data,

because that would require omitting almost 40% of the observations

and

would result in a substantial loss of power.

A member of my dissertation committee who uses SAS, recommended

that I use

full information maximum likelihood estimation (FIML) (described

here:

http://www.statisticalhorizons.com/wp-content/uploads/MissingDataByML.pdf

),
which is the easiest way to handle missing data in SAS.  Is there an
equivalent procedure in R?

Alternatively, I have tried several approaches to multiple

imputation.

For
example, I used the package, Amelia, which appears to handle the

clustered

structure of the data appropriately, to generate five imputed

versions of

the data set, and then used lapply to run my model on each.  But I

am not

sure how to combine the resulting five models into one final

result.  I

will need a final result that enables me to report, not just the

fixed

effects of the model, but also the random effects variance

components and,

ideally, the distributions across the population of the random

intercept

and slopes, and correlations between them.

Many thanks for any suggestions on how to proceed.

Bonnie