Dear all, I'm currently trying to write the missing data imputation code through Gaussian mixture model. My current reference as provided in the attachment or you can reach at this url: http://www.sciencedirect.com/science/article/pii/S0167947306003707 This is the code for e-step and m-step through the Gaussian mixture model: data(faithful); library(mvtnorm); # using multivariate normal package (need to install this but easy .... ) N = dim(faithful)[1]; # number of sample points X = faithful[,1:2]; # the data matrix alpha1 = alpha2 = .5; # these are our initial class probability m1 = c(2,90); # initial means, chosen to be bad m2 = c(4,50); Sigma1 = Sigma2 = diag(c(var(X[,1]),var(X[,2]))); # initial covariance matrices computed from entire data e_step<-function(x,m1,m2,Sigma1,Sigma2,alpha){ x_missing<-x x_missing$eruptions[1:10] <- NA x_missing$waiting[10:20] <- NA x_missing$eruptions[is.na(x_missing$eruptions)] = mean(x_missing$eruptions, na.rm=TRUE) x_missing$waiting[is.na(x_missing$waiting)] = mean(x_missing$waiting, na.rm=TRUE) x<-x_missing comp1.prod <- dmvnorm(x, m1, Sigma1) * alpha[1] comp2.prod <- dmvnorm(x, m2, Sigma2) * alpha[2] sum.of.comps <- comp1.prod + comp2.prod comp1.post <- comp1.prod / sum.of.comps comp2.post <- comp2.prod / sum.of.comps ll = sum(log(sum.of.comps)) list("ll" = ll, "posterior.df" = cbind(comp1.post, comp2.post)) } m_step<-function(x,posterior.df){ comp1.n <- sum(posterior.df[, 1]) comp2.n <- sum(posterior.df[, 2]) comp1.alpha <- comp1.n / length(x) comp2.alpha <- comp2.n / length(x) comp1.mu <- colSums(posterior.df[, 1]*X)/comp1.n comp2.mu <- colSums(posterior.df[, 2]*X)/comp2.n resp1=posterior.df[, 1] resp2=posterior.df[, 2] acc1 = acc2 = matrix(0,nrow=2,ncol=2); Y = as.matrix(x); for (n in 1:N) { acc1 = acc1 + resp1[n] * ((Y[n,] - m1) %*% t(Y[n,]-m1)); acc2 = acc2 + resp2[n] * ((Y[n,] - m2) %*% t(Y[n,]-m2)); } Sigma1 = acc1/sum(resp1); Sigma2 = acc2/sum(resp1); list("m1" = comp1.mu, "m2" = comp2.mu, "Sigma1" = Sigma1, "Sigma2" = Sigma2, "alpha" = c(comp1.alpha, comp2.alpha)) } I have noticed that mean imputation part in the e-step is not in right way. Therefore i want to apply the random draw imputation as mentioned in the paper at page number 5308. Can anybody here guide me on how to write the random draw imputation in R? Thank you in advance -Jas
Handling missing data through Gaussian mixture model
1 message · jas ni