Skip to content
Prev 299056 / 398506 Next

EM algorithm to find MLE of coeff in mixed effects model

On Tue, Jul 3, 2012 at 12:41 PM, jimmycloud <jimmycloud at gmail.com> wrote:
I have 2 ideas for you.

1. Fit with lme4 package, using the lmer function. That's what it is for.

2. If you really want to write your own EM algorithm, I don't feel
sure that very many R and EM experts are going to want to read through
the code you have because you don't follow some of the minimal
readability guidelines.  I accept the fact that there is no officially
mandated R style guide, except for "indent with 4 spaces, not tabs."
But there are some almost universally accepted basics like

1. Use <-, not =, for assignment
2. put a space before and  after symbols like <-, = , + , * , and so forth.
3. I'd suggest you get used to putting squiggly braces in the K&R style.

I have found the formatR package's tool tidy.source can do this
nicely. From tidy.source, here's what I get with your code

http://pj.freefaculty.org/R/em2.R

Much more readable!
(I inserted library(MASS) for you at the top, otherwise this doesn't
even survive the syntax check.)

When I copy from email to Emacs, some line-breaking monkey-business
occurs, but I expect you get the idea.

Now, looking at your cleaned up code, I can see some things to tidy up
to improve the chances that some expert will look.

First, R functions don't require "return" at the end, many experts
consider it distracting. (Run "lm" or such and review how they write.
No return at the end of functions).

Second, about that big for loop at the top, the one that goes from m 1:300

I don't know what that loop is doing, but there's some code in it that
I'm suspicious of. For example, this part:

 W.hat <- psi.old - psi.old %*% t(Zi) %*% solve(Sig.hat) %*%  Zi %*% psi.old

    Sigb <- psi.old - psi.old %*% t(Zi) %*% solve(Sig.hat) %*%  Zi %*% psi.old


First, you've caused the possibly slow calculation of solve
 (Sig.hat) to run two times.  If you really need it, run it once and
save the result.


Second, a for loop is not necessarily slow, but it may be easier to
read if you re-write this:

 for (j in 1:n) {
tempaa[j]=Eh4new(datai=data.list[[j]],weights.m=weights.mat,beta=beta.old)
        tempbb[j] <- Eh4newv2(datai = data.list[[j]], weights.m =
weights.mat, beta = beta.old)
 }

like this:

tempaa <- lapply(data.list, Eh4new, weights.m, beta.old)
tempbb <- lapply(data.list, Eh4newv2, weights.m, beta.old)

Third, here is a no-no

tempbb <- c()
for (j in 1:n) {
        tempbb <- cbind(tempbb, Eh2new(data.list[[j]], weights.m = weights.mat))
    }

That will call cbind over and over, causing a relatively slow memory
re-allocation.  See
(http://pj.freefaculty.org/R/WorkingExamples/stackListItems.R)

Instead, do this to apply Eh2new to each item in data.list

tempbbtemp <- lapply(data.list, Eh2new, weights.mat)

and then smash the results together in one go

tempbb <- do.call("cbind", tempbbtemp)


Fourth, I'm not sure on the matrix algebra. Are you sure you need
solve to get the full inverse of Sig.hat?  Once you start checking
into how estimates are calculated in R, you find that the
paper-and-pencil matrix algebra style of formula is generally frowned
upon. OLS (in lm.fit) doesn't do solve(X'X), it uses a QR
decomposition of matrices.  OR look in MASS package ridge regression
code, where the SVD is used to get the inverse.

I wish I knew enough about the EM algorithm to gaze at your code and
say "aha, error in line 332", but I'm not.  But if you clean up the
presentation and tighten up the most obvious things, you improve your
chances that somebody who is an expert will exert him/herself to do
it.

pj