lme capable of running with missing data?
On Fri, Feb 3, 2012 at 8:20 PM, Rolf Turner <r.turner at auckland.ac.nz> wrote:
On 04/02/12 14:45, Kenneth Frost wrote:
On 02/03/12, Charles Determan Jr ? wrote:
Kevin, I understand that but then how is SAS accomplishing the interactions?
I have been following this conversation a little bit and this seems to be the right question to ask. I would also like to know the answer. However, this could be the wrong venue to get an answer to this question.
<SNIP> It may be the case that fortune(203) is relevant here! :-)
Mathematical impossibility, no (fortune(203) refers to obtaining negative estimates of variance components, IIRC). The problem here is determining a full-rank model matrix for a model with interactions and missing cells. Because SAS uses the sweep operator in solving least squares problems it does not encounter problems with rank deficiency. (I am sorely tempted to make remarks about "sweeping them under the carpet".) In fact, SAS expects to handle rank deficiencies because it generates a redundant set of indicators for each factor variable then prunes them on the fly.

The approach in R is to generate a model matrix that should be of full rank except in circumstances like this and to check for rank deficiency. There is special code in the version of the QR decomposition used with R to detect rank deficiency and pivot the offending columns out but keep the others in their original order. Dirk Eddelbuettel and I explored several approaches to handling such rank deficiency in the vignette accompanying the RcppEigen package (http://cran.us.r-project.org/web/packages/RcppEigen/vignettes/RcppEigen-intro-nojss.pdf).

The development version of lme4 (called lme4Eigen on the R-forge project site) detects rank deficiency earlier in the calculation but does not yet repair the rank deficiency. Using the column-pivoted QR decomposition is probably the best approach, but even then it would be necessary to find the columns that are linearly dependent on columns to their left and then drop only those columns. It is not impossible by any means; it just requires some work and is not high on the priority list right now.

Regarding type III tests, I have forgotten which ones they are. Are they the sequential sums of squares or the ones where you drop the main effect but keep the interactions, thereby rendering your null model nonsensical in most cases?
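Before returning to the sums of squares, a small sketch of the rank-deficiency point above may help. Everything in it is made up for illustration: the data frame `d`, the factors `A` and `B`, and the simulated response `y` are assumptions, not data from this thread. It uses only base R (`model.matrix`, `qr`, `lm`) and shows an interaction model with one missing cell, where the QR decomposition reports a rank smaller than the number of columns and `lm` returns NA for the column it pivots out.

```r
# Hypothetical 2 x 3 layout with three replicates per cell,
# then remove every observation in the (a2, b3) cell.
d <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2", "b3"), rep = 1:3)
d <- subset(d, !(A == "a2" & B == "b3"))

# Model matrix for main effects plus interaction (treatment contrasts).
X <- model.matrix(~ A * B, data = d)

# Base R's default QR pivots columns it judges linearly dependent to the
# right-hand end and leaves the remaining columns in their original order.
qrX <- qr(X)
ncol(X)      # number of columns generated
qrX$rank     # numerical rank detected

# Columns beyond the detected rank are the ones pivoted out.
dropped <- colnames(X)[qrX$pivot[-seq_len(qrX$rank)]]
dropped

# lm() relies on the same decomposition; the aliased coefficient is NA.
set.seed(1)
d$y <- rnorm(nrow(d))
fit <- lm(y ~ A * B, data = d)
coef(fit)
```

Here the column for the missing cell is identically zero, which is the easiest kind of deficiency to spot. In less tidy designs the dependent column is a nontrivial combination of columns to its left, which is where the bookkeeping described above comes in.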
All the silliness about Types I, II, III and IV sums of squares and tests was formulated when fitting any model was difficult (see fortune("JCL")). So doing a hypothesis test by fitting the null model and fitting the alternative model and comparing the results would take much, much longer than doing a lot of linear algebra gymnastics on the fit of the full or alternative model. That is no longer the case. If you really want to perform a hypothesis test then formulate it in terms of models, fit them and compare them. It's not difficult and has the undeniable advantage of forcing you to think about the model and whether it makes sense. Read Bill Venables' famous unpublished paper "Exegeses on Linear Models" (just put the name in a search engine). (By the way, Bill is going to be at the useR conference in Nashville in July so maybe if a bunch of us ganged up on him he could be convinced to submit a version of that paper for publication.)
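A comparison of that kind might look like the sketch below. It is only an illustration under assumptions of my choosing: it uses the nlme package and its Orthodont data set, neither of which comes from this thread, and it picks the age-by-Sex interaction as the hypothesis of interest. Both models are fit by maximum likelihood (method = "ML") because the two fits differ in their fixed effects, and REML criteria are not comparable in that situation.

```r
library(nlme)

# Null model: additive effects of age and Sex, random intercept per Subject.
fit0 <- lme(distance ~ age + Sex, data = Orthodont,
            random = ~ 1 | Subject, method = "ML")

# Alternative model: adds the age-by-Sex interaction.
fit1 <- lme(distance ~ age * Sex, data = Orthodont,
            random = ~ 1 | Subject, method = "ML")

# Likelihood ratio comparison of the two nested fits.
cmp <- anova(fit0, fit1)
cmp
```

The point of writing both formulas out is that the null model is visible on the page, so it is easy to see whether it is a model anyone would seriously entertain.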