Independence of residuals in a computer-based experiment/simulation analysed using an LME?
Dear List,

I know cross-posting to multiple fora is "bad", but I have had little response to a question I posted on the CrossValidated site yesterday, and I wondered if I might exploit the expertise on this list to solicit an answer to my query. The CV question is here (with nicer formatting): http://stats.stackexchange.com/q/40459/1390

The R-related and mixed-model-related bit is that I am doing all the analysis in R and am using **lme4** and `lmer()`.

I conducted a computer-based assessment of different methods of fitting a particular type of model used in the palaeo sciences. I had a large-ish training set, so I set aside a test set by stratified random sampling. I fitted m different methods to the training set samples and, using the m resulting models, predicted the response for the test set samples and computed an RMSEP over the samples in the test set for each method. This is a single run. I then repeated this process a large number of times, each time choosing a different training set by randomly sampling a new test set.

Having done this, I want to investigate whether all methods have effectively the same error performance and whether any of the m methods has better or worse RMSEP performance, via multiple pairwise comparisons.

My approach has been to fit a linear mixed effects (LME) model with a single random effect for Run. I used `lmer()` from the lme4 package to fit my model and functions from the multcomp package to perform the multiple comparisons. My model was essentially

    lmer(RMSEP ~ method + (1 | Run), data = FOO)

where `method` is a factor indicating which method was used to generate the model predictions for the test set and `Run` is an indicator for each particular run of my "experiment". I used Tukey contrasts from the multcomp package to do the multiple comparisons.

My question is in regard to the residuals of the LME.
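For concreteness, the analysis described above can be sketched roughly as follows. This is a minimal sketch on simulated data, not the real results: the data frame `FOO` built here is made up, with three hypothetical methods (`A`, `B`, `C`), 50 runs, and arbitrary effect sizes. Only the `lmer()` formula and the use of Tukey contrasts from multcomp are taken from the description above.

```r
library(lme4)      # lmer()
library(multcomp)  # glht(), mcp()

set.seed(1)

## Hypothetical design: every method is evaluated once in every run
n_runs  <- 50
methods <- c("A", "B", "C")
FOO <- expand.grid(method = factor(methods),
                   Run    = factor(seq_len(n_runs)))

## Simulated RMSEP: method effect + shared per-run effect + noise
## (all values are arbitrary and for illustration only)
method_eff <- c(A = 0, B = 0.1, C = -0.1)
run_eff    <- rnorm(n_runs, sd = 0.3)
FOO$RMSEP  <- 2 +
  method_eff[as.character(FOO$method)] +
  run_eff[as.integer(FOO$Run)] +
  rnorm(nrow(FOO), sd = 0.1)

## LME with a single random intercept for Run
mod <- lmer(RMSEP ~ method + (1 | Run), data = FOO)

## All pairwise comparisons between methods (Tukey contrasts)
comp <- glht(mod, linfct = mcp(method = "Tukey"))
summary(comp)
```

The random intercept for `Run` is what induces the within-run correlation among the RMSEP values; nothing in this specification allows for dependence between runs.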
Given the single random effect for Run, the model assumes that the RMSEP (response) values from the same run are correlated to some degree, via the correlation induced by the shared random effect, but that values (and hence residuals) from different runs are uncorrelated. Is this assumption of independence between runs valid? If not, is there a way to account for this in the LME model, or should I be looking to employ another type of statistical analysis to answer my question?

TIA

Gavin
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%