Dear Ben,
Many thanks for your message - after some experimentation I did indeed confirm that the correct specification in lmer was the one you mentioned, i.e. the correct code to get almost identical output as in SAS/glimmix was (as I thought it might be useful to share this for others I attach the datafile, and also the PDF with the original SAS analysis)
microarray=read.csv("microarray_dataset1612_R.csv",colClasses=c(rep("factor",8),"numeric"))
library(lme4)
library(lmerTest)
#options(contrasts = c(factor = "contr.SAS", ordered = "contr.poly")) # not sure if this would be necessary
fit=lmer(log_intensity~(1|array/gene)+(1|array:pin)+(1|array:dip)+dye+trt+gene+dye:gene+trt:gene+pin, data=microarray)
summary(fit)
anova(fit,ddf="Kenward-Roger")
(the degrees of freedom are slightly different though as in the SAS output, not sure why that is, but the p vals are practically identical)
The only remaining question I now have is how I could obtain the least square means, equivalent to the SAS code
lsmeans trt*gene / slice=gene
slicediff=gene
slicedifftype=control('0' '2');
(cf PDF in attachment)
Would anyone happen to know perhaps how I could obtain these using package lsmeans? (I am new to that package, so apologies if this would happen to be trivial)
Cheers,
Tom
-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Ben Bolker
Sent: 29 June 2013 17:58
To: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Question on mixed model design to analyze microarray data
Tom Wenseleers <Tom.Wenseleers at ...> writes:
Dear all,
I was just trying to analyse some microarray gene expression data
using mixed models in R.
Based on the SAS code given here
http://support.sas.com/documentation/cdl/en/
statug/63033/HTML/default/viewer.htm#statug_hpmixed_sect035.htm
(URL broken, sorry)
proc hpmixed data=microarray;
class marray dye trt gene pin dip;
model log2i = dye trt gene dye*gene trt*gene pin;
random marray marray*gene dip(marray) pin*marray;
test trt;
run;
I wanted to try to run the equivalent model in R using lmer and then
test significance using lmerTest and/or lsmeans. Am I correct that the
correct syntax would be
lmer(log2i~dye+trt+gene+dye:gene+trt:gene+pin+(1|marray)+
(gene|marray)+(1|marray/dip)+(pin|marray), data=microarray) (gene has
about 10 000 levels and marray 10, would this run in lmer?)
I think you probably want a random effect of
(1|marray) + (1|marray:gene) + (1|marray:pin)+(1|marray:dip)
or equivalently
(1|marray/gene) + (1|marray:pin) + (1|marray:dip)
you almost certainly _don't_ want (gene|marray), which will try to fit correlations among gene effects by microarray (i.e a 10K by 10K correlation matrix), which will explode your computer. At a quick read I'm not quite sure what distinction the SAS authors are making between "X by Y" and "X within Y": I would code "X by Y" as
(1|X) + (1|Y) + (1|X:Y) [ (1|X*Y) might work, but I haven't tried it ] and "X within Y" as (1|Y/X) or (1|Y) + (1|Y:X)
Also, would it be possible in any way to allow variances across
treatment (trt) groups to be unequal? Or is that only possible in
nlme/lme?
(but that wouldn't allow for such complex random effect designs,
right?)