need help with mixed effects model
On 01/03/2008, at 4:29 AM, Mark W Kimpel wrote:
Doug and other mixed-models aficionados,

I have made some progress on my own on the problem I posted in this thread. Briefly, I am analyzing a multifactorial genomic experiment and wish to look at gene-gene correlations independent of Strain. Because multiple measurements are taken per rat, I wish to use lmer. What seems to be working is the following:

    mod1 <- lmer(gene2 ~ -1 + Strain + (1|Rat) + gene1)
    mod2 <- lmer(gene2 ~ -1 + Strain + (1|Rat))
    anova.sum <- anova(mod1, mod2)

I look to see whether adding the expression of the other gene of interest as a covariate significantly improves the model. If it does, I take that as an indicator of gene-gene correlation/dependence.
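A minimal sketch of the comparison above on simulated data, for anyone who wants to run it. The data frame `d`, the strain labels, the effect sizes, and the `data =` and `REML = FALSE` arguments are all assumptions for illustration and are not taken from the original analysis; it assumes the lme4 package is installed.

```r
library(lme4)

set.seed(1)
# Hypothetical layout: 2 strains x 6 rats per strain x 3 measurements per rat
d <- expand.grid(rep = 1:3, RatInStrain = 1:6, Strain = c("A", "B"))
d$Strain <- factor(d$Strain)
d$Rat <- factor(paste(d$Strain, d$RatInStrain, sep = "_"))  # distinct id per rat

# Simulated expression values; the slope 0.8 and the variances are made up
rat_eff <- rnorm(nlevels(d$Rat), sd = 0.5)
d$gene1 <- rnorm(nrow(d))
d$gene2 <- 1 + 0.8 * d$gene1 + rat_eff[as.integer(d$Rat)] +
  rnorm(nrow(d), sd = 0.5)

# Fit by maximum likelihood, since the two models differ in their fixed effects
mod1 <- lmer(gene2 ~ -1 + Strain + gene1 + (1 | Rat), data = d, REML = FALSE)
mod2 <- lmer(gene2 ~ -1 + Strain + (1 | Rat), data = d, REML = FALSE)

# Likelihood-ratio comparison; the second row belongs to the larger model
anova.sum <- anova(mod2, mod1)
p <- anova.sum[["Pr(>Chisq)"]][2]
print(anova.sum)
```

A small `p` here only says that gene1 improves the fit for gene2 given Strain and Rat; it does not by itself address the measurement-error concern raised in the reply.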
The concern that Doug had, I assume, is that gene1 and gene2 are both measured with error, whereas this type of model assumes that the covariates are measured without error or, for practical purposes, with much less error than the dependent variable. Ignoring this problem biases the coefficients towards zero, with a consequent loss of power. I don't have any idea how important this is; it all depends on the error of your measurements. The usual solution is structural equation modelling (SEM). This is something I haven't tried, so I have no idea how easy it is or how well it will work.

Ken
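The bias towards zero can be seen in a small simulation. This is only a sketch in ordinary regression (no random effects), and the sample size, true slope, and error variances are made-up values chosen to make the effect visible.

```r
set.seed(42)
n <- 5000

true_gene1 <- rnorm(n)                          # unobserved "true" expression
gene2 <- 1.0 * true_gene1 + rnorm(n, sd = 0.5)  # true slope is 1
obs_gene1 <- true_gene1 + rnorm(n, sd = 1)      # covariate observed with error

# Slope using the error-free covariate versus the noisy one
slope_true <- coef(lm(gene2 ~ true_gene1))[["true_gene1"]]
slope_obs  <- coef(lm(gene2 ~ obs_gene1))[["obs_gene1"]]

# Under classical measurement error the slope shrinks by roughly
# var(true) / (var(true) + var(error)), which is 1 / (1 + 1) = 0.5 here
c(error_free = slope_true, with_error = slope_obs)
```

How much this matters for real expression data depends, as noted above, on how large the measurement error in gene1 is relative to its true variation.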
I am not, of course, doing this for just two genes; I build an adjacency matrix out of the p-values for all gene-gene interactions in a list of about 400 significant genes. I then adjust the p-values for FDR, pick a suitable FDR threshold (0.001 in this case), and create another adjacency matrix with 1's for significant correlations and 0's for non-significant ones. I then visualize this using Rgraphviz.

As I was tearing my hair out trying to make sure this made sense, it occurred to me that within my list of 400 genes I have positive controls. About 40 of the genes are represented by 2 or more probesets, which should be highly correlated if they are measuring the same thing. So, I subjected just the genes with duplicate probesets to the above procedure and, sure enough, in an overwhelming number of cases, probesets from the same gene plot next to each other.

My conclusion from this exercise is that what I am doing is empirically correct, although I am open to suggestions as to how it could be improved or comments as to how I may be just plain wrong.

Doug, I am reading your book and appreciate your contributions.

Mark

Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine
15032 Hunter Court, Westfield, IN 46074
(317) 490-5129 Work, & Mobile & VoiceMail
(317) 204-4202 Home (no voice mail please)
mwkimpel<at>gmail<dot>com

******************************************************************

Douglas Bates wrote:
On Fri, Feb 22, 2008 at 11:57 AM, Mark W Kimpel <mwkimpel at gmail.com> wrote:
This is my first foray into mixed models and, while awaiting the arrival of:
  Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models
  Mixed Effects Models in S and S-Plus
I am in need of some advice.
I would like to look at gene-gene correlations within a multifactorial, mixed-effects experiment. Here are the factors, with levels:
  Gene Expression: 2 different genes per Animal, continuous variable
  Animals: 6 per Strain
  Tissues: 3 per Animal
  Strain: 2
I thus have 6*3*2 = 36 samples
I do not care, for this analysis, about differences between Tissues, Strains, or Animals. In fact, I want to control for them while examining the correlation of expression of the two genes. In other words, I want to look at something very much like the Pearson correlation coefficient controlled for these other factors.
I guess the first question I should ask is: "is a mixed model the way to go, and, if not, what would be the correct approach?"
Perhaps. How do you plan to incorporate the two genes?
Assuming mixed models will work, as I see it through my newbie eyes, Tissue and strain are fixed effects and animals are random effects.
If you were interested in just 1 gene then I would say that this looks like a good approach. I'm just not sure what to do about the multiple genes.
Any suggestions for an approach and a model?
The model specification (assuming that each animal has a distinct number) would be something like

    gene1 ~ Tissue * Strain + (1|Animal)

In your earlier message to the Bioconductor list you had a specification that looked like

    gene1 ~ gene2 + ...

which makes me a little queasy because you are assuming that gene2 is "known" relative to the variability in gene1, and most of the time that is not a reasonable assumption.
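For a single gene, the specification above could be fitted along the following lines. This is a sketch on simulated data with the layout described earlier in the thread (2 strains, 6 animals per strain, 3 tissues per animal); the level names, effect sizes, and the data frame `d` are hypothetical, and it assumes the lme4 package is installed.

```r
library(lme4)

set.seed(2)
# 2 strains x 6 animals per strain x 3 tissues per animal = 36 samples
d <- expand.grid(Tissue = c("T1", "T2", "T3"),
                 AnimalInStrain = 1:6,
                 Strain = c("S1", "S2"))
# Give every animal a distinct label across strains
d$Animal <- factor(paste(d$Strain, d$AnimalInStrain, sep = "."))

# Simulated expression for one gene; all effect sizes are made up
animal_eff <- rnorm(nlevels(d$Animal), sd = 0.7)
d$gene1 <- 5 + 0.5 * (d$Tissue == "T2") + 1.0 * (d$Strain == "S2") +
  animal_eff[as.integer(d$Animal)] + rnorm(nrow(d), sd = 0.4)

# Tissue and Strain as fixed effects, Animal as a random intercept
fit <- lmer(gene1 ~ Tissue * Strain + (1 | Animal), data = d)
print(fixef(fit))    # fixed-effect estimates
print(VarCorr(fit))  # between-animal and residual variation
```

The distinct animal labels matter: if animals were numbered 1 to 6 within each strain, `(1 | Animal)` would wrongly treat animal 1 in both strains as the same individual.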
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
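Returning to the thresholding step described earlier in the thread (p-values for all gene pairs, an FDR adjustment, then a 0/1 adjacency matrix), a minimal base-R sketch might look like this. The gene names and p-values are invented, and the Benjamini-Hochberg method is an assumption, since the thread does not say which FDR adjustment was used.

```r
set.seed(7)
genes <- paste0("g", 1:6)  # hypothetical gene names
n <- length(genes)

# Hypothetical raw p-values, one per unordered gene pair (upper triangle)
p_raw <- matrix(NA_real_, n, n, dimnames = list(genes, genes))
ut <- upper.tri(p_raw)
p_raw[ut] <- runif(sum(ut))
p_raw["g1", "g2"] <- 1e-8  # one planted strongly associated pair

# Adjust each pair once, so the symmetric duplicates are not double counted
p_adj <- p_raw
p_adj[ut] <- p.adjust(p_raw[ut], method = "BH")

# 1 where the adjusted p-value passes the 0.001 threshold, 0 elsewhere
adj <- matrix(0L, n, n, dimnames = list(genes, genes))
adj[ut] <- as.integer(p_adj[ut] < 0.001)
adj <- adj + t(adj)  # mirror into the lower triangle
print(adj)
```

The resulting symmetric matrix is the kind of object that can then be passed to a graph-drawing package for visualization.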