Differing variable lengths (missing data) and Model errors in lmer()
Dear Aditi, Would'nt it be simpler then to remove the family effect (by using e.g. the residuals of lmer(peg.no~1+(1|Family))) and to look at the effect of individual loci on the residuals (I think this is what was suggested by Aulchenko etal (Genetics 2007 177:577)? But I may be wrong and missing something... Best wishes,
A Singh wrote:
Dear Jerome, Thank you very much for your suggestion, and apologies for the late response.. I have tried exactly the same approach when I was still going through this in my preliminary stages, and I've tried to run lmer() using them as fixed effects; however the catch here, as my supervisors explained to me is this- - the alleles are not fixed effects because we don't just want to see if they differ significantly in their presence/absence across all 60 families. We want to see if they show associations with phenotype-measurements within each family so that we can see if they account for the unexplained variation within each family, if that makes sense? And this makes them random effects. We are trying to (as a larger goal) look for marker-trait associations, map QTL's and identify fragments of genes responsible for speciation/genes under selection. Nesting the markers within each family is a sort of mini-within-family ANOVA with random effects to see whether particular families have a phenotypic trait more accounted for by a particular marker/combination of markers, than the trait may be accounted for in other families based on the same markers. Working on this principle, I need to run my lmer model they way I was trying to, using the for loops to nest markers within families so that for each phenotype, I would have (no of families*no of markers) runs.. Hope this makes it clearer... Thanks again, Aditi --On 09 July 2009 21:21 +0200 Jerome Goudet <jerome.goudet at unil.ch> wrote:
Dear Aditi,
I'm puzzled with the nesting of loci within families (the presence of an
allele at a locus means the same whatever the family, I would think). I
would treat loci as fixed effect and hence use something like that:
(fm1<-lmer(peg.no~P1L55+P1L73+P1L74+P1L77+P1L91+P1L96+P1L98+P1L100+P1L11
4+P1L118+(1|family),dat))
Linear mixed model fit by REML
Formula: peg.no ~ P1L55 + P1L73 + P1L74 + P1L77 + P1L91 + P1L96 +
P1L98 +
P1L100 + P1L114 + P1L118 + (1 | family)
Data: dat
AIC BIC logLik deviance REMLdev
2954 3006 -1464 2963 2928
Random effects:
Groups Name Variance Std.Dev.
family (Intercept) 89.043 9.4363 Residual 91.377
9.5592 Number of obs: 390, groups: family, 57
Fixed effects:
Estimate Std. Error t value
(Intercept) 96.2793 3.2169 29.929
P1L55 -2.0185 1.5831 -1.275
P1L73 -3.0256 1.8900 -1.601
P1L74 1.5112 1.6189 0.933
P1L77 -2.7898 2.3529 -1.186
P1L91 0.6480 3.3118 0.196
P1L96 -1.4718 1.4938 -0.985
P1L98 2.6431 2.6401 1.001
P1L100 -3.4529 2.0842 -1.657
P1L114 0.8099 1.9218 0.421
P1L118 -2.7119 2.3635 -1.147
But perhaps I missed something?
Best wishes,
A Singh wrote:
Dear All, I am trying to run a nested random effects model in lmer (for R 2.9.1, lme4 version 0.999375-31 ) using data which is structured as follows: family offsp. P1L74 P1L77 P1L91 P1L96..(n=426) peg.no ec.length syll.length 1 2 1 0 0 0 86 5.445 2.479 1 3 1 0 0 0 91 5.215 2.356 1 4 0 0 0 0 79 5.682 2.896 1 5 1 0 0 0 83 5.149 2.641 1 6 0 0 0 0 77 5.044 2.288 1 7 1 0 0 0 78 5.450 2.420 1 8 1 0 1 1 82 5.377 2.505 1 9 1 0 0 0 95 5.389 2.706 1 10 1 0 0 0 88 5.354 2.461 1 11 1 0 0 0 87 5.262 3.079 1 12 1 0 0 0 84 5.191 2.858 1 13 1 0 0 1 87 5.194 2.264 2 23 1 0 0 1 116 5.863 2.876 2 24 1 0 0 0 122 5.475 3.114 2 25 1 0 0 0 110 5.563 3.059 . . . . . . . . . . . . . . . . . . . . (60 families) 'Family' is the first Random effect (categorical variable), with 60 levels. All columns labeled P1L(x) are a matrix of presence/absence genetic markers for each individual in each family. There are 426 such columns(not numbered in sequence) and each one is a random effect. The last three columns (peg.no, ec.length and syll.length) are the three dependent variables. Each genetic marker column needs to be nested within each family, which means that if I take the first phenotype, peg.no, for example, then I need to run an analysis that partitions variance 60*426 times (~25,00 runs) for that phenotype. I also need an output with the Anova table, and summary, at each stage of the run, so that I can get P values for each marker for each family, as a way of determining whether it contributes significantly to explaining within-family variance for that trait. To do the above, I tried to write code for lmer() using two nested 'for' loops, one for each level of random factor nesting (marker within family) as follows (using a test data set [please find attached] with only the first 10 marker columns, to see if this works):
vc<-read.table("P:\\R\\Testvcomp10.txt",header=T)
attach(vc)
family<-factor(family) colms<-(vc)[,4:13] ## this to assign the 10 columns containing marker
data to a new variable, as column names are themselves not in any recognisable sequence
vcdf<-data.frame(family,peg.no,ec.length,syll.length,colms) library(lme4)
for (c in levels(family))
+ { for (i in 1:length(colms))
+ { fit<-lmer(peg.no~1 + (1|c/i), vcdf)
+ }
+ summ<-summary(fit)
+ av<-anova(fit)
+ print(summ)
+ print(av)
+ }
This gives me:
Error in model.frame.default(data = vcdf, formula = peg.no ~ 1 + (1
+ :
variable lengths differ (found for 'c')
On suggestion from a colleague I reframed this as:
for(c in levels(family))
+ {
+ print("----New C:----")
+ print(c)
+ for (i in 1:length(colms))
+ {
+ fit<-lmer(peg.no~1+ (1|c/i),vcdf)
+ print(i)
+ summ<-summary(fit)
+ av<-anova(fit)
+ print(summ)
+ print(av)
+ }
+ }
..and this gave me the output:
[1] "----New C:----"
[1] "1"
Error in model.frame.default(data = vcdf, formula = peg.no ~ 1 + (1
+ :
variable lengths differ (found for 'c')
Google-ing the error message has led me to plenty of links that
suggest forcing data into a data frame to fix this, but that hasn't
worked.
My markers and phenotypes both have plenty of missing data (NA's in
the data.frame), and na.action=na.omit isn't solving the problem.
(I have tried this with lme, and tried to do it with the aov() command
as well and the error is pretty much the same).
I am completely new to R, and despite searching and trying various
things, can't get the code to work.
I really appreciate any corrections to this code, or alternative
command/function suggestions that I can look into, to try to do this
again.
Thanks a bunch for your help,
Aditi
----------------------
A Singh
Aditi.Singh at bristol.ac.uk
School of Biological Sciences
University of Bristol
------------------------------------------------------------------------
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
-- J?r?me Goudet Dept. Ecology & Evolution UNIL-Sorge, CH-1015 Lausanne mail:jerome.goudet at unil.ch Tel:+41 (0)21 692 4242 Fax:+41 (0)21 692 4265
---------------------- A Singh Aditi.Singh at bristol.ac.uk School of Biological Sciences University of Bristol
J?r?me Goudet Dept. Ecology & Evolution UNIL-Sorge, CH-1015 Lausanne mail:jerome.goudet at unil.ch Tel:+41 (0)21 692 4242 Fax:+41 (0)21 692 4265