Differing variable lengths (missing data) and Model errors in lmer()
Dear Aditi,
I'm puzzled with the nesting of loci within families (the presence of an
allele at a locus means the same whatever the family, I would think). I
would treat loci as fixed effect and hence use something like that:
(fm1<-lmer(peg.no~P1L55+P1L73+P1L74+P1L77+P1L91+P1L96+P1L98+P1L100+P1L114+P1L118+(1|family),dat))
Linear mixed model fit by REML
Formula: peg.no ~ P1L55 + P1L73 + P1L74 + P1L77 + P1L91 + P1L96 + P1L98
+ P1L100 + P1L114 + P1L118 + (1 | family)
Data: dat
AIC BIC logLik deviance REMLdev
2954 3006 -1464 2963 2928
Random effects:
Groups Name Variance Std.Dev.
family (Intercept) 89.043 9.4363
Residual 91.377 9.5592
Number of obs: 390, groups: family, 57
Fixed effects:
Estimate Std. Error t value
(Intercept) 96.2793 3.2169 29.929
P1L55 -2.0185 1.5831 -1.275
P1L73 -3.0256 1.8900 -1.601
P1L74 1.5112 1.6189 0.933
P1L77 -2.7898 2.3529 -1.186
P1L91 0.6480 3.3118 0.196
P1L96 -1.4718 1.4938 -0.985
P1L98 2.6431 2.6401 1.001
P1L100 -3.4529 2.0842 -1.657
P1L114 0.8099 1.9218 0.421
P1L118 -2.7119 2.3635 -1.147
But perhaps I missed something?
Best wishes,
A Singh wrote:
Dear All, I am trying to run a nested random effects model in lmer (for R 2.9.1, lme4 version 0.999375-31 ) using data which is structured as follows: family offsp. P1L74 P1L77 P1L91 P1L96..(n=426) peg.no ec.length syll.length 1 2 1 0 0 0 86 5.445 2.479 1 3 1 0 0 0 91 5.215 2.356 1 4 0 0 0 0 79 5.682 2.896 1 5 1 0 0 0 83 5.149 2.641 1 6 0 0 0 0 77 5.044 2.288 1 7 1 0 0 0 78 5.450 2.420 1 8 1 0 1 1 82 5.377 2.505 1 9 1 0 0 0 95 5.389 2.706 1 10 1 0 0 0 88 5.354 2.461 1 11 1 0 0 0 87 5.262 3.079 1 12 1 0 0 0 84 5.191 2.858 1 13 1 0 0 1 87 5.194 2.264 2 23 1 0 0 1 116 5.863 2.876 2 24 1 0 0 0 122 5.475 3.114 2 25 1 0 0 0 110 5.563 3.059 . . . . . . . . . . . . . . . . . . . . (60 families) 'Family' is the first Random effect (categorical variable), with 60 levels. All columns labeled P1L(x) are a matrix of presence/absence genetic markers for each individual in each family. There are 426 such columns(not numbered in sequence) and each one is a random effect. The last three columns (peg.no, ec.length and syll.length) are the three dependent variables. Each genetic marker column needs to be nested within each family, which means that if I take the first phenotype, peg.no, for example, then I need to run an analysis that partitions variance 60*426 times (~25,00 runs) for that phenotype. I also need an output with the Anova table, and summary, at each stage of the run, so that I can get P values for each marker for each family, as a way of determining whether it contributes significantly to explaining within-family variance for that trait. To do the above, I tried to write code for lmer() using two nested 'for' loops, one for each level of random factor nesting (marker within family) as follows (using a test data set [please find attached] with only the first 10 marker columns, to see if this works):
vc<-read.table("P:\\R\\Testvcomp10.txt",header=T)
attach(vc)
family<-factor(family) colms<-(vc)[,4:13] ## this to assign the 10 columns containing marker
data to a new variable, as column names are themselves not in any recognisable sequence
vcdf<-data.frame(family,peg.no,ec.length,syll.length,colms) library(lme4)
for (c in levels(family))
+ { for (i in 1:length(colms))
+ { fit<-lmer(peg.no~1 + (1|c/i), vcdf)
+ }
+ summ<-summary(fit)
+ av<-anova(fit)
+ print(summ)
+ print(av)
+ }
This gives me:
Error in model.frame.default(data = vcdf, formula = peg.no ~ 1 + (1 + :
variable lengths differ (found for 'c')
On suggestion from a colleague I reframed this as:
for(c in levels(family))
+ {
+ print("----New C:----")
+ print(c)
+ for (i in 1:length(colms))
+ {
+ fit<-lmer(peg.no~1+ (1|c/i),vcdf)
+ print(i)
+ summ<-summary(fit)
+ av<-anova(fit)
+ print(summ)
+ print(av)
+ }
+ }
..and this gave me the output:
[1] "----New C:----"
[1] "1"
Error in model.frame.default(data = vcdf, formula = peg.no ~ 1 + (1 + :
variable lengths differ (found for 'c')
Google-ing the error message has led me to plenty of links that
suggest forcing data into a data frame to fix this, but that hasn't
worked.
My markers and phenotypes both have plenty of missing data (NA's in
the data.frame), and na.action=na.omit isn't solving the problem.
(I have tried this with lme, and tried to do it with the aov() command
as well and the error is pretty much the same).
I am completely new to R, and despite searching and trying various
things, can't get the code to work.
I really appreciate any corrections to this code, or alternative
command/function suggestions that I can look into, to try to do this
again.
Thanks a bunch for your help,
Aditi
----------------------
A Singh
Aditi.Singh at bristol.ac.uk
School of Biological Sciences
University of Bristol
------------------------------------------------------------------------
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
J?r?me Goudet Dept. Ecology & Evolution UNIL-Sorge, CH-1015 Lausanne mail:jerome.goudet at unil.ch Tel:+41 (0)21 692 4242 Fax:+41 (0)21 692 4265