Prev 2591 / 20628 Next

Differing variable lengths (missing data) and Model errors in lmer()

Fri, Jul 10, 2009 6:31 AM

Dear Aditi,

Would'nt it be simpler then to remove the family effect (by using e.g. 
the residuals of lmer(peg.no~1+(1|Family))) and to look at the effect of 
individual loci on the residuals (I think this is what was suggested by 
Aulchenko etal (Genetics 2007  177:577)? But I may be wrong and missing 
something...

Best wishes,

A Singh wrote:

Dear Jerome,

Thank you very much for your suggestion, and apologies for the late 
response..

I have tried exactly the same approach when I was still going through 
this in my preliminary stages, and I've tried to run lmer() using them 
as fixed effects; however the catch here, as my supervisors explained 
to me is this-
- the alleles are not fixed effects because we don't just want to see 
if they differ significantly in their presence/absence across all 60 
families.

We want to see if they show associations with phenotype-measurements 
within each family so that we can see if they account for the 
unexplained variation within each family, if that makes sense? And 
this makes them random effects.

We are trying to (as a larger goal) look for marker-trait 
associations, map QTL's and  identify fragments of genes responsible 
for speciation/genes under selection.
Nesting the markers within each family is a sort of mini-within-family 
ANOVA with random effects to see whether particular families have a 
phenotypic trait more accounted for by a particular marker/combination 
of markers, than the trait may be accounted for in other families 
based on the same markers.

Working on this principle, I need to run my lmer model they way I was 
trying to, using the for loops to nest markers within families so that 
for each phenotype, I would have (no of families*no of markers) runs..

Hope this makes it clearer...

Thanks again,

Aditi




--On 09 July 2009 21:21 +0200 Jerome Goudet <jerome.goudet at unil.ch> 
wrote:

Dear Aditi,

I'm puzzled with the nesting of loci within families (the presence of an
allele at a locus means the same whatever the family, I would think). I
would treat loci as fixed effect and hence use something like that:

 (fm1<-lmer(peg.no~P1L55+P1L73+P1L74+P1L77+P1L91+P1L96+P1L98+P1L100+P1L11 

4+P1L118+(1|family),dat))
Linear mixed model fit by REML
Formula: peg.no ~ P1L55 + P1L73 + P1L74 + P1L77 + P1L91 + P1L96 + 
P1L98 +
P1L100 + P1L114 + P1L118 + (1 | family)
   Data: dat
  AIC  BIC logLik deviance REMLdev
 2954 3006  -1464     2963    2928
Random effects:
 Groups   Name        Variance Std.Dev.
 family   (Intercept) 89.043   9.4363  Residual             91.377
9.5592 Number of obs: 390, groups: family, 57

Fixed effects:
            Estimate Std. Error t value
(Intercept)  96.2793     3.2169  29.929
P1L55        -2.0185     1.5831  -1.275
P1L73        -3.0256     1.8900  -1.601
P1L74         1.5112     1.6189   0.933
P1L77        -2.7898     2.3529  -1.186
P1L91         0.6480     3.3118   0.196
P1L96        -1.4718     1.4938  -0.985
P1L98         2.6431     2.6401   1.001
P1L100       -3.4529     2.0842  -1.657
P1L114        0.8099     1.9218   0.421
P1L118       -2.7119     2.3635  -1.147


But perhaps I missed something?

Best wishes,

A Singh wrote:

Dear All,

I am trying to run a nested random effects model in lmer (for R 2.9.1,
lme4 version 0.999375-31 ) using data which is structured as follows:

family offsp. P1L74 P1L77 P1L91 P1L96..(n=426) peg.no ec.length
syll.length
1        2      1      0      0      0                86
5.445         2.479
1        3      1      0      0      0                91
5.215         2.356
1        4      0      0      0      0                  79
5.682         2.896
1        5      1      0      0      0               83
5.149         2.641
1        6      0      0      0      0                77
5.044         2.288
1        7      1      0      0      0                  78
5.450         2.420
1        8      1      0      1      1                82
5.377         2.505
1        9      1      0      0      0                95
5.389         2.706
1       10      1      0      0      0                88
5.354         2.461
1       11      1      0      0      0                87
5.262         3.079
1       12      1      0      0      0                84
5.191         2.858
1       13      1      0      0      1                  87
5.194         2.264
2       23      1      0      0      1                  116
5.863         2.876
2       24      1      0      0      0                122
5.475         3.114
2       25      1      0      0      0                110
5.563         3.059
.       .       .   .     .     .               .         .          .
.       .       .   .     .     .               .         .          .
.
.
(60 families)

'Family' is the first Random effect (categorical variable), with 60
levels.

All columns labeled P1L(x) are a matrix of presence/absence genetic
markers for each individual in each family. There are 426 such
columns(not numbered in sequence) and each one is a random effect.

The last three columns (peg.no, ec.length and syll.length) are the
three dependent variables.

Each genetic marker column needs to be nested within each family,
which means that if I take the first phenotype, peg.no, for example,
then I need to run an analysis that partitions variance 60*426 times
(~25,00 runs) for that phenotype.
I also need an output with the Anova table, and summary, at each stage
of the run, so that I can get P values for each marker for each
family, as a way of determining whether it contributes significantly
to explaining within-family variance for that trait.

To do the above, I tried to write code for lmer() using two nested
'for' loops, one for each level of random factor nesting (marker
within family) as follows (using a test data set [please find
attached] with only the first 10 marker columns, to see if this works):

vc<-read.table("P:\\R\\Testvcomp10.txt",header=T)
attach(vc)

family<-factor(family)
colms<-(vc)[,4:13] ## this to assign the 10 columns containing marker

data    to a new variable, as column names are themselves not in any
recognisable sequence

vcdf<-data.frame(family,peg.no,ec.length,syll.length,colms)
library(lme4)

for (c in levels(family))

+ {    for (i in 1:length(colms))
+        { fit<-lmer(peg.no~1 + (1|c/i), vcdf)
+        }
+    summ<-summary(fit)
+    av<-anova(fit)
+    print(summ)
+    print(av)
+ }

This gives me:

Error in model.frame.default(data = vcdf, formula = peg.no ~ 1 + (1 
+  :
 variable lengths differ (found for 'c')



On suggestion from a colleague I reframed this as:

for(c in levels(family))

+ {
+ print("----New C:----")
+ print(c)
+ for (i in 1:length(colms))
+ {
+ fit<-lmer(peg.no~1+ (1|c/i),vcdf)
+ print(i)
+ summ<-summary(fit)
+ av<-anova(fit)
+ print(summ)
+ print(av)
+ }
+ }

..and this gave me the output:

[1] "----New C:----"
[1] "1"
Error in model.frame.default(data = vcdf, formula = peg.no ~ 1 + (1 
+  :
 variable lengths differ (found for 'c')


Google-ing the error message has led me to plenty of links that
suggest forcing data into a data frame to fix this, but that hasn't
worked.
My markers and phenotypes both have plenty of missing data (NA's in
the data.frame), and na.action=na.omit isn't solving the problem.

(I have tried this with lme, and tried to do it with the aov() command
as well and the error is pretty much the same).

I am completely new to R, and despite searching and trying various
things, can't get the code to work.

I really appreciate any corrections to this code, or alternative
command/function suggestions that I can look into, to try to do this
again.


Thanks a bunch for your help,

Aditi


----------------------
A Singh
Aditi.Singh at bristol.ac.uk
School of Biological Sciences
University of Bristol



------------------------------------------------------------------------

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

J?r?me Goudet
Dept. Ecology & Evolution
UNIL-Sorge, CH-1015 Lausanne

mail:jerome.goudet at unil.ch
Tel:+41 (0)21 692 4242
Fax:+41 (0)21 692 4265

Thread (5 messages)

A Singh Differing variable lengths (missing data) and Model errors in lmer() Jul 9 Jerome Goudet Differing variable lengths (missing data) and Model errors in lmer() Jul 9 A Singh Differing variable lengths (missing data) and Model errors in lmer() Jul 10 Jerome Goudet Differing variable lengths (missing data) and Model errors in lmer() Jul 10 A Singh Differing variable lengths (missing data) and Model errors in lmer() Jul 10