Skip to content

Determining variance components of classed covariates

2 messages · Stephen Montgomery, Gabor Grothendieck

#
Hi -

I am interested in solving variance components for the data below with
respect to the response variable, Expression within R.

However, the covariates aren't independent and they also have a class
(of which the total variance explained by covariates in that class I am
most interested in).

Very naively, I have tried to look at each individual covariates
variance like this
data=input_new)
Linear mixed-effects model fit by REML 
Formula: Expression ~ 1 + (1 | rs11834524) + (1 | rs7074431) 
   Data: input 
   AIC   BIC logLik MLdeviance REMLdeviance
 108.4 116.5 -51.22      102.5        102.4
Random effects:
 Groups     Name        Variance Std.Dev.
 rs11834524 (Intercept) 0.485538 0.69681 
 rs7074431  (Intercept) 0.013720 0.11713 
 Residual               0.128853 0.35896 
number of obs: 109, groups: rs11834524, 3; rs7074431, 3

Fixed effects:
            Estimate Std. Error t value
(Intercept)   9.9524     0.4098   24.29

My assumption is that this is telling me that rs11834524 explains
0.485538 of the variance and rs7074431 explains 0.013720 of the variance
in Expression when looked at independently.  

However, I would like to know how to write a model where I know how much
of the total variance (in Expression) is described by covariates
rs11834524, rs1682421, rs13383869 and rs9457141 (call it class A) and
covariates rs9459617, rs7074431, rs12450785, rs592724 (call it class B).
Assuming an additive model within the class.  The caveats are that there
is missing data and again that there may be correlation between all the
covariates.

Such that a theoretical result may be that
Class A: Explains 60% of the total variance in expression (response)
Class B: Explains 10% of the total variance in expression

Thanks for the help!  I am sorry I am R challenged here...I really
appreciate the guidance!

Stephen
input_new <-
structure(list(Individual = structure(1:109, .Label = c("NA06984", 
"NA06985", "NA06986", "NA06989", "NA06993", "NA06994", "NA07000", 
"NA07022", "NA07037", "NA07045", "NA07051", "NA07055", "NA07056", 
"NA07345", "NA07346", "NA07347", "NA07357", "NA07435", "NA11829", 
"NA11830", "NA11831", "NA11832", "NA11839", "NA11840", "NA11843", 
"NA11881", "NA11882", "NA11892", "NA11893", "NA11894", "NA11917", 
"NA11918", "NA11919", "NA11920", "NA11930", "NA11931", "NA11992", 
"NA11993", "NA11994", "NA11995", "NA12003", "NA12005", "NA12006", 
"NA12043", "NA12044", "NA12056", "NA12057", "NA12144", "NA12145", 
"NA12146", "NA12154", "NA12155", "NA12156", "NA12234", "NA12239", 
"NA12248", "NA12249", "NA12264", "NA12272", "NA12273", "NA12274", 
"NA12275", "NA12282", "NA12283", "NA12286", "NA12287", "NA12340", 
"NA12341", "NA12342", "NA12343", "NA12347", "NA12348", "NA12383", 
"NA12399", "NA12400", "NA12414", "NA12489", "NA12546", "NA12716", 
"NA12718", "NA12748", "NA12749", "NA12750", "NA12751", "NA12760", 
"NA12761", "NA12762", "NA12763", "NA12775", "NA12776", "NA12777", 
"NA12778", "NA12812", "NA12813", "NA12814", "NA12815", "NA12827", 
"NA12828", "NA12829", "NA12830", "NA12842", "NA12843", "NA12872", 
"NA12873", "NA12874", "NA12875", "NA12889", "NA12891", "NA12892"
), class = "factor"), Expression = c(9.46026823453575, 10.0788903323991,

9.20330296497174, 10.038741467793, 9.33092349416463, 11.0273957217919, 
10.5498875891745, 9.81137299592747, 11.2023261987976, 9.90559354069027, 
10.1524696609679, 10.3171767665993, 9.02155519577685, 9.84917871051438, 
10.658877473136, 9.88895551011107, 8.62335008726357, 9.21529114100886, 
10.7896248923916, 10.1302992505869, 8.64584282787018, 9.56057795866654, 
9.89810004078774, 10.2557482141576, 8.95588077688637, 9.56452454115857, 
9.26525135092154, 10.5438780642797, 9.8468571349548, 10.7416169225352, 
10.5623721612979, 10.6565276881443, 9.67758493445612, 9.75385553511462, 
8.997797236767, 11.0106882086179, 10.362578597992, 9.2745507212906, 
10.7453355016181, 9.75998268015348, 9.45003620116962, 10.055504292376, 
10.7072220720564, 10.0934686444392, 10.0472832129727, 10.1185615033486, 
10.3340911031131, 9.70618910683157, 10.5953304905529, 10.4246307909547, 
9.91463202635336, 10.249081562168, 10.9252022586474, 10.295544143525, 
11.4838109797985, 10.5286570234792, 9.78692800868132, 10.0397050809162, 
9.27914623343747, 10.37600233389, 9.27341681588134, 9.40195375611303, 
10.8979822929135, 9.03922228977389, 10.3911745662505, 10.4345408213054, 
9.8548491618724, 10.1897729275437, 10.2881888849609, 8.9656977165014, 
9.81595398472166, 10.1856794532084, 9.3763789479684, 10.1712420020647, 
10.2964594680427, 10.3515965292101, 8.94492585275159, 11.2529257614993, 
9.25146912450726, 10.1904309237525, 10.7490591053023, 10.3883924463568, 
10.097023765247, 10.0824730785217, 10.0828512817661, 10.6371064852226, 
10.5831044752098, 10.4484786486601, 8.50264408341596, 10.3468670812262, 
9.46061433005316, 8.90027436167269, 9.73630671555279, 9.40555522408144, 
10.3220768104446, 8.55132985773453, 10.1678182524815, 10.6145417864386, 
10.4169948161073, 10.0253039670548, 10.2568017077865, 10.5045847076951, 
9.75993936712448, 8.99997092895909, 10.6742222414794, 10.8640943324257, 
10.4295384371541, 10.1987862649656, 10.6744617172313), rs11834524 =
structure(c(1L, 
2L, 2L, 3L, 2L, 3L, 3L, 2L, 3L, 2L, 3L, 2L, 1L, 2L, 2L, 2L, 1L, 
1L, 3L, 3L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 3L, 3L, 3L, 2L, 
1L, 1L, 3L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 
2L, 2L, 2L, 3L, 2L, 3L, 3L, 2L, 3L, 1L, 2L, 1L, 1L, 3L, 1L, 2L, 
2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 3L, 1L, 2L, 3L, 
2L, 3L, 2L, 1L, 3L, 3L, 3L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 1L, 1L, 3L, 3L, 3L, 3L, 3L), .Label = c("AA", 
"AG", "GG"), class = "factor"), rs1682421 = structure(c(1L, 2L, 
1L, 2L, 2L, 3L, 2L, 2L, 3L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 
2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 3L, 2L, 3L, 1L, 1L, 
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 3L, 1L, 2L, 2L, 
1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 
3L, 1L, 1L, 2L, 3L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 
1L, 2L, 2L, 1L, 1L, NA, 3L, 2L, 3L, 2L, 2L), .Label = c("CC", 
"CT", "TT"), class = "factor"), rs13383869 = structure(c(2L, 
2L, 2L, 2L, 2L, NA, 2L, 2L, 1L, 2L, 3L, 3L, 3L, 1L, 2L, 2L, 3L, 
2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 
2L, 3L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 1L, 
1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 3L, 2L, NA, 2L, 2L, 3L, 2L, 
2L, 2L, 2L, 1L, 2L, 3L, 2L, 2L, 1L, 1L, 2L, 3L, 2L, 2L, 3L, 2L, 
2L, 1L, 1L, 2L, 1L, 1L, 1L, 3L, 1L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 
1L, 1L, 2L, 2L, NA, 2L, 1L, 1L, 2L, 2L, 1L, 1L), .Label = c("AA", 
"AG", "GG"), class = "factor"), rs9457141 = structure(c(1L, 2L, 
1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 3L, 1L, 
3L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 
2L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 1L, 1L, 3L, 1L, 1L, 2L, 1L, 2L, 3L, 2L, 1L, 
2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, NA, 2L, 1L, 2L, NA, 1L, 
1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("CC", 
"CT", "TT"), class = "factor"), rs9459617 = structure(c(1L, 2L, 
1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 3L, 1L, 
3L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 
2L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 1L, 1L, 3L, 1L, 1L, NA, 1L, 3L, 3L, 2L, 1L, 
2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 1L, 2L, 1L, 2L, 2L, 1L, 
1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("CC", 
"CT", "TT"), class = "factor"), rs7074431 = structure(c(2L, 3L, 
2L, 1L, 3L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 3L, 2L, 
2L, 2L, 2L, 3L, 2L, 3L, 2L, 2L, 1L, 1L, 3L, 2L, 1L, 2L, 3L, 2L, 
1L, 2L, 1L, 3L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 3L, 1L, 1L, 
2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 1L, 2L, 2L, 1L, 
1L, 1L, 3L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 3L, 3L, 3L, 1L, 1L, 1L, 
1L, 1L, 2L, 1L, 2L, 1L, 3L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 
1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L), .Label = c("CC", 
"CT", "TT"), class = "factor"), rs12450785 = structure(c(2L, 
2L, 2L, 2L, 2L, 2L, 1L, 3L, 1L, 3L, 3L, 2L, 2L, 1L, 2L, 3L, 2L, 
3L, 1L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 2L, 2L, 3L, 2L, 3L, 2L, 2L, 
2L, 2L, 2L, 1L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 2L, 
1L, 2L, 2L, 1L, 1L, 2L, 3L, 3L, 2L, 3L, 3L, 3L, 2L, 1L, 3L, 2L, 
2L, 3L, 2L, 2L, 3L, 3L, 2L, 2L, 3L, 2L, 2L, 3L, 1L, 3L, 2L, 2L, 
1L, 3L, 2L, 3L, 1L, 3L, 2L, 3L, 3L, 2L, 2L, 2L, 3L, 2L, 3L, 1L, 
2L, 2L, 3L, 2L, 2L, 1L, 3L, 3L, 3L, 2L, 3L, 2L), .Label = c("AA", 
"AG", "GG"), class = "factor"), rs592724 = structure(c(1L, 2L, 
1L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 1L, 2L, 3L, 2L, 2L, 1L, 2L, 
2L, 2L, 1L, 2L, 2L, 1L, 1L, 3L, 1L, 1L, 2L, 2L, 2L, 3L, 1L, 3L, 
1L, 3L, 2L, 1L, 1L, 2L, 1L, 2L, 3L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 
3L, 1L, 3L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 3L, 1L, 3L, 3L, 
2L, 2L, 1L, 1L, 3L, 2L, 2L, 2L, 1L, 3L, 2L, 3L, 1L, 3L, 3L, 2L, 
2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 
1L, 1L, 2L, 1L, 2L, 1L, 2L, 3L, 2L, 2L, 2L), .Label = c("CC", 
"CT", "TT"), class = "factor"), Grp = structure(c(1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "1", class =
"factor")), .Names = c("Individual", 
"Expression", "rs11834524", "rs1682421", "rs13383869", "rs9457141", 
"rs9459617", "rs7074431", "rs12450785", "rs592724", "Grp"), row.names =
c(NA, 
-109L), class = "data.frame")



Stephen B. Montgomery
Postdoctoral Researcher, Population and Comparative Genomics
Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA
Skype: stephen.b.montgomery
#
You might want to try the
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
list next time for mixed model questions.

At any rate the Variance column figures are variances, not percentages.

We can use anova with REML=FALSE to make comparisons among
models.  Below we find that removing the rs7074431 term makes very
little difference so we can drop it but removing the rs11834524 term
makes a big difference.  Thus modx0 can be used.
Data: input_new
Models:
modx0: Expression ~ 1 + (1 | rs11834524)
modxx: Expression ~ 1 + (1 | rs11834524) + (1 | rs7074431)
      Df     AIC     BIC  logLik  Chisq Chi Df Pr(>Chisq)
modx0  3 111.386 119.460 -52.693
modxx  4 110.288 121.053 -51.144 3.0986      1    0.07836 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Data: input_new
Models:
mod0x: Expression ~ 1 + (1 | rs7074431)
modxx: Expression ~ 1 + (1 | rs11834524) + (1 | rs7074431)
      Df      AIC      BIC   logLik  Chisq Chi Df Pr(>Chisq)
mod0x  3  206.652  214.726 -100.326
modxx  4  110.288  121.053  -51.144 98.365      1  < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
On Mon, Jan 12, 2009 at 11:36 AM, Stephen Montgomery <sm8 at sanger.ac.uk> wrote: