
R-sig-mixed-models Digest, Vol 136, Issue 41

Hi, Rune,
Thank you very much for answering my question so quickly (R-sig-mixed-models Digest, Vol 136, Issue 41). These days I've been reading the mails you have sent about how to investigate the interaction between random and fixed effects, and how to estimate the variance components in these models.
To put the question and examples in context, let me briefly describe the research subject of my laboratory. The line of work developed here studies different characters (morphological, physiological and behavioural) in several species of Drosophila. Because it is easy to maintain, Drosophila is ideal for generating isolines and obtaining accurate estimates of genetic components. For example, one line of work involves the study of thorax or wing length (as estimators of body size) in flies raised at different temperatures across diverse genetic lines. One of our main interests is the interaction between the fixed-effect variable (i.e. Temperature) and the random-effect variable (Line), to study the genotype-by-environment interaction. In this sense, to further compare different populations, we are interested in estimating the variance components (qualitatively, as percentages of variance explained by Line and by the Line*Temperature interaction).

As you said, "Model1 can be fitted if X1 is a factor. In that case Model1 is rather complex and the number of parameters grows rapidly with the number of levels of X1 - in comparison the random term in Model2 uses just one (variance) parameter", so first I am going to simplify the model to try to understand it better.
So, as a first option, I decided to try a model like Model 1:
mt <- lmer(Torax ~ Temperature + Sex + (Temperature | Line), data)
Random effects:
 Groups   Name          Variance Std.Dev. Corr
 Line     (Intercept)   49.27    7.019
          Temperature25 52.80    7.266    -0.55
 Residual               19.21    4.383
Number of obs: 780, groups: Line, 49
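(To make Rune's point about parameter growth concrete for myself: a random term (f | g), where f is a factor with k levels, estimates a k x k symmetric covariance matrix, i.e. k*(k+1)/2 variance-covariance parameters. A small Python illustration of my own, just to do the counting:)

```python
# Number of variance-covariance parameters in a random term (f | g)
# where the random-effects vector per group has k components:
# a k x k symmetric covariance matrix has k*(k+1)/2 free parameters.
def n_cov_params(k):
    return k * (k + 1) // 2

# (Temperature | Line) with two temperatures -> intercept + one slope,
# a 2x2 covariance: 2 variances + 1 correlation = 3 parameters,
# matching the mt output above.
print(n_cov_params(2))   # 3

# With, say, 5 temperature levels the count already jumps to 15:
print(n_cov_params(5))   # 15
```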

In addition, to restate my interest in knowing how much variance is explained by the Line*Temperature interaction and by Line, I arrived at these models:
mtb <- lmer(Torax ~ Temperature + Sex + (0 + Temperature | Line), data)   # AIC = 4814.43
Random effects:
 Groups   Name          Variance Std.Dev. Corr
 Line     Temperature17 49.27    7.019
          Temperature25 46.38    6.811    0.45
 Residual               19.21    4.383
Number of obs: 780, groups: Line, 49

mtb2 <- lmer(Torax ~ Temperature + Sex + (Temperature || Line), data)   # AIC = 4816.43
Random effects:
 Groups   Name          Variance Std.Dev. Corr
 Line     (Intercept)   22.41    4.734
 Line.1   Temperature17 26.86    5.182
          Temperature25 23.97    4.896    -0.04
 Residual               19.21    4.383
Number of obs: 780, groups: Line, 49
      df      AIC
mt     7 4814.428
mtb    7 4814.428
mtb2   8 4816.428
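(I think I can see numerically why mt and mtb have identical df and AIC: (Temperature | Line) stores (intercept at 17 °C, offset to 25 °C) while (0 + Temperature | Line) stores (effect at 17 °C, effect at 25 °C), and the two are related by a fixed linear transform, so they are exact reparameterizations. A small numpy sketch of my own using the printed mt values; the small gap to mtb's 46.38 comes from the correlation being printed to only two decimals:)

```python
import numpy as np

# mt parameterization: b = (intercept at T17, offset T25 - T17)
sd = np.array([7.019, 7.266])
corr = -0.55
Sigma_b = np.outer(sd, sd) * np.array([[1.0, corr],
                                       [corr, 1.0]])

# mtb parameterization: (effect at T17, effect at T25) = T @ b
T = np.array([[1.0, 0.0],
              [1.0, 1.0]])
Sigma_u = T @ Sigma_b @ T.T

# Effect-scale variances implied by mt's estimates:
print(np.round(Sigma_u.diagonal(), 2))  # [49.27 45.96], vs 49.27 and 46.38 in mtb
```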

They have the same fixed effects; the difference resides in how they express the variance components. So, if I'm interested in obtaining the whole set of variance components (Line and Line*Temperature), do I need to run the mtb2 model? Are the other models not useful? Is the difference between the three models only in how we interpret and how they display the variance?
On one hand, is the difference between mtb and mtb2 the existence of one extra variance parameter related to Line? (If I want to estimate the variance for each component, would it be Line = 22.41 and Line:Temperature = 26.86 + 23.97, or does this last sum not make sense?)
It is very difficult for me to understand how to interpret the variance components in the models mt and mtb. In addition, in your first answer to Jung you said: "First, '(recipe || replicate)' is the same as/expands to '(1 | replicate) + (0 + recipe | replicate)' which is just an over-parameterized version of '(0 + recipe | replicate)', which again is a re-parameterized version of '(recipe | replicate)'. These are all representing the same model (all on 10 df though lmer() is misled and thinks that m_zcp has 11 df)". I don't understand why you said that the model (recipe || replicate) is over-parameterized if it has the same parameters as (0 + recipe | replicate). Even though I notice different amounts of estimated variance among the variance components, I do not see this worsening the model through an excess of estimated parameters. I do not know if I am explaining myself.
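(Trying to convince myself about the over-parameterization with a small numpy sketch of my own, with made-up numbers: the expansion (1 | g) + (0 + f | g) implies a between-level covariance Sigma = s0^2 * J + Sigma_f, with J the all-ones matrix. Since Sigma_f is already a fully general covariance matrix, the extra s0^2 adds a parameter without enlarging the set of covariances the model can represent — two different parameter sets give the same Sigma:)

```python
import numpy as np

# (1 | g) + (0 + f | g) with a 2-level factor f implies
#   Sigma = s0^2 * J + Sigma_f,  J = all-ones 2x2 matrix.
J = np.ones((2, 2))

# One parameter set: shared intercept variance plus a general
# slope covariance (values invented for illustration).
s0sq = 4.0
Sigma_f = np.array([[10.0, 3.0],
                    [3.0, 8.0]])
Sigma = s0sq * J + Sigma_f

# A second parameter set with s0^2 = 0: fold the intercept
# variance into the slope covariance instead.
Sigma_alt = 0.0 * J + (Sigma_f + s0sq * J)

# Same implied covariance from different parameters:
# the extra parameter is redundant, i.e. over-parameterization.
print(np.allclose(Sigma, Sigma_alt))  # True
```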
On the other hand, going back to your initial advice, "The point here is that we need to also consider Model3b as an alternative to Model1 and Model2. Of these models, Model3 _and_ Model2 are the simplest while Model1 is the most complex with Model3b in the middle often serving as an appropriate compromise", I ran a model as:
mt3 <- lmer(Torax ~ Sex + Temperature + (1 | Line:Temperature), data)   # AIC = 4820.112
Random effects:
 Groups           Name        Variance Std.Dev.
 Line:Temperature (Intercept) 47.81    6.915
 Residual                     19.21    4.383
Number of obs: 780, groups: Line:Temperature, 98

However, this syntax does not let me see all the variance components (it does not show Line), so I decided to try another syntax:
mt3b <- lmer(Torax ~ Sex + Temperature + (1 | Line) + (1 | Temperature:Line), data)
Random effects:
 Groups           Name        Variance Std.Dev.
 Temperature:Line (Intercept) 26.40    5.138
 Line             (Intercept) 21.42    4.628
 Residual                     19.21    4.383
Number of obs: 780, groups: Temperature:Line, 98; Line, 49

If I compare all models:
      df      AIC
mt     7 4814.428
mtb    7 4814.428
mtb2   8 4816.428
mt3    5 4820.112
mt3b   6 4812.476

and focus only on those that show me all the variance components (Line and Line*Temperature, i.e. mtb2 and mt3b), can I choose the best model as a function of AIC?
Going back to the general objective of estimating the variance components and evaluating the interaction involving the random variable, my question is: the difference between the last two models is that mtb2 estimates two more parameters, which correspond, on one hand, to splitting the Line*Temperature interaction variance into two components (Line*Temperature17 and Line*Temperature25) and, on the other hand, to the correlation between those components. If this is correct, then according to my objective, is mt3b more parsimonious? In addition, shouldn't the sum of the variance explained by Line*Temperature17 and Line*Temperature25 (if that sum makes any sense) be similar or identical to the variance explained by Temperature:Line? Because when I compare those values, I don't see that happening.
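(While writing this, I tried to reconcile the numbers myself, assuming mt3b's compound-symmetry structure is an adequate constraint on mtb. If it is, then Var(Line) should correspond to the covariance between the two temperature effects in mtb, and Var(Line:Temperature) to the average variance minus that covariance — not to the sum of the two variances, which is perhaps why my comparison failed. Checking the arithmetic in Python with the printed mtb values:)

```python
import math

# From mtb: (0 + Temperature | Line)
var17, var25, corr = 49.27, 46.38, 0.45
cov = corr * math.sqrt(var17) * math.sqrt(var25)

# Under compound symmetry (mt3b), approximately:
#   Var(Line)             = covariance between the two temperature effects
#   Var(Line:Temperature) = average variance minus that covariance
line_var = cov
int_var = (var17 + var25) / 2 - cov

print(round(line_var, 2))  # 21.51, vs 21.42 reported for Line in mt3b
print(round(int_var, 2))   # 26.31, vs 26.40 reported for Temperature:Line
```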

Finally, in your last mail you enumerated different models that can be compared. I don't understand the difference between model fm5 and model fm6c; I'd appreciate it if you could clarify it a bit.

Thank you so much again for your patience and dedication.
Cheers,
Nicolás.
Message-ID: <BN6PR1701MB173204741094A070E933AEBB8E9A0@BN6PR1701MB1732.namprd17.prod.outlook.com>
In-Reply-To: <CAG_uk934grrjf-oD8dW1hgSFFgakooULVaSrOWUm7z9WYd2W1g@mail.gmail.com>