Hi experts, While the slope is coming out to be identical in the two methods below, the intercepts are not. As far as I understand, both are formulations are identical in the sense that these are asking for a slope corresponding to 'Days' and a separate intercept term for each Subject. # Model-1 library(lmer) coef(lmer(Reaction ~ Days + (1| Subject), sleepstudy)) # Model-2 coef(lm(Reaction ~ Days + Subject, sleepstudy)) Can somebody tell me the reason? Are the above formulations actually different or is it due to different optimization method used? Thank you. Utkarsh Singhal 91.96508.54333
Linear model vs Mixed model
7 messages · Thierry Onkelinx, Cade, Brian, Utkarsh Singhal
The parametrisation is different. The intercept in model 1 is the effect of the "average" subject at days == 0. The intercept in model 2 is the effect of the first subject at days == 0. ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-07-12 15:35 GMT+02:00 Utkarsh Singhal <utkarsh.iit at gmail.com>:
Hi experts,
While the slope is coming out to be identical in the two methods below, the
intercepts are not. As far as I understand, both are formulations are
identical in the sense that these are asking for a slope corresponding to
'Days' and a separate intercept term for each Subject.
# Model-1
library(lmer)
coef(lmer(Reaction ~ Days + (1| Subject), sleepstudy))
# Model-2
coef(lm(Reaction ~ Days + Subject, sleepstudy))
Can somebody tell me the reason? Are the above formulations actually
different or is it due to different optimization method used?
Thank you.
Utkarsh Singhal
91.96508.54333
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Hello Thierry, Thank you for your quick response. Sorry, but I am not sure if I follow what you said. I get the following outputs from the two models:
coef(lmer(Reaction ~ Days + (1| Subject), sleepstudy))
Subject (Intercept) Days 308 292.1888 10.46729 309 173.5556 10.46729 310 188.2965 10.46729 330 255.8115 10.46729 331 261.6213 10.46729 332 259.6263 10.46729 333 267.9056 10.46729 334 248.4081 10.46729 335 206.1230 10.46729 337 323.5878 10.46729 349 230.2089 10.46729 350 265.5165 10.46729 351 243.5429 10.46729 352 287.7835 10.46729 369 258.4415 10.46729 370 245.0424 10.46729 371 248.1108 10.46729 372 269.5209 10.46729
coef(lm(Reaction ~ Days + Subject, sleepstudy))
(Intercept) 295.03104 Days 10.46729 Subject309 -126.90085 Subject310 -111.13256 Subject330 -38.91241 Subject331 -32.69778 Subject332 -34.83176 Subject333 -25.97552 Subject334 -46.83178 Subject335 -92.06379 Subject337 33.58718 Subject349 -66.29936 Subject350 -28.53115 Subject351 -52.03608 Subject352 -4.71229 Subject369 -36.09919 Subject370 -50.43206 Subject371 -47.14979 Subject372 -24.24770 Now, what I expected is the following: - 'Intercept' of model-2 to match with Intercept of Subject-308 of model-1 - 'Intercept+Subject309' of model-2 to match with Intercept of Subject-309 of model-1 - and so on... What am I missing here? If it is difficult to explain this, can you alternately answer the following: "Is it possible to define the 'lm' and 'lmer' models above so they produce the same results (at least in terms of predictions)?" Thanks again. Utkarsh Singhal 91.96508.54333
On 12 July 2016 at 19:15, Thierry Onkelinx <thierry.onkelinx at inbo.be> wrote:
The parametrisation is different. The intercept in model 1 is the effect of the "average" subject at days == 0. The intercept in model 2 is the effect of the first subject at days == 0. ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-07-12 15:35 GMT+02:00 Utkarsh Singhal <utkarsh.iit at gmail.com>:
Hi experts,
While the slope is coming out to be identical in the two methods below,
the
intercepts are not. As far as I understand, both are formulations are
identical in the sense that these are asking for a slope corresponding to
'Days' and a separate intercept term for each Subject.
# Model-1
library(lmer)
coef(lmer(Reaction ~ Days + (1| Subject), sleepstudy))
# Model-2
coef(lm(Reaction ~ Days + Subject, sleepstudy))
Can somebody tell me the reason? Are the above formulations actually
different or is it due to different optimization method used?
Thank you.
Utkarsh Singhal
91.96508.54333
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Your lm() estimates are using the default contrasts of contr.treatment, providing an intercept corresponding to your subject 308 and the other subject* estimates are differences from subject 308 intercept. You could have specified this with contrasts as contr.sum and the estimates would be more easily compared to the lmer() model estimates. They are close but will never be identical as the lmer() model estimates are based on assuming a normal distribution with specified variance. They rarely would be identical. Brian Brian S. Cade, PhD U. S. Geological Survey Fort Collins Science Center 2150 Centre Ave., Bldg. C Fort Collins, CO 80526-8818 email: cadeb at usgs.gov <brian_cade at usgs.gov> tel: 970 226-9326 On Tue, Jul 12, 2016 at 12:10 PM, Utkarsh Singhal <utkarsh.iit at gmail.com> wrote:
Hello Thierry, Thank you for your quick response. Sorry, but I am not sure if I follow what you said. I get the following outputs from the two models:
coef(lmer(Reaction ~ Days + (1| Subject), sleepstudy))
Subject (Intercept) Days 308 292.1888 10.46729 309 173.5556 10.46729 310 188.2965 10.46729 330 255.8115 10.46729 331 261.6213 10.46729 332 259.6263 10.46729 333 267.9056 10.46729 334 248.4081 10.46729 335 206.1230 10.46729 337 323.5878 10.46729 349 230.2089 10.46729 350 265.5165 10.46729 351 243.5429 10.46729 352 287.7835 10.46729 369 258.4415 10.46729 370 245.0424 10.46729 371 248.1108 10.46729 372 269.5209 10.46729
coef(lm(Reaction ~ Days + Subject, sleepstudy))
(Intercept) 295.03104 Days 10.46729 Subject309 -126.90085 Subject310 -111.13256 Subject330 -38.91241 Subject331 -32.69778 Subject332 -34.83176 Subject333 -25.97552 Subject334 -46.83178 Subject335 -92.06379 Subject337 33.58718 Subject349 -66.29936 Subject350 -28.53115 Subject351 -52.03608 Subject352 -4.71229 Subject369 -36.09919 Subject370 -50.43206 Subject371 -47.14979 Subject372 -24.24770 Now, what I expected is the following: - 'Intercept' of model-2 to match with Intercept of Subject-308 of model-1 - 'Intercept+Subject309' of model-2 to match with Intercept of Subject-309 of model-1 - and so on... What am I missing here? If it is difficult to explain this, can you alternately answer the following: "Is it possible to define the 'lm' and 'lmer' models above so they produce the same results (at least in terms of predictions)?" Thanks again. Utkarsh Singhal 91.96508.54333 On 12 July 2016 at 19:15, Thierry Onkelinx <thierry.onkelinx at inbo.be> wrote:
The parametrisation is different. The intercept in model 1 is the effect of the "average" subject at days
==
0. The intercept in model 2 is the effect of the first subject at days == 0. ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and
Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to
say
what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey 2016-07-12 15:35 GMT+02:00 Utkarsh Singhal <utkarsh.iit at gmail.com>:
Hi experts, While the slope is coming out to be identical in the two methods below, the intercepts are not. As far as I understand, both are formulations are identical in the sense that these are asking for a slope corresponding
to
'Days' and a separate intercept term for each Subject.
# Model-1
library(lmer)
coef(lmer(Reaction ~ Days + (1| Subject), sleepstudy))
# Model-2
coef(lm(Reaction ~ Days + Subject, sleepstudy))
Can somebody tell me the reason? Are the above formulations actually
different or is it due to different optimization method used?
Thank you.
Utkarsh Singhal
91.96508.54333
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Hi Brian, This makes some sense to me theoretically, but doesn't pan out with my experiment. The contrasts default was the following as you said:
options("contrasts")
$contrasts
unordered ordered
"contr.treatment" "contr.poly"
I changed it as follows:
options(contracts=c(unordered="contr.sum", ordered="contr.poly"))
But the output of 'lm' model was exactly the same as before. Then I tried the following:
coef(lm(Reaction ~ Days + Subject, sleepstudy,
contrasts=list(Subject="contr.sum"))) The model output appears slightly different, but the 'Subject* + Intercept' is still exactly the same as the default 'lm' model. Also, as you can verify below, the predictions from two 'lm' models are exactly the same
fit_lm = lm(Reaction ~ Days + Subject, sleepstudy); fit_lm2 = lm(Reaction ~ Days + Subject, sleepstudy,
contrasts=list(Subject="contr.sum"))
head(predict(fit_lm))
1 2 3 4 5 6 295.0310 305.4983 315.9656 326.4329 336.9002 347.3675
head(predict(fit_lm2))
1 2 3 4 5 6 295.0310 305.4983 315.9656 326.4329 336.9002 347.3675 And these are quite different from the mixedmodel:
fit_lmer = lmer(Reaction ~ Days + (1| Subject), sleepstudy) head(predict(fit_lmer))
1 2 3 4 5 6 292.1888 302.6561 313.1234 323.5907 334.0580 344.5252 Any idea? Utkarsh Singhal 91.96508.54333
On 13 July 2016 at 00:18, Cade, Brian <cadeb at usgs.gov> wrote:
Your lm() estimates are using the default contrasts of contr.treatment, providing an intercept corresponding to your subject 308 and the other subject* estimates are differences from subject 308 intercept. You could have specified this with contrasts as contr.sum and the estimates would be more easily compared to the lmer() model estimates. They are close but will never be identical as the lmer() model estimates are based on assuming a normal distribution with specified variance. They rarely would be identical. Brian Brian S. Cade, PhD U. S. Geological Survey Fort Collins Science Center 2150 Centre Ave., Bldg. C Fort Collins, CO 80526-8818 email: cadeb at usgs.gov <brian_cade at usgs.gov> tel: 970 226-9326 On Tue, Jul 12, 2016 at 12:10 PM, Utkarsh Singhal <utkarsh.iit at gmail.com> wrote:
Hello Thierry, Thank you for your quick response. Sorry, but I am not sure if I follow what you said. I get the following outputs from the two models:
coef(lmer(Reaction ~ Days + (1| Subject), sleepstudy))
Subject (Intercept) Days 308 292.1888 10.46729 309 173.5556 10.46729 310 188.2965 10.46729 330 255.8115 10.46729 331 261.6213 10.46729 332 259.6263 10.46729 333 267.9056 10.46729 334 248.4081 10.46729 335 206.1230 10.46729 337 323.5878 10.46729 349 230.2089 10.46729 350 265.5165 10.46729 351 243.5429 10.46729 352 287.7835 10.46729 369 258.4415 10.46729 370 245.0424 10.46729 371 248.1108 10.46729 372 269.5209 10.46729
coef(lm(Reaction ~ Days + Subject, sleepstudy))
(Intercept) 295.03104 Days 10.46729 Subject309 -126.90085 Subject310 -111.13256 Subject330 -38.91241 Subject331 -32.69778 Subject332 -34.83176 Subject333 -25.97552 Subject334 -46.83178 Subject335 -92.06379 Subject337 33.58718 Subject349 -66.29936 Subject350 -28.53115 Subject351 -52.03608 Subject352 -4.71229 Subject369 -36.09919 Subject370 -50.43206 Subject371 -47.14979 Subject372 -24.24770 Now, what I expected is the following: - 'Intercept' of model-2 to match with Intercept of Subject-308 of model-1 - 'Intercept+Subject309' of model-2 to match with Intercept of Subject-309 of model-1 - and so on... What am I missing here? If it is difficult to explain this, can you alternately answer the following: "Is it possible to define the 'lm' and 'lmer' models above so they produce the same results (at least in terms of predictions)?" Thanks again. Utkarsh Singhal 91.96508.54333 On 12 July 2016 at 19:15, Thierry Onkelinx <thierry.onkelinx at inbo.be> wrote:
The parametrisation is different. The intercept in model 1 is the effect of the "average" subject at days
==
0. The intercept in model 2 is the effect of the first subject at days ==
0.
ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and
Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to
say
what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey 2016-07-12 15:35 GMT+02:00 Utkarsh Singhal <utkarsh.iit at gmail.com>:
Hi experts, While the slope is coming out to be identical in the two methods below, the intercepts are not. As far as I understand, both are formulations are identical in the sense that these are asking for a slope corresponding
to
'Days' and a separate intercept term for each Subject.
# Model-1
library(lmer)
coef(lmer(Reaction ~ Days + (1| Subject), sleepstudy))
# Model-2
coef(lm(Reaction ~ Days + Subject, sleepstudy))
Can somebody tell me the reason? Are the above formulations actually
different or is it due to different optimization method used?
Thank you.
Utkarsh Singhal
91.96508.54333
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Utkarsh: I think the differences between the lm and lmer estimates of the intercept are consistent with the regularization effect expected with mixed-effects models where the estimates shrink towards the mean slightly. I don't think there is any reason to expect exact agreement between the lm and lmer estimates. I didn't mean to imply that the different parameterization of the contrasts would make the lm estimates agree more with the lmer estimates, only that it might be easier to compare the regression summary output to see how similar/dissimilar they were. But your way of comparing them are adequate. Why would you think they should agree exactly? Larger sample sizes might make them closer (but I don't know your n) but my expectation is that they could only be close not exactly in agreement. The lmer intercept estimates are based on using a normal distribution with an estimated variance to locate the subject specific intercepts. Why do you think those modeled intercepts should exactly coincide with your fixed effects intercepts that makes no such distributional assumption? Brian Brian S. Cade, PhD U. S. Geological Survey Fort Collins Science Center 2150 Centre Ave., Bldg. C Fort Collins, CO 80526-8818 email: cadeb at usgs.gov <brian_cade at usgs.gov> tel: 970 226-9326 On Wed, Jul 13, 2016 at 10:06 AM, Utkarsh Singhal <utkarsh.iit at gmail.com> wrote:
Hi Brian, This makes some sense to me theoretically, but doesn't pan out with my experiment. The contrasts default was the following as you said:
options("contrasts")
$contrasts
unordered ordered
"contr.treatment" "contr.poly"
I changed it as follows:
options(contracts=c(unordered="contr.sum", ordered="contr.poly"))
But the output of 'lm' model was exactly the same as before. Then I tried the following:
coef(lm(Reaction ~ Days + Subject, sleepstudy,
contrasts=list(Subject="contr.sum"))) The model output appears slightly different, but the 'Subject* + Intercept' is still exactly the same as the default 'lm' model. Also, as you can verify below, the predictions from two 'lm' models are exactly the same
fit_lm = lm(Reaction ~ Days + Subject, sleepstudy); fit_lm2 = lm(Reaction ~ Days + Subject, sleepstudy,
contrasts=list(Subject="contr.sum"))
head(predict(fit_lm))
1 2 3 4 5 6 295.0310 305.4983 315.9656 326.4329 336.9002 347.3675
head(predict(fit_lm2))
1 2 3 4 5 6 295.0310 305.4983 315.9656 326.4329 336.9002 347.3675 And these are quite different from the mixedmodel:
fit_lmer = lmer(Reaction ~ Days + (1| Subject), sleepstudy) head(predict(fit_lmer))
1 2 3 4 5 6 292.1888 302.6561 313.1234 323.5907 334.0580 344.5252 Any idea? Utkarsh Singhal 91.96508.54333 On 13 July 2016 at 00:18, Cade, Brian <cadeb at usgs.gov> wrote:
Your lm() estimates are using the default contrasts of contr.treatment, providing an intercept corresponding to your subject 308 and the other subject* estimates are differences from subject 308 intercept. You could have specified this with contrasts as contr.sum and the estimates would be more easily compared to the lmer() model estimates. They are close but will never be identical as the lmer() model estimates are based on assuming a normal distribution with specified variance. They rarely would be identical. Brian Brian S. Cade, PhD U. S. Geological Survey Fort Collins Science Center 2150 Centre Ave., Bldg. C Fort Collins, CO 80526-8818 email: cadeb at usgs.gov <brian_cade at usgs.gov> tel: 970 226-9326 On Tue, Jul 12, 2016 at 12:10 PM, Utkarsh Singhal <utkarsh.iit at gmail.com> wrote:
Hello Thierry, Thank you for your quick response. Sorry, but I am not sure if I follow what you said. I get the following outputs from the two models:
coef(lmer(Reaction ~ Days + (1| Subject), sleepstudy))
Subject (Intercept) Days 308 292.1888 10.46729 309 173.5556 10.46729 310 188.2965 10.46729 330 255.8115 10.46729 331 261.6213 10.46729 332 259.6263 10.46729 333 267.9056 10.46729 334 248.4081 10.46729 335 206.1230 10.46729 337 323.5878 10.46729 349 230.2089 10.46729 350 265.5165 10.46729 351 243.5429 10.46729 352 287.7835 10.46729 369 258.4415 10.46729 370 245.0424 10.46729 371 248.1108 10.46729 372 269.5209 10.46729
coef(lm(Reaction ~ Days + Subject, sleepstudy))
(Intercept) 295.03104 Days 10.46729 Subject309 -126.90085 Subject310 -111.13256 Subject330 -38.91241 Subject331 -32.69778 Subject332 -34.83176 Subject333 -25.97552 Subject334 -46.83178 Subject335 -92.06379 Subject337 33.58718 Subject349 -66.29936 Subject350 -28.53115 Subject351 -52.03608 Subject352 -4.71229 Subject369 -36.09919 Subject370 -50.43206 Subject371 -47.14979 Subject372 -24.24770 Now, what I expected is the following: - 'Intercept' of model-2 to match with Intercept of Subject-308 of model-1 - 'Intercept+Subject309' of model-2 to match with Intercept of Subject-309 of model-1 - and so on... What am I missing here? If it is difficult to explain this, can you alternately answer the following: "Is it possible to define the 'lm' and 'lmer' models above so they produce the same results (at least in terms of predictions)?" Thanks again. Utkarsh Singhal 91.96508.54333 On 12 July 2016 at 19:15, Thierry Onkelinx <thierry.onkelinx at inbo.be> wrote:
The parametrisation is different. The intercept in model 1 is the effect of the "average" subject at
days ==
0. The intercept in model 2 is the effect of the first subject at days ==
0.
ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and
Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able
to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does
not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey 2016-07-12 15:35 GMT+02:00 Utkarsh Singhal <utkarsh.iit at gmail.com>:
Hi experts, While the slope is coming out to be identical in the two methods
below,
the intercepts are not. As far as I understand, both are formulations are identical in the sense that these are asking for a slope
corresponding to
'Days' and a separate intercept term for each Subject.
# Model-1
library(lmer)
coef(lmer(Reaction ~ Days + (1| Subject), sleepstudy))
# Model-2
coef(lm(Reaction ~ Days + Subject, sleepstudy))
Can somebody tell me the reason? Are the above formulations actually
different or is it due to different optimization method used?
Thank you.
Utkarsh Singhal
91.96508.54333
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Thanks Brian for all your kind help. "didn't mean to imply that the different parameterization of the contrasts would make the lm estimates agree more with the lmer estimates, only that it might be easier to compare the regression summary output to see how similar/dissimilar they were ". Got it now, and yes it makes sense for easy comparison. And yes, my 'n' is indeed small. I don't want to match them exactly, but just wanted to confirm that the underlying principle in the two methods is not entirely different and will go away with large data. In absence of that, I don't know how to choose between the two models. This brings to my real problem. Take the below two models for example: lm(y ~ x1 + x2*Subject) lmer(y ~ x1 + x2 + Subject + (x2|Subject)) With my limited knowledge, both of these should do the following: - fit linear model - to predict 'y' - by capturing the effect due to predictor variables x1, x2 and Subject - and an interaction effect of the combination of x2 and Subject These can of course use different optimization methods or make different distributional assumptions, but that effect should ideally go away with large data. So, I am not too concerned about that. I can sleep peacefully if you just say that the difference is due to optimization method used or distributional assumptions :). But if it is anything more too it, then how do I decide which of the above two models to choose in what scenario. Regards, Utkarsh Singhal 91.96508.54333
On 13 July 2016 at 22:56, Cade, Brian <cadeb at usgs.gov> wrote:
Utkarsh: I think the differences between the lm and lmer estimates of the intercept are consistent with the regularization effect expected with mixed-effects models where the estimates shrink towards the mean slightly. I don't think there is any reason to expect exact agreement between the lm and lmer estimates. I didn't mean to imply that the different parameterization of the contrasts would make the lm estimates agree more with the lmer estimates, only that it might be easier to compare the regression summary output to see how similar/dissimilar they were. But your way of comparing them are adequate. Why would you think they should agree exactly? Larger sample sizes might make them closer (but I don't know your n) but my expectation is that they could only be close not exactly in agreement. The lmer intercept estimates are based on using a normal distribution with an estimated variance to locate the subject specific intercepts. Why do you think those modeled intercepts should exactly coincide with your fixed effects intercepts that makes no such distributional assumption? Brian Brian S. Cade, PhD U. S. Geological Survey Fort Collins Science Center 2150 Centre Ave., Bldg. C Fort Collins, CO 80526-8818 email: cadeb at usgs.gov <brian_cade at usgs.gov> tel: 970 226-9326 On Wed, Jul 13, 2016 at 10:06 AM, Utkarsh Singhal <utkarsh.iit at gmail.com> wrote:
Hi Brian, This makes some sense to me theoretically, but doesn't pan out with my experiment. The contrasts default was the following as you said:
options("contrasts")
$contrasts
unordered ordered
"contr.treatment" "contr.poly"
I changed it as follows:
options(contracts=c(unordered="contr.sum", ordered="contr.poly"))
But the output of 'lm' model was exactly the same as before. Then I tried the following:
coef(lm(Reaction ~ Days + Subject, sleepstudy,
contrasts=list(Subject="contr.sum"))) The model output appears slightly different, but the 'Subject* + Intercept' is still exactly the same as the default 'lm' model. Also, as you can verify below, the predictions from two 'lm' models are exactly the same
fit_lm = lm(Reaction ~ Days + Subject, sleepstudy); fit_lm2 = lm(Reaction ~ Days + Subject, sleepstudy,
contrasts=list(Subject="contr.sum"))
head(predict(fit_lm))
1 2 3 4 5 6 295.0310 305.4983 315.9656 326.4329 336.9002 347.3675
head(predict(fit_lm2))
1 2 3 4 5 6 295.0310 305.4983 315.9656 326.4329 336.9002 347.3675 And these are quite different from the mixedmodel:
fit_lmer = lmer(Reaction ~ Days + (1| Subject), sleepstudy) head(predict(fit_lmer))
1 2 3 4 5 6 292.1888 302.6561 313.1234 323.5907 334.0580 344.5252 Any idea? Utkarsh Singhal 91.96508.54333 On 13 July 2016 at 00:18, Cade, Brian <cadeb at usgs.gov> wrote:
Your lm() estimates are using the default contrasts of contr.treatment, providing an intercept corresponding to your subject 308 and the other subject* estimates are differences from subject 308 intercept. You could have specified this with contrasts as contr.sum and the estimates would be more easily compared to the lmer() model estimates. They are close but will never be identical as the lmer() model estimates are based on assuming a normal distribution with specified variance. They rarely would be identical. Brian Brian S. Cade, PhD U. S. Geological Survey Fort Collins Science Center 2150 Centre Ave., Bldg. C Fort Collins, CO 80526-8818 email: cadeb at usgs.gov <brian_cade at usgs.gov> tel: 970 226-9326 On Tue, Jul 12, 2016 at 12:10 PM, Utkarsh Singhal <utkarsh.iit at gmail.com
wrote:
Hello Thierry, Thank you for your quick response. Sorry, but I am not sure if I follow what you said. I get the following outputs from the two models:
coef(lmer(Reaction ~ Days + (1| Subject), sleepstudy))
Subject (Intercept) Days 308 292.1888 10.46729 309 173.5556 10.46729 310 188.2965 10.46729 330 255.8115 10.46729 331 261.6213 10.46729 332 259.6263 10.46729 333 267.9056 10.46729 334 248.4081 10.46729 335 206.1230 10.46729 337 323.5878 10.46729 349 230.2089 10.46729 350 265.5165 10.46729 351 243.5429 10.46729 352 287.7835 10.46729 369 258.4415 10.46729 370 245.0424 10.46729 371 248.1108 10.46729 372 269.5209 10.46729
coef(lm(Reaction ~ Days + Subject, sleepstudy))
(Intercept) 295.03104 Days 10.46729 Subject309 -126.90085 Subject310 -111.13256 Subject330 -38.91241 Subject331 -32.69778 Subject332 -34.83176 Subject333 -25.97552 Subject334 -46.83178 Subject335 -92.06379 Subject337 33.58718 Subject349 -66.29936 Subject350 -28.53115 Subject351 -52.03608 Subject352 -4.71229 Subject369 -36.09919 Subject370 -50.43206 Subject371 -47.14979 Subject372 -24.24770 Now, what I expected is the following: - 'Intercept' of model-2 to match with Intercept of Subject-308 of model-1 - 'Intercept+Subject309' of model-2 to match with Intercept of Subject-309 of model-1 - and so on... What am I missing here? If it is difficult to explain this, can you alternately answer the following: "Is it possible to define the 'lm' and 'lmer' models above so they produce the same results (at least in terms of predictions)?" Thanks again. Utkarsh Singhal 91.96508.54333 On 12 July 2016 at 19:15, Thierry Onkelinx <thierry.onkelinx at inbo.be> wrote:
The parametrisation is different. The intercept in model 1 is the effect of the "average" subject at
days ==
0. The intercept in model 2 is the effect of the first subject at days
== 0.
ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for
Nature and
Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no
more
than asking him to perform a post-mortem examination: he may be able
to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does
not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey 2016-07-12 15:35 GMT+02:00 Utkarsh Singhal <utkarsh.iit at gmail.com>:
Hi experts, While the slope is coming out to be identical in the two methods
below,
the intercepts are not. As far as I understand, both are formulations are identical in the sense that these are asking for a slope
corresponding to
'Days' and a separate intercept term for each Subject.
# Model-1
library(lmer)
coef(lmer(Reaction ~ Days + (1| Subject), sleepstudy))
# Model-2
coef(lm(Reaction ~ Days + Subject, sleepstudy))
Can somebody tell me the reason? Are the above formulations actually
different or is it due to different optimization method used?
Thank you.
Utkarsh Singhal
91.96508.54333
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.