Dear All,

I have a quick question about comparing results from lmer and from glm. We are running an analysis to predict a person's likelihood of leaving a project, with some people affiliated with multiple projects (binary outcome and crossed random effects). The data consist of three levels: projects, members (crossed with projects, with 70% of members on one project and 30% on multiple projects), and time series nested within individuals.

I ran the analysis first with glm (family=binomial) and then with lmer (family=binomial, plus (1 | projectid) + (1 | memberid) to account for the random effects). The two analyses have the same covariates: project size and scope, and some individual member attributes such as tenure and past performance. Theoretically, I expect the coefficients to be similar between the two results, with some differences in the significance tests or confidence intervals. However, I found that three coefficients flipped signs between the two, which is very puzzling. I ran another set of analyses with a continuous dependent variable (quantity of work completed) and found similar coefficients between the two (results from lm and lmer).

So my question is: should we expect the results from glm and lmer to be similar? If we should see different results, is it because the distribution is binomial rather than normal, or for other reasons? Which set of results is more reliable and should be included in our paper?

Thanks very much.
Ching Ren
lmer versus glm results
5 messages · Thomas Levine, Yuqing Ren, John Maindonald
Could you post the commands you ran?
Tom
Dear Tom,
Thanks very much for your response. Here are the commands I ran.
glm(leaving ~ quarter + project_scope + project_size + tenure + pastwork,
    family=binomial("logit"), data=all)

lmer(leaving ~ quarter + project_scope + project_size + tenure + pastwork
     + (1 + quarter | project_id) + (1 + quarter | user_id),
     family=binomial, data=all)
Ching
On Wed, May 25, 2011 at 3:38 PM, Thomas Levine <tkl22 at cornell.edu> wrote:
Could you post the commands you ran?
Tom
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Yuqing (Ching) Ren Assistant Professor at Carlson School of Management University of Minnesota, CSOM 3-370 321 19th Avenue S., Minneapolis, MN 55455 (tel) 612-625-5242 (fax) 612-626-1316
The more relevant comparison is between
1) {
glm(leaving ~ quarter + project_scope + project_size + tenure + pastwork
    + <additional fixed effect terms that account, now as fixed effects, for the same
       main effects and interactions as (1 + quarter | project_id) + (1 + quarter | user_id)>,
    family=binomial("logit"), data=all)
[replacing the part between the diamond brackets (< >) with something that R can
interpret is left as an exercise for anyone who might welcome such a challenge!]
}
and 2) {
lmer(leaving ~ quarter + project_scope + project_size + tenure + pastwork
     + (1 + quarter | project_id) + (1 + quarter | user_id),
     family=binomial, data=all)
}
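One possible reading of the placeholder in 1), offered as a sketch rather than as the intended answer, is to give each project and each user its own intercept and its own quarter slope as ordinary fixed effects. The simulated data frame `d` and the values in `project_size_by_id` below are hypothetical; only the variable names come from the thread. The check at the end points to a practical obstacle: a covariate that is constant within a project, such as project_size, is a linear combination of the project dummies, so the design matrix is rank deficient.

```r
# Hypothetical simulated data; only the variable names come from the thread.
set.seed(42)
n_obs <- 400
d <- data.frame(
  project_id = sample(1:5,  n_obs, replace = TRUE),
  user_id    = sample(1:20, n_obs, replace = TRUE),
  quarter    = sample(1:8,  n_obs, replace = TRUE)
)

# A project-level covariate: one (made-up) value per project.
project_size_by_id <- c(10, 25, 40, 15, 60)
d$project_size <- project_size_by_id[d$project_id]

# Fixed-effect counterparts of (1 + quarter | project_id) and
# (1 + quarter | user_id): a separate intercept and a separate
# quarter slope for every project and every user.
fixed_formula <- ~ quarter + project_size +
  factor(project_id) + factor(user_id) +
  quarter:factor(project_id) + quarter:factor(user_id)

X <- model.matrix(fixed_formula, data = d)
n_cols <- ncol(X)
x_rank <- qr(X)$rank

# project_size is constant within a project, so it is aliased with the
# project dummies and the matrix has fewer independent columns than columns.
cat("columns:", n_cols, " rank:", x_rank, "\n")
```

With real data this formulation also adds one intercept and one slope per member, which is typically far more parameters than a binary outcome can support. That is part of what makes the random-effects formulation in 2) attractive.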
Note that the coefficient estimates are conditional on the other effects that the
model equation accounts for. Change those other effects and you are likely to
change what the coefficients represent, and with it the coefficient estimates.
What is probably a second-order effect (and not needed to explain what you see
here) is that the relative weighting of the observations will be different in the
random effects analysis, even for a 'relevant' comparison.
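A small simulation can make the sign flip concrete. The sketch below uses made-up data, not the poster's: groups with a high typical value of `x` also have a high baseline log-odds, while within any one group the effect of `x` is negative. All names (`group`, `x`, `y`, `beta_within`) are assumptions for illustration. The mixed-model step uses `glmer()`, which is how current versions of lme4 fit a binomial model of this kind, and it is skipped if lme4 is not installed.

```r
# Simulated illustration (hypothetical data, not the poster's).
set.seed(2011)
n_groups <- 30
n_per    <- 100
group    <- rep(seq_len(n_groups), each = n_per)

# Each group has its own typical level of x ...
group_mean_x <- rnorm(n_groups, sd = 1)
x <- group_mean_x[group] + rnorm(n_groups * n_per)

# ... and groups with a high typical x also have a high baseline log-odds.
group_intercept <- 2 * group_mean_x
beta_within     <- -0.5   # true effect of x within a group
eta <- group_intercept[group] + beta_within * x
y   <- rbinom(length(eta), size = 1, prob = plogis(eta))
d   <- data.frame(y = y, x = x, group = factor(group))

# 1) Pooled logistic regression: ignores the grouping entirely.
fit_pooled <- glm(y ~ x, family = binomial, data = d)

# 2) Group differences absorbed as fixed effects.
fit_fixed <- glm(y ~ x + group, family = binomial, data = d)

slope_pooled <- unname(coef(fit_pooled)["x"])
slope_fixed  <- unname(coef(fit_fixed)["x"])
cat("pooled slope:       ", round(slope_pooled, 2), "\n")
cat("fixed-effects slope:", round(slope_fixed, 2), "\n")

# 3) Random intercepts, fitted only if lme4 is available.
slope_mixed <- NA_real_
if (requireNamespace("lme4", quietly = TRUE)) {
  fit_mixed   <- lme4::glmer(y ~ x + (1 | group), family = binomial, data = d)
  slope_mixed <- unname(lme4::fixef(fit_mixed)["x"])
  cat("mixed-model slope:  ", round(slope_mixed, 2), "\n")
}
```

In this setup the pooled slope mixes the between-group and within-group relationships, while the other two fits condition on group membership. The two kinds of model are therefore estimating different quantities, not the same quantity with different precision.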
The following makes the point re interpretation of regression coefficients well,
albeit in a standard least squares regression context:
"Interpreting Regression Coefficients", at:
http://www.mosaic-web.org/MCAST/videos/MCAST-2010-09-10/lib/playback.html
This is one in a series of "M-casts". A complete list is at:
http://www.causeweb.org/wiki/mosaic/index.php/Pub100
John Maindonald email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473 fax : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
Excellent! This is exactly the answer I was looking for and it makes perfect sense. Thank you, John and Tom for your help.
Ching