Dear List,
I have a quick question regarding the setup of my data for analysis with a GLMM. I hope this is the appropriate list; I apologise if it is not.
I have a response variable, TRUE or FALSE. I have coded this as 0 = FALSE and 1 = TRUE in Excel.
I have three categorical factors: C, D and E.
I then read in the data frame and run the model as follows:
lmer(trueorfalse~1+(1|A/B) + C + D+ E ,family=binomial)
And this is the output:
Generalized linear mixed model fit by the Laplace approximation
Formula: threatornot ~ 1 + (1 | A/B) + C + D+ E ,family=binomial)
  AIC  BIC logLik deviance
 1410 1450 -696.8     1394
Random effects:
 Groups       Name        Variance   Std.Dev.
 family:order (Intercept) 6.7869e-01 8.2382e-01
 order        (Intercept) 7.8204e-11 8.8433e-06
Number of obs: 1116, groups: A:B, 43; B, 9
Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.11281    0.42232   0.267   0.7894
C1          -0.02414    0.19964  -0.121   0.9038
D2          -0.16482    0.38602  -0.427   0.6694
E2           0.95381    0.54316   1.756   0.0791 .
E3           0.75733    0.87275   0.868   0.3855
E4           0.03044    0.47328   0.064   0.9487
What I am unsure about is the inference: if a term is significant, does this relate to TRUE or FALSE?
E.g. E2 has a p-value of 0.079; does this 0.079 relate to the probability of it resulting in a TRUE or a FALSE response? Does it matter how I code the input, e.g. FALSE = 1, TRUE = 2?
Maybe I am reading the output wrong?
Thanks
John
Binary response ordering
6 messages · ONKELINX, Thierry, John Haart, Douglas Bates
Is this homework? The data and the analysis look very similar to the one in this post:
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2010q3/004203.html

----------------------------------------------------------------------------
ir. Thierry Onkelinx
Research Institute for Nature and Forest (Instituut voor natuur- en bosonderzoek)
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Please do not print this message unnecessarily. The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.
No, it's not homework; it's a group undergrad project. Is this not the appropriate forum? Thanks
On Wed, Aug 4, 2010 at 4:54 AM, John Haart <another83 at me.com> wrote:
> Dear List,
> I have a quick question regarding the setup of my data for analysis with a glmm. I hope this is the appropriate list, i apologise if it is not.
> I have a response variable, TRUE or FALSE. I have coded this as 0 = False and 1 = TRUE in excel.
> I have 3 categorical factors with C,D and E
> I then read in the data frame and run the model as follows-
> lmer(trueorfalse~1+(1|A/B) + C + D+ E ,family=binomial)
> And this is the output
> Generalized linear mixed model fit by the Laplace approximation
> Formula: threatornot ~ 1 + (1 | A/B) + C + D+ E ,family=binomial)
>   AIC  BIC logLik deviance
>  1410 1450 -696.8     1394
> Random effects:
>  Groups       Name        Variance   Std.Dev.
>  family:order (Intercept) 6.7869e-01 8.2382e-01
>  order        (Intercept) 7.8204e-11 8.8433e-06
> Number of obs: 1116, groups: A:B, 43; B, 9
Apparently you altered the output at some point because the factors that were named A and B ended up as order and family in the random effects description.
> Fixed effects:
>             Estimate Std. Error z value Pr(>|z|)
> (Intercept)  0.11281    0.42232   0.267   0.7894
> C1          -0.02414    0.19964  -0.121   0.9038
> D2          -0.16482    0.38602  -0.427   0.6694
> E2           0.95381    0.54316   1.756   0.0791 .
> E3           0.75733    0.87275   0.868   0.3855
> E4           0.03044    0.47328   0.064   0.9487
> What i am unsure about is the inference, if a term is significant does this relate to TRUE or FALSE?
In this case it would be related to the probability of a TRUE response but, as this is simply 1 - P(FALSE), the only change if you reversed the order would be a change in the signs of the coefficients. The simple way to verify this is to fit glm(threatornot ~ 1, family = binomial) and check the value of the coefficient. It should be log(pHat/(1-pHat)), where pHat is the proportion of TRUE responses.
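The check described above can be sketched in R roughly as follows. This is only an illustration on simulated data: the vector `threatornot` here is made up (it is not the poster's data), and `family = binomial` is passed explicitly so that the intercept is on the logit scale.

```r
# Simulated 0/1 response standing in for the real data (an assumption).
set.seed(1)
threatornot <- rbinom(200, size = 1, prob = 0.3)

# Intercept-only logistic regression.
fit <- glm(threatornot ~ 1, family = binomial)

# The intercept should equal the logit of the observed proportion of 1s.
p_hat   <- mean(threatornot)
logit_p <- log(p_hat / (1 - p_hat))
print(c(intercept = unname(coef(fit)), logit = logit_p))

# Reversing the coding (1 becomes 0 and vice versa) only flips the sign.
fit_rev <- glm(I(1 - threatornot) ~ 1, family = binomial)
print(c(reversed = unname(coef(fit_rev)), minus_logit = -logit_p))
```

The same reasoning carries over to the coefficients of the larger model: recoding which outcome counts as "success" negates every estimate but leaves the z values' magnitudes and the p-values unchanged.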
> I.E E2 has a p value of 0.079, does this 0.079 relate to the probability of it resulting in a true or false response? Does it matter how i code the input i.e FALSE = 1, TRUE =2 for instance?
If there are two levels in the response then the model is fit according to the probability of the second versus the first. You can disambiguate the process if you convert the response to a factor with the levels specified explicitly.

The bigger issue is that you shouldn't pay too much attention to a particular coefficient related to the levels of a factor like E because the coefficients are defined with respect to the contrasts in effect at the time the model was fit. Without knowing the contrasts being used and without prior knowledge that a particular contrast was important, those coefficients are not important by themselves. It is the cumulative effect of the variability amongst the levels of the factor that is important.
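One way to look at the overall contribution of a factor such as E, rather than at its individual coefficients, is to compare the fits with and without it. The following is a minimal sketch on simulated data, using plain glm() without random effects so that it runs with base R alone; the variables `E` and `y` and the effect sizes are invented for illustration. For a mixed model the same idea would presumably apply by comparing two nested fits with anova().

```r
# Simulated four-level factor and binary response (all values are assumptions).
set.seed(42)
n   <- 400
E   <- factor(sample(1:4, n, replace = TRUE))
eta <- c(0, 0.9, 0.7, 0)[as.integer(E)]      # made-up effects on the logit scale
y   <- rbinom(n, size = 1, prob = plogis(eta))

# Fits without and with the factor.
reduced <- glm(y ~ 1, family = binomial)
full    <- glm(y ~ E, family = binomial)

# Likelihood-ratio test of E as a whole (3 degrees of freedom for 4 levels).
lrt <- anova(reduced, full, test = "Chisq")
print(lrt)

# Refit with a different contrast coding: the individual coefficients change,
# but the deviance (and hence the overall test) does not.
full_sum <- glm(y ~ E, family = binomial, contrasts = list(E = "contr.sum"))
print(rbind(treatment = coef(full), sum = coef(full_sum)))
print(c(deviance(full), deviance(full_sum)))
```

The point of the second fit is that a single row such as E2 answers a question defined by the contrast coding, whereas the comparison of the two models does not depend on that choice.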
Dear Douglas, Thanks very much for this,
> You can disambiguate the process if you convert the response to a factor with the levels specified explicitly.
I am a little unsure what this means. My response is TRUE or FALSE. However, there are different levels of TRUE / FALSE. At this point I have not discriminated between them, as I am unsure how. My thought was to convert them to a continuous factor and use this as a response, instead of having a multi-level categorical response, which I don't think is possible in lmer?

Whilst on the subject of p-values: I am using AIC model selection rather than p-value-based stepwise regression, as I feel it is more robust (Burnham & Anderson, 2002). However, there seems to be a huge difference in my results. The factors with the highest p-values, and therefore retained in the MAM, when I did an explanatory stepwise regression do not appear in the model with the lowest AIC value. Do the two approaches generally not match?

Thanks
On Wed, Aug 4, 2010 at 9:30 AM, John Haart <another83 at me.com> wrote:
> Dear Douglas, Thanks very much for this,
>> You can disambiguate the process if you convert the response to a factor with the levels specified explicitly.
> I am a little unsure what this means?
I just meant that when you create a factor you can either accept the default ordering of the factor levels or impose an ordering. The default is the sorted order of the unique values: numeric order if the variable is numeric, but lexicographic order if the values are stored as character strings, in which case you have to be careful as soon as you get values like 10, because "10" sorts before "2". A glm or glmer model fit for family = binomial with the response a factor with two levels uses the 2nd level as "success" and the first level as "failure".
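The level ordering and its effect on which outcome is treated as "success" can be illustrated with a short sketch. The response values below are invented for the example, and glm() is used instead of a mixed model to keep it self-contained.

```r
# Default level order depends on how the values are stored.
levels(factor(c(1, 2, 10)))        # numeric input: "1" "2" "10"
levels(factor(c("1", "2", "10")))  # character input: "1" "10" "2"

# A made-up response with four TRUE and two FALSE values.
resp_raw <- c("TRUE", "FALSE", "TRUE", "TRUE", "FALSE", "TRUE")

# Impose the ordering explicitly: first level = failure, second = success.
resp <- factor(resp_raw, levels = c("FALSE", "TRUE"))
fit  <- glm(resp ~ 1, family = binomial)
p_success <- unname(plogis(coef(fit)))   # estimated P(TRUE)

# With the levels reversed, "FALSE" becomes the success category.
resp_flipped <- factor(resp_raw, levels = c("TRUE", "FALSE"))
fit_flipped  <- glm(resp_flipped ~ 1, family = binomial)
p_flipped    <- unname(plogis(coef(fit_flipped)))   # estimated P(FALSE)

print(c(p_success = p_success, p_flipped = p_flipped))
```

Stating the levels in the call to factor() removes any dependence on how the column happened to be read in, which is the disambiguation referred to above.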
> My response is TRUE or FALSE. However there are different levels of TRUE / FALSE. At this point i have not discriminated between them, as i am unsure how, my thought was to convert them to a continuous factor and use this as a response? Instead of having a multi level categorical response which i don't think is possible in lmer? Whilst on the subject of P-Values, I am using AIC model selection rather than P-value based stepwise regression as i feel it is more robust (Burnham & Anderson, 2002). However there seems to be a huge difference in my results.
I'll leave it to others to comment on p-values, AIC, etc.