
Binary response ordering

6 messages · ONKELINX, Thierry, John Haart, Douglas Bates

#
Dear List,

I have a quick question regarding the setup of my data for analysis with a GLMM. I hope this is the appropriate list; I apologise if it is not.

I have a response variable, TRUE or FALSE. I have coded this as 0 = FALSE and 1 = TRUE in Excel.

I have 3 categorical factors: C, D and E.

I then read in the data frame and run the model as follows:

lmer(trueorfalse~1+(1|A/B) + C + D+ E ,family=binomial)

And this is the output

Generalized linear mixed model fit by the Laplace approximation 
Formula: threatornot ~ 1 + (1 | A/B) + C + D+  E ,family=binomial)
  AIC  BIC logLik deviance
 1410 1450 -696.8     1394
Random effects:
 Groups       Name        Variance   Std.Dev.  
 family:order (Intercept) 6.7869e-01 8.2382e-01
 order        (Intercept) 7.8204e-11 8.8433e-06
Number of obs: 1116, groups: A:B, 43; B, 9

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)  0.11281    0.42232   0.267   0.7894  
C1          -0.02414    0.19964  -0.121   0.9038  
D2          -0.16482    0.38602  -0.427   0.6694  
E2           0.95381    0.54316   1.756   0.0791 .
E3           0.75733    0.87275   0.868   0.3855  
E4           0.03044    0.47328   0.064   0.9487  

What I am unsure about is the inference: if a term is significant, does this relate to TRUE or FALSE?

For example, E2 has a p-value of 0.079. Does this 0.079 relate to the probability of it resulting in a TRUE or a FALSE response? Does it matter how I code the input, e.g. FALSE = 1, TRUE = 2?

Maybe I am reading the output wrong?

Thanks

John
#
Is this homework? The data and the analysis look very similar to the one
in this post
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2010q3/004203.html

------------------------------------------------------------------------
----
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey
Druk dit bericht a.u.b. niet onnodig af.
Please do not print this message unnecessarily.

Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer 
en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is
door een geldig ondertekend document. The views expressed in  this message 
and any annex are purely those of the writer and may not be regarded as stating 
an official position of INBO, as long as the message is not confirmed by a duly 
signed document.
#
No, it's not homework.

It's a group undergrad project. Is this not the appropriate forum?

Thanks
On 4 Aug 2010, at 10:05, ONKELINX, Thierry wrote:
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
#
On Wed, Aug 4, 2010 at 4:54 AM, John Haart <another83 at me.com> wrote:
Apparently you altered the output at some point because the factors
that were named A and B ended up as order and family in the random
effects description.
In this case it would be related to the probability of a TRUE response
but, as this is simply 1 - P(FALSE) then the only change if you
reversed the order would be to change the signs of the coefficients.
The simple way to verify this is to fit

glm(threatornot ~ 1, family = binomial)

and check the value of the coefficient.  It should be
log(pHat/(1-pHat)) where pHat is the proportion of TRUE responses.
If there are two levels in the response then the model is fit
according to the probability of the second versus the first.  You can
disambiguate the process if you convert the response to a factor with
the levels specified explicitly.
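The check described above can be sketched as follows. This is only an illustration on simulated data: the values of `threatornot` are made up, and the names `y_true_second` and `y_false_second` are hypothetical. It assumes `family = binomial`, since the log-odds relationship holds for the logit link.

```r
# Simulated stand-in for the real data; these values are invented.
set.seed(1)
threatornot <- rbinom(200, size = 1, prob = 0.3)

# Intercept-only logistic regression (logit link via family = binomial).
fit <- glm(threatornot ~ 1, family = binomial)
pHat <- mean(threatornot)

# The fitted intercept should match the empirical log-odds of a 1 (TRUE).
intercept <- unname(coef(fit))
logit_pHat <- log(pHat / (1 - pHat))
print(c(intercept = intercept, logit = logit_pHat))

# Same data as a factor with the levels given explicitly, in both orders.
# The second level is the one whose probability is modelled.
y_true_second  <- factor(threatornot, levels = c(0, 1), labels = c("FALSE", "TRUE"))
y_false_second <- factor(threatornot, levels = c(1, 0), labels = c("TRUE", "FALSE"))
b_true  <- unname(coef(glm(y_true_second  ~ 1, family = binomial)))
b_false <- unname(coef(glm(y_false_second ~ 1, family = binomial)))

# Reversing the level order should only flip the sign of the coefficient.
print(c(b_true = b_true, b_false = b_false))
```

With the levels spelled out in `factor()`, the direction of the coefficients no longer depends on how the spreadsheet happened to code the response.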

The bigger issue is that you shouldn't pay too much attention to a
particular coefficient related to the levels of a factor like E
because the coefficients are defined with respect to the contrasts in
effect at the time the model was fit.  Without knowing the contrasts
being used and without prior knowledge that a particular contrast was
important, those coefficients are not important by themselves.  It is
the cumulative effect of the variability amongst the levels of the
factor that is important.
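One common way to look at a factor as a whole, rather than at its individual coefficients, is a likelihood-ratio comparison of models with and without it. The sketch below is an assumption about how that might be done, not part of the original analysis: the data are simulated, the random effects are left out so that plain `glm` suffices, and the names `full`, `reduced` and `full_sum` are hypothetical.

```r
# Simulated data; factor names C and E mirror the thread, values are invented.
set.seed(2)
n <- 300
E <- factor(sample(1:4, n, replace = TRUE))
C <- factor(sample(0:1, n, replace = TRUE))
y <- rbinom(n, 1, plogis(-0.2 + 0.8 * (E == "2")))

full    <- glm(y ~ C + E, family = binomial)
reduced <- glm(y ~ C,     family = binomial)

# Likelihood-ratio (chi-squared) test of E as a whole: 4 levels, so 3 df.
lrt <- anova(reduced, full, test = "Chisq")
print(lrt)

# Refit with sum-to-zero contrasts for E. The individual coefficients change,
# but the overall fit (deviance) should not.
full_sum <- glm(y ~ C + E, family = binomial, contrasts = list(E = "contr.sum"))
print(coef(full))
print(coef(full_sum))
print(c(treatment = deviance(full), sum = deviance(full_sum)))
```

The single test on 3 degrees of freedom does not depend on which contrasts were in effect, whereas the per-level z values do.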
#
Dear Douglas,

Thanks very much for this,
I am a little unsure what this means?

My response is TRUE or FALSE. However, there are different levels of TRUE / FALSE. At this point I have not discriminated between them, as I am unsure how. My thought was to convert them to a continuous variable and use this as a response, instead of having a multi-level categorical response, which I don't think is possible in lmer?

Whilst on the subject of p-values:

I am using AIC model selection rather than p-value based stepwise regression, as I feel it is more robust (Burnham & Anderson, 2002). However, there seems to be a huge difference in my results.

The factors with the lowest p-values, which were therefore retained in the MAM when I did an exploratory stepwise regression, do not appear in the model with the lowest AIC value. Do the two approaches generally not match?

Thanks
On 4 Aug 2010, at 14:15, Douglas Bates wrote:

        
#
On Wed, Aug 4, 2010 at 9:30 AM, John Haart <another83 at me.com> wrote:
I just meant that when you create a factor from a numeric variable you
can either accept the default ordering of the factor levels, which is
lexicographic (if all the numeric values are small integers this
corresponds to numeric ordering but as soon as you get numbers like 10
you have to be careful because 10 sorts before 2 in lexicographic
ordering) or you can impose an ordering.
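A small sketch of the ordering issue, assuming the codes are held as character strings (the case where the text sort applies). The vector `x` and the names `f_default` and `f_explicit` are made up for illustration.

```r
# Codes stored as text, including a two-digit value.
x <- c("1", "2", "10", "2", "1")

# Default: levels are the sorted unique values, compared as text.
f_default <- factor(x)
print(levels(f_default))

# Imposed ordering: state the levels explicitly in the order you want.
f_explicit <- factor(x, levels = c("1", "2", "10"))
print(levels(f_explicit))
```

Printing `levels()` before fitting is a cheap way to confirm which level comes first.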

A glm or glmer model fit for family = binomial with the response a
factor with two levels uses the 2nd level as "success" and the first
level as "failure".
I'll leave it to others to comment on p-values, AIC, etc.