Logistic regression with 2 categorical predictors
Hi Jari, I've only just seen your post and as you would now know there has been a generous reply from Philip Dixon in the interim - the full details of which will take me some time to digest. However I am writing to say thanks for the response and yes you have guessed right that we had an ever declining pool of fish to run the choice tests- larval fish are not particularly robust unfortunately. I am thinking that Philips suggestion for a series of multiple tests along with the appropriate correction for doing so is the most palatable for me at this point in time. Andy
On 24 October 2014 16:54, Jari Oksanen <jari.oksanen at oulu.fi> wrote:
Andrew, If the 24 rows are the data you are analysing, I cannot replicate any of your significant results within that glm framework *if* I take into account the overdispersion. The full model with age*test interaction is saturated and cannot be analysed at all, but the main effects model age+test can be analysed (or either term separately). However, the results are overdispersed, and you should use family=quasibinomial instead of family=binomial, and then use test="F" instead of test="Chi":
anova(glm(cbind(prefer,avoid) ~ age+test, data=datafromthemail,
family=quasibinomial), test="F")
Analysis of Deviance Table
Model: quasibinomial, link: logit
Response: cbind(prefer, avoid)
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev F Pr(>F)
NULL 23 54.908
age 5 11.2352 18 43.673 0.9844 0.4591
test 3 1.5934 15 42.079 0.2327 0.8722
As you see, the Resid. Dev is much larger than Resid. Df for both terms in
this sequential model, and therefore you must use quasibinomial models and
F-tests -- and these give similar results as other tests.
I could not get any results for the saturated models, and one of your
examples (below in this thread) seemed to use only one level of test and
only *six* observations: it was also saturated as you had no replicates for
your six age levels. You need replicates.
However, I'm not sure I understood your data correctly. It looks like you
have a certain number of animals, but their number is reduced with age so
that you have a kind of censored data (animals not available in all cases).
Perhaps somebody can propose a better analysis for such a censored data, if
it is like that.
Cheers, Jari Oksanen
On 24/10/2014, at 10:41 AM, Andrew Halford wrote:
Dear Gavin, Firstly let me say that I take offence at your "bogus" comment. Just because I, like many others who interact on this list, often struggle conceptually with the overwhelming analysis choices that are required in our line of work doesn't give you the right to drop snide remarks as you see fit. My line of query is ALWAYS genuine from my perspective and I don't expect you or anyone else to belittle people on the public list! As it turns out my issues are not resolved. To recap.. I have run a bunch of choice chamber experiments with larval fish.
Graphing
up the ratio of 0/1 choices produces a plot which shows to my eye,
evidence
of a result for some of the tests, with fish appearing to make defined choices in the later age groups for 2 of the tests. What appears to be happening is that because there are some empty cells
in
the later age x test interactions (the fish only took one option to the exclusion of the other) the errors are way out and hence preclude any chance of getting a significant result. If I add a single result to any
of
the zero cells to remove the "blank" the analysis actually works more as
I
hoped. However I doubt this is acceptable so I am hoping to get some help with producing an effective analysis without having to manipulate the
blank
cells. Andrew On 24 October 2014 04:08, Gavin Simpson <ucfagls at gmail.com> wrote:
This all looks bogus to me; you've fit the data perfectly by fitting a saturated model - there are no residual degrees of freedom and (effectively) zero residual deviance. Things are clearly amiss because
you
have huge standard errors. You have 24 data points and fit a model with
23
coefficient plus the intercept; you just replaced your data with 24 new data points (the values in the Estimate column of the summary() output) I really wouldn't bother interpreting it any further. HTH G On 21 October 2014 18:21, Andrew Halford <andrew.halford at gmail.com>
wrote:
Hi Thierry, The multiple comparisons ran just fine but there was a ridiculous
amount
of interaction combinations all of which were non-significant even though there was a highly significant interaction term. I decided to remove
test
as a variable to simplify the analysis and run separate single
explanatory
variable logistic regressions. I have included a result below which is still producing an outcome I cant explain. Namely, why am I getting
such a
significant result for the ANOVA but when I do the tukey tests nothing
is
significant?
sg_habitat
Age Prefer Avoid 1 1 17 14 2 2 20 10 3 3 14 9 4 4 13 12 5 5 0 18 6 6 0 5
model_sg <- glm(cbind(Prefer,Avoid) ~ Age, data=sg_habitat,
family=binomial)
anova(model_sg, test="Chisq")
Analysis of Deviance Table
Model: binomial, link: logit
Response: cbind(Prefer, Avoid)
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 5 36.588
Age 5 36.588 0 0.000 7.243e-07 ***
mc_sg <- glht(model_sg, mcp(Age = "Tukey"))
summary(mc_sg)
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: glm(formula = cbind(Prefer, Avoid) ~ Age, family = binomial,
data = sg_habitat)
Linear Hypotheses:
Estimate Std. Error z value Pr(>|z|)
2 - 1 == 0 0.4990 0.5294 0.943 0.912
3 - 1 == 0 0.2477 0.5593 0.443 0.997
4 - 1 == 0 -0.1141 0.5390 -0.212 1.000
5 - 1 == 0 -25.8473 53178.5362 0.000 1.000
6 - 1 == 0 -24.7307 57729.9299 0.000 1.000
3 - 2 == 0 -0.2513 0.5767 -0.436 0.997
4 - 2 == 0 -0.6131 0.5570 -1.101 0.844
5 - 2 == 0 -26.3463 53178.5362 0.000 1.000
6 - 2 == 0 -25.2296 57729.9299 0.000 1.000
4 - 3 == 0 -0.3618 0.5855 -0.618 0.985
5 - 3 == 0 -26.0950 53178.5362 0.000 1.000
6 - 3 == 0 -24.9783 57729.9299 0.000 1.000
5 - 4 == 0 -25.7332 53178.5362 0.000 1.000
6 - 4 == 0 -24.6165 57729.9299 0.000 1.000
6 - 5 == 0 1.1167 78490.1364 0.000 1.000
(Adjusted p values reported -- single-step method)
On 21 October 2014 22:53, ONKELINX, Thierry <Thierry.ONKELINX at inbo.be>
wrote:
Hi Andrew, Please keep the mailing list in cc. The estimates in mc are the differences of the parameter estimates
(betas)
between both levels. E.g. 5.LR -1.LR = -1.168 or 5.LR = 1.LR - 1.168 summary(mc) should give you the significance of those differences.
That
should work. If it doesn't, please provide more info: at least your
code
and the error message. A small reproducible example is better. confint(mc) gives the confidence intervals of the differences. Best regards, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and
Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium + 32 2 525 02 51 + 32 54 43 61 85 Thierry.Onkelinx at inbo.be www.inbo.be To call in the statistician after the experiment is done may be no
more
than asking him to perform a post-mortem examination: he may be able
to
say
what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does
not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey Van: Andrew Halford [mailto:andrew.halford at gmail.com] Verzonden: dinsdag 21 oktober 2014 16:19 Aan: ONKELINX, Thierry Onderwerp: Re: [R-sig-eco] Logistic regression with 2 categorical predictors Hi Thierry, Thanks for the response. I have run your code but it seems you cant
call
the summary function, you just have to call the object on its own i.e.
mc.
The results are I get are below but I am not sure how to interpret
these,
exactly what does the estimate represent? How do I measure
significance
here?
Estimate
2.LR - 1.LR == 0 1.252e-01
3.LR - 1.LR == 0 -5.390e-01
4.LR - 1.LR == 0 1.802e-02
5.LR - 1.LR == 0 -1.168e+00
6.LR - 1.LR == 0 -2.575e+01
1.SD - 1.LR == 0 7.411e-02
2.SD - 1.LR == 0 -2.408e-01
3.SD - 1.LR == 0 2.675e-01
etc etc
Andy
On 20 October 2014 23:04, ONKELINX, Thierry <Thierry.ONKELINX at inbo.be
wrote: Dear Andrew, anova() and summary() test different hypotheses. anova() tests is at
least
one level is different from the others. summary() tests if the
coefficient
is different from zero. Multiple comparison of different interaction levels is probably the
most
relevant in this case. The easiest way is to make a new variable. snapper2$inter <- with(snapper2, interaction(age, test)) model <- glm(cbind(prefer,avoid) ~ 0 + inter, data=snapper2, family=binomial) library(multcomp) mc <- glht(model, mcp(inter = "Tukey)) summary(mc) Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and
Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium + 32 2 525 02 51 + 32 54 43 61 85 Thierry.Onkelinx at inbo.be www.inbo.be To call in the statistician after the experiment is done may be no
more
than asking him to perform a post-mortem examination: he may be able
to
say
what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does
not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey -----Oorspronkelijk bericht----- Van: r-sig-ecology-bounces at r-project.org [mailto: r-sig-ecology-bounces at r-project.org] Namens Andrew Halford Verzonden: maandag 20 oktober 2014 16:06 Aan: r-sig-ecology at r-project.org Onderwerp: [R-sig-eco] Logistic regression with 2 categorical
predictors
Hi Listers, I am trying to run a logistic regression to look at the effects of experiment type and age on the behavior of fish in a choice chamber experiment. I am using the glm approach and would like some advice on how or
whether
to perform contrasts to work out what levels of Factor1 (Age) and
Factor 2
(Test) are significantly different from each other. I have not been
able
to clarify from my reading what is the appropriate approach to take
when
dealing with a significant interaction term. I am also not sure as to
how
one interprets a model when all the coefficients are non-significant
but
the chi-square ANOVA shows a highly significant interaction term. I have graphed up the data as dot plots and there is definitely
evidence
of changes in proportions in later ages. I want to provide evidence for when and for which tests there was a 'significant' change in behavior.
snapper2
age test prefer avoid 1 1 LR 15 14 2 1 SD 15 13 3 1 SG 17 14 4 1 SW 14 14 5 2 LR 17 14 6 2 SD 16 19 7 2 SG 20 10 8 2 SW 15 21 9 3 LR 10 16 10 3 SD 14 10 11 3 SG 14 9 12 3 SW 13 15 13 4 LR 12 11 14 4 SD 14 11 15 4 SG 13 12 16 4 SW 11 14 17 5 LR 4 12 18 5 SD 8 8 19 5 SG 0 18 20 5 SW 10 6 21 6 LR 0 6 22 6 SD 3 4 23 6 SG 0 5 24 6 SW 5 3
dotplot(age~prefer/avoid,data=snapper2,group=snapper2$test,cex=1.5,pch=19,ylab="age",auto.key=list(space="right",title="Tests"))
out2 <- glm(cbind(prefer,avoid) ~ age*test, data=snapper2,
family=binomial)
summary(out2)
Call: glm(formula = cbind(prefer, avoid) ~ age * test, family = binomial, data = snapper2) Deviance Residuals: [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0
0
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 6.899e-02 3.716e-01 0.186 0.8527
age2 1.252e-01 5.180e-01 0.242 0.8091
age3 -5.390e-01 5.483e-01 -0.983 0.3256
age4 1.802e-02 5.589e-01 0.032 0.9743
age5 -1.168e+00 6.866e-01 -1.701 0.0890 .
age6 -2.575e+01 9.348e+04 0.000 0.9998
testSD 7.411e-02 5.307e-01 0.140 0.8890
testSG 1.252e-01 5.180e-01 0.242 0.8091
testSW -6.899e-02 5.301e-01 -0.130 0.8964
age2:testSD -4.401e-01 7.260e-01 -0.606 0.5444
age3:testSD 7.324e-01 7.846e-01 0.933 0.3506
age4:testSD 8.004e-02 7.863e-01 0.102 0.9189
age5:testSD 1.024e+00 9.301e-01 1.102 0.2707
age6:testSD 2.532e+01 9.348e+04 0.000 0.9998
age2:testSG 3.738e-01 7.407e-01 0.505 0.6138
age3:testSG 7.867e-01 7.832e-01 1.004 0.3152
age4:testSG -1.321e-01 7.764e-01 -0.170 0.8649
age5:testSG -2.568e+01 8.768e+04 0.000 0.9998
age6:testSG 2.121e-02 1.334e+05 0.000 1.0000
age2:testSW -4.616e-01 7.249e-01 -0.637 0.5242
age3:testSW 3.959e-01 7.662e-01 0.517 0.6054
age4:testSW -2.592e-01 7.858e-01 -0.330 0.7415
age5:testSW 1.678e+00 9.386e-01 1.788 0.0737 .
age6:testSW 2.626e+01 9.348e+04 0.000 0.9998
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 5.4908e+01 on 23 degrees of freedom Residual
deviance: 2.6113e-10 on 0 degrees of freedom
AIC: 122.73
Number of Fisher Scoring iterations: 23
anova(out2, test="Chisq")
Analysis of Deviance Table
Model: binomial, link: logit
Response: cbind(prefer, avoid)
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 23 54.908
age 5 11.235 18 43.673 0.0469115 *
test 3 1.593 15 42.079 0.6608887
age:test 15 42.079 0 0.000 0.0002185 ***
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
cheers
Andy
[[alternative HTML version deleted]]
_______________________________________________ R-sig-ecology mailing list R-sig-ecology at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology * * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * *
*
Dit bericht en eventuele bijlagen geven enkel de visie van de
schrijver
weer en binden het INBO onder geen enkel beding, zolang dit bericht
niet
bevestigd is door een geldig ondertekend document. The views expressed in this message and any annex are purely those of
the
writer and may not be regarded as stating an official position of
INBO,
as
long as the message is not confirmed by a duly signed document. -- Andrew Halford Ph.D Research Scientist (Kimberley Marine Parks)| Adjunct Research
Scientist
(Curtin University) Dept. Parks and Wildlife Western Australia Ph: +61 8 9219 9795 Mobile: +61 (0) 468 419 473 * * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * *
*
Dit bericht en eventuele bijlagen geven enkel de visie van de
schrijver
weer en binden het INBO onder geen enkel beding, zolang dit bericht
niet
bevestigd is door een geldig ondertekend document. The views expressed in this message and any annex are purely those of
the
writer and may not be regarded as stating an official position of
INBO,
as
long as the message is not confirmed by a duly signed document.
-- Andrew Halford Ph.D Research Scientist (Kimberley Marine Parks)| Adjunct Research
Scientist
(Curtin University)
Dept. Parks and Wildlife
Western Australia
Ph: +61 8 9219 9795
Mobile: +61 (0) 468 419 473
[[alternative HTML version deleted]]
_______________________________________________ R-sig-ecology mailing list R-sig-ecology at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
-- Gavin Simpson, PhD
--
Andrew Halford Ph.D
Research Scientist (Kimberley Marine Parks)| Adjunct Research Scientist
(Curtin University)
Dept. Parks and Wildlife
Western Australia
Ph: +61 8 9219 9795
Mobile: +61 (0) 468 419 473
[[alternative HTML version deleted]]
_______________________________________________ R-sig-ecology mailing list R-sig-ecology at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Andrew Halford Ph.D Research Scientist (Kimberley Marine Parks)| Adjunct Research Scientist (Curtin University) Dept. Parks and Wildlife Western Australia Ph: +61 8 9219 9795 Mobile: +61 (0) 468 419 473 [[alternative HTML version deleted]]