Generally speaking, this list is about questions on R programming, not
statistical issues. However, I grant you that your queries are in something
of a gray area intersecting both.
Nevertheless, based on your admitted confusion, I would recommend that you
find a local statistical expert with whom you can consult 1-1 if at all
possible. As others have already noted, you statistical understanding is
muddy, and it can be quite difficult to resolve such confusion in online
forums like this that cannot provide the close back and forth that may be
required (as well as further appropriate study).
Best,
Bert
On Mon, Nov 12, 2018 at 11:09 AM Frodo Jedi <
frodojedi.mailinglist at gmail.com> wrote:
Dear Peter and Eik,
I am very grateful to you for your replies.
My current understanding is that from the GLM analysis I can indeed
conclude that the response predicted by System A is significantly
different
from that of System B, while the pairwise comparison A vs C leads to non
significance. Now the Wald test seems to be correct only for Systems B vs
C, indicating that the pairwise System B vs System C is significant. Am I
correct?
However, my current understanding is also that I should use contrasts
instead of the wald test. So the default contrasts is with the System A,
now I should re-perform the GLM with another base. I tried to use the
option "contrasts" of the glm:
fit1 <- glm(Response ~ System, data = scrd, family = "binomial",
contrasts = contr.treatment(3, base=1,contrasts=TRUE))
fit2 <- glm(Response ~ System, data = scrd, family = "binomial",
contrasts = contr.treatment(3, base=2,contrasts=TRUE))
fit3 <- glm(Response ~ System, data = scrd, family = "binomial",
contrasts = contr.treatment(3, base=3,contrasts=TRUE))
library(multcomp)
summary(glht(fit, mcp(System="Tukey")))
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: glm(formula = Response ~ System, family = "binomial", data = scrd)
Linear Hypotheses:
Estimate Std. Error z value Pr(>|z|)
B - A == 0 -1.2715 0.3379 -3.763 0.000445 ***
C - A == 0 0.8588 0.4990 1.721 0.192472
C - B == 0 2.1303 0.4512 4.722 < 1e-04 ***
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
(Adjusted p values reported -- single-step method)
Is this Tukey analysis correct?
I am a bit confused on what analysis I should do. I am doing my very best
to study all resources I can find, but I would really need some help from
experts, especially in using R.
Best wishes
FJ
On Mon, Nov 12, 2018 at 1:46 PM peter dalgaard <pdalgd at gmail.com> wrote:
Yes, only one of the pairwise comparisons (B vs. C) is right. Also, the
overall test has 3 degrees of freedom whereas a comparison of 3 groups
should have 2. You (meaning Frodo) are testing that _all 3_ regression
coefficients are zero, intercept included. That would imply that all
systems have response probablilities og 0.5, which is not likely what
want.
This all suggests that you are struggling with the interpretation of the
regression coefficients and their role in the linear predictor. This
be covered by any good book on logistic regression.
-pd
On 12 Nov 2018, at 14:15 , Eik Vettorazzi <E.Vettorazzi at uke.de>
Dear Jedi,
please use the source carefully. A and C are not statistically
at the 5% level, which can be inferred from glm output. Your last two
wald.tests don't test what you want to, since your model contains an
intercept term. You specified contrasts which tests A vs B-A, ie A-
(B-A)==0 <-> 2*A-B==0 which is not intended I think. Have a look at
?contr.treatment and re-read your source doc to get an idea what dummy
coding and indicatr variables are about.
Cheers
Am 12.11.2018 um 02:07 schrieb Frodo Jedi:
Dear list members,
I need some help in understanding whether I am doing correctly a
logistic regression and whether I am interpreting the results in the
correct way. Also I would need an advice regarding the reporting of
results from the R functions.
I want to report the results of a binomial logistic regression where
to assess difference between the 3 levels of a factor (called
the dependent variable (called Response) taking two values, 0 and 1.
goal is to understand if the effect of the 3 systems (A,B,C) in
affect differently Response in a significant way. I am basing my
fit <- glm(Response ~ System, data = scrd, family = "binomial")
summary(fit)
Call:
glm(formula = Response ~ System, family = "binomial", data = scrd)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.8840 0.1775 0.2712 0.2712 0.5008
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.2844 0.2825 11.626 < 2e-16 ***
SystemB -1.2715 0.3379 -3.763 0.000168 ***
SystemC 0.8588 0.4990 1.721 0.085266 .
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 411.26 on 1023 degrees of freedom
Residual deviance: 376.76 on 1021 degrees of freedom
AIC: 382.76
Number of Fisher Scoring iterations: 6
Following this analysis I perform the wald test in order to
whether there is an overall effect of System:
library(aod)
wald.test(b = coef(fit), Sigma = vcov(fit), Terms = 1:3)
Wald test:
----------
Chi-squared test:
X2 = 354.6, df = 3, P(> X2) = 0.0
The chi-squared test statistic of 354.6, with 3 degrees of freedom is
associated with a p-value < 0.001 indicating that the overall effect
System is statistically significant.
Now I check whether there are differences between the coefficients
again the wald test:
# Here difference between system B and C:
l <- cbind(0, 1, -1)
wald.test(b = coef(fit), Sigma = vcov(fit), L = l)
Wald test:
----------
Chi-squared test:
X2 = 22.3, df = 1, P(> X2) = 2.3e-06
# Here difference between system A and C:
l <- cbind(1, 0, -1)
wald.test(b = coef(fit), Sigma = vcov(fit), L = l)
Wald test:
----------
Chi-squared test:
X2 = 12.0, df = 1, P(> X2) = 0.00052
# Here difference between system A and B:
l <- cbind(1, -1, 0)
wald.test(b = coef(fit), Sigma = vcov(fit), L = l)
Wald test:
----------
Chi-squared test:
X2 = 58.7, df = 1, P(> X2) = 1.8e-14
My understanding is that from this analysis I can state that the
systems lead to a significantly different Response. Am I right? If
should I report the results of this analysis? What is the correct
Thanks in advance
Best wishes
FJ
[[alternative HTML version deleted]]