Skip to content

Confidence intervals for predicted probabilities in logistic regression

2 messages · Manuel Spínola, Kingsford Jones

#
Dear list members,

I am fitting a logistic regression and I want to calculate the predicted 
probabilities of a response ("positive" and "negative") for a model with 
one categorical predictor (2 levels, "yes" and "no").

oso.3 = glm(response ~ school, data=osologistic, 
family=binomial(link="logit"))

For the reference level (the intercept, level "no") the point estimate is:,

 > plogis(coef(oso.3)[1] + coef(oso.3)[2]*0)
(Intercept)
      0.13

To calculate the confidence interval (lower and upper limit) I am using 
the formula: beta0+beta1*x-1.96*se(beta0+beta1*x) and 
beta0+beta1*x+1.96*se(beta0+beta1*x), using the variance and covariance 
to calculate the standard error:

 > errore.no = sqrt(1.14 + 1.73*02 + 2*-1.14*0)

 > plogis(coef(oso.3)[1] + coef(oso.3)[2]*0-1.96*errore.no)
(Intercept)
     0.017

 > plogis(coef(oso.3)[1] + coef(oso.3)[2]*0+1.96*errore.no)
(Intercept)
      0.54

However, when I use the confint function I got:

 > plogis(confint(oso.3))
Waiting for profiling to be done...
            2.5 % 97.5 %
(Intercept) 0.0076   0.45
schoolyes   0.8025   1.00

Here, I am looking at only to the intercept that is the reference level, 
"no").
I was expecting to get the same lower and upper limit for the intercept.
Do you know why the results are different?
Do I need the covariance to calculate the se?
Thank you very much in advance.
Best,

Manuel
#
Hi Manuel,

Providing self contained code would make the question easier to
answer, but I'll have a go.  Comments below
On Tue, Jun 23, 2009 at 5:03 AM, Manuel Sp?nola<mspinola10 at gmail.com> wrote:
This describes an interval for a prediction on the logit scale
Why are you multiplying the 2nd term in the sum above by 2?  I'm
guessing that was not intended.
Here you seem to want an interval on the response scale.
These are profiled confidence intervals for the parameters on the
linear (logit) scale.



Here's a method of calculating asymptotic intervals for the
predictions on the response or logit scales:

set.seed(314)
n <- 100
x <- sample(0:1, n, repl=TRUE, prob=1:2)
linpred <- -1 + 4*x
response <- rbinom(n, 1, plogis(linpred))
school <- as.factor(c('yes','no')[x+1])
oso.3 <- glm(response ~ school, family='binomial')
summary(oso.3)

pred <- predict(oso.3, data.frame(school=c('yes','no')),
  se.fit=TRUE) # type = 'link' by default


with(pred, cbind(fit,
  low = fit - 1.96*se.fit,
  up = fit + 1.96*se.fit)
)

pred <- predict(oso.3, data.frame(school=c('yes','no')),
  type='response', se.fit=TRUE)

with(pred, cbind(fit,
  low = fit - 1.96*se.fit,
  up = fit + 1.96*se.fit)
)


hth,

Kingsford Jones