Skip to content

residuals in logistic regression model

2 messages · Martin Hecht, Thierry Onkelinx

#
Hi,

I have some questions concerning the residuals in a logistic regression 
model ( family = binomial(link = "logit") ).
Basically I am trying to reproduce the fixed error variance of 3.29 from 
simulated data (see code below).
My questions are:

[1] What is the difference between the residuals obtained with resid() 
vs. m$residuals in glm() ?
[2] Why doesn't any of these two residual variances equal the original 
value of 3.29?
[3] How can the residuals be transformed to be on the "original" scale 
(with their variance being 3.29)?
[4] The fitted() values seem to be probabilities, is that right?
[5] Why are the fitted() values and the residuals not on the same 
scale/metric?

thank you very much,
Martin


# install.packages("gtools")
library(gtools) # for inv.logit
# install.packages("lme4")
library(lme4)

set.seed(1234)

### data generation

# standard logistic errors
eps <- rlogis ( n <- 100000 , 0 , 1 )

# variance of errors is 3.29
var ( eps )
# [1] 3.293174

# regression without predictors
# latent y equals errors
ylat <- 0 + eps

# probabilities using inverse logit function
probs <- inv.logit ( ylat )

# generation of responses from bernoulli distribution
resp <- sapply ( probs , function ( prob ) rbinom ( 1, 1, prob ) )


### glm model

# logistic regression
m <- glm ( resp ~ 1 , family = binomial(link = "logit") )

# variance of resid()
var ( resid ( m ) )
# [1] 1.386302

# variance of residuals in the results object
var ( m$residuals )
# [1] 4.000061


### glmer model

# random groups to specify a "level 2" random effect
# (just for the purpose to be able to run glmer)
gr <- sample ( 1:10 , n , replace = TRUE )
d <- data.frame ( resp , gr )

# model
m2 <- glmer ( resp ~ 1 + (1|gr) , d , family = binomial(link = "logit") )

# variance of resid()
var ( resid ( m2 ) )
# [1] 1.386302
3 days later
#
Dear Martin,

You need to do some reading on residuals in the context of a
generalised linear model. There are not the same as those of a linear
model. Have a look at the formula of a glm. There is no residual terms
like there is one with a linear model.

Note that there are different ways to calculate residuals on a glm.
See ?residuals.glm

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no
more than asking him to perform a post-mortem examination: he may be
able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data. ~ John Tukey


2015-09-11 22:41 GMT+02:00 Martin Hecht <martin.hecht at hu-berlin.de>: