Hi,
I have some questions concerning the residuals in a logistic regression
model ( family = binomial(link = "logit") ).
Basically I am trying to reproduce the fixed error variance of 3.29 from
simulated data (see code below).
My questions are:
[1] What is the difference between the residuals obtained with resid()
vs. m$residuals in glm() ?
[2] Why doesn't any of these two residual variances equal the original
value of 3.29?
[3] How can the residuals be transformed to be on the "original" scale
(with their variance being 3.29)?
[4] The fitted() values seem to be probabilities, is that right?
[5] Why are the fitted() values and the residuals not on the same
scale/metric?
thank you very much,
Martin
# install.packages("gtools")
library(gtools) # for inv.logit
# install.packages("lme4")
library(lme4)
set.seed(1234)
### data generation
# standard logistic errors
eps <- rlogis ( n <- 100000 , 0 , 1 )
# variance of errors is 3.29
var ( eps )
# [1] 3.293174
# regression without predictors
# latent y equals errors
ylat <- 0 + eps
# probabilities using inverse logit function
probs <- inv.logit ( ylat )
# generation of responses from bernoulli distribution
resp <- sapply ( probs , function ( prob ) rbinom ( 1, 1, prob ) )
### glm model
# logistic regression
m <- glm ( resp ~ 1 , family = binomial(link = "logit") )
# variance of resid()
var ( resid ( m ) )
# [1] 1.386302
# variance of residuals in the results object
var ( m$residuals )
# [1] 4.000061
### glmer model
# random groups to specify a "level 2" random effect
# (just for the purpose to be able to run glmer)
gr <- sample ( 1:10 , n , replace = TRUE )
d <- data.frame ( resp , gr )
# model
m2 <- glmer ( resp ~ 1 + (1|gr) , d , family = binomial(link = "logit") )
# variance of resid()
var ( resid ( m2 ) )
# [1] 1.386302
residuals in logistic regression model
2 messages · Martin Hecht, Thierry Onkelinx
3 days later
Dear Martin, You need to do some reading on residuals in the context of a generalised linear model. There are not the same as those of a linear model. Have a look at the formula of a glm. There is no residual terms like there is one with a linear model. Note that there are different ways to calculate residuals on a glm. See ?residuals.glm Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2015-09-11 22:41 GMT+02:00 Martin Hecht <martin.hecht at hu-berlin.de>:
Hi,
I have some questions concerning the residuals in a logistic regression
model ( family = binomial(link = "logit") ).
Basically I am trying to reproduce the fixed error variance of 3.29 from
simulated data (see code below).
My questions are:
[1] What is the difference between the residuals obtained with resid() vs.
m$residuals in glm() ?
[2] Why doesn't any of these two residual variances equal the original value
of 3.29?
[3] How can the residuals be transformed to be on the "original" scale (with
their variance being 3.29)?
[4] The fitted() values seem to be probabilities, is that right?
[5] Why are the fitted() values and the residuals not on the same
scale/metric?
thank you very much,
Martin
# install.packages("gtools")
library(gtools) # for inv.logit
# install.packages("lme4")
library(lme4)
set.seed(1234)
### data generation
# standard logistic errors
eps <- rlogis ( n <- 100000 , 0 , 1 )
# variance of errors is 3.29
var ( eps )
# [1] 3.293174
# regression without predictors
# latent y equals errors
ylat <- 0 + eps
# probabilities using inverse logit function
probs <- inv.logit ( ylat )
# generation of responses from bernoulli distribution
resp <- sapply ( probs , function ( prob ) rbinom ( 1, 1, prob ) )
### glm model
# logistic regression
m <- glm ( resp ~ 1 , family = binomial(link = "logit") )
# variance of resid()
var ( resid ( m ) )
# [1] 1.386302
# variance of residuals in the results object
var ( m$residuals )
# [1] 4.000061
### glmer model
# random groups to specify a "level 2" random effect
# (just for the purpose to be able to run glmer)
gr <- sample ( 1:10 , n , replace = TRUE )
d <- data.frame ( resp , gr )
# model
m2 <- glmer ( resp ~ 1 + (1|gr) , d , family = binomial(link = "logit") )
# variance of resid()
var ( resid ( m2 ) )
# [1] 1.386302
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models