OT: (quasi-?) separation in a logistic GLM
On Tue, 2008-12-16 at 13:31 +0100, vito muggeo wrote:
Dear Gavin, I do not know whether this comment may still be useful...
Very much so, thank you.
Why are you unsure about quasi-separation? I think it is quite evident in the plot:
Unsure in the sense that I had been unable to ascertain what quasi-complete separation was ;-) I'm still not convinced about the quasi-separation issue, though. The coefficients from the glm are large, but the standard errors don't indicate anything much wrong. I tried brglm() in the package of the same name, and it gave effectively the same coefficients and standard errors as glm(), whereas I would have expected them to differ considerably if (quasi-)separation were an issue. I'm not very familiar with the approach behind brglm(), however. I'll also take a look at the profiling you describe below once our computing problems here get sorted.

Apologies if people have had problems downloading the file from my web space; we are having all sorts of filestore problems here this week.

Thanks again, Vito, for your comments,
G
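For comparison, this is what "large coefficients" look like when separation really is present. A minimal sketch on small invented vectors (x, y_sep and y_ovl are hypothetical, not the analogue data): under complete separation the maximum-likelihood estimate does not exist, so glm() stops at some large slope whose standard error is enormous, whereas overlapping classes give an ordinary, modest standard error.

```r
## Toy data, invented for illustration only (not the analogue data)
x     <- 1:10
y_sep <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)  # completely separated between x = 5 and 6
y_ovl <- c(1, 1, 1, 0, 1, 0, 1, 0, 0, 0)  # classes overlap between x = 4 and x = 7

## glm() warns for the separated fit; suppress so the comparison prints cleanly
fit_sep <- suppressWarnings(glm(y_sep ~ x, family = binomial))
fit_ovl <- glm(y_ovl ~ x, family = binomial)

## Slope estimate and its standard error from each fit
se_table <- rbind(
  separated   = coef(summary(fit_sep))["x", c("Estimate", "Std. Error")],
  overlapping = coef(summary(fit_ovl))["x", c("Estimate", "Std. Error")]
)
print(se_table)
```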
plot(analogs ~ Dij, data = dat)
It may also be useful to look at the plot of the monotone profile deviance
(or log-likelihood) for the coefficient of Dij:
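xval <- seq(-20, 0, length.out = 50)   # grid of fixed values for the Dij slope
ll <- numeric(length(xval))            # profile deviance at each fixed slope
for (i in seq_along(xval)) {
  ## slope fixed via offset(); only the intercept is estimated
  mod <- glm(analogs ~ offset(xval[i] * Dij), data = dat, family = binomial)
  ll[i] <- deviance(mod)
}
plot(xval, ll, type = "b", xlab = "coefficient of Dij", ylab = "deviance")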
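The same offset trick can be tried on toy data to see what the profile looks like in each case. A minimal sketch, assuming two small invented response vectors (hypothetical, not the analogue data) and a hypothetical helper profile_dev() that wraps the loop above: with separated classes the deviance keeps decreasing as the slope moves outward, so no finite minimum exists, whereas with overlapping classes it has an interior minimum near the maximum-likelihood slope.

```r
## Hypothetical helper: deviance of an intercept-only fit with the slope
## held fixed at each value in `slopes` via an offset
profile_dev <- function(x, y, slopes) {
  vapply(slopes, function(b) {
    fit <- suppressWarnings(glm(y ~ 1 + offset(b * x), family = binomial))
    deviance(fit)
  }, numeric(1))
}

## Toy data invented for illustration (not the analogue data)
x     <- 1:10
y_sep <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)  # separated: y = 1 only for small x
y_ovl <- c(1, 1, 1, 0, 1, 0, 1, 0, 0, 0)  # overlapping classes

slopes  <- seq(-10, 0, length.out = 41)
dev_sep <- profile_dev(x, y_sep, slopes)
dev_ovl <- profile_dev(x, y_ovl, slopes)

## Side-by-side profiles: monotone for separated data, U-shaped otherwise
op <- par(mfrow = c(1, 2))
plot(slopes, dev_sep, type = "b", main = "Separated",
     xlab = "fixed slope", ylab = "deviance")
plot(slopes, dev_ovl, type = "b", main = "Overlapping",
     xlab = "fixed slope", ylab = "deviance")
par(op)
```

A profile that is still falling at the edge of the grid means the minimum lies further out or does not exist at all; widening the grid (as with Vito's -20 lower limit) helps tell the two apart.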
Hope this helps you,
vito
Gavin Simpson wrote:
Dear List,
Apologies for this off-topic post but it is R-related in the sense that
I am trying to understand what R is telling me with the data to hand.
ROC curves have recently been used to determine a dissimilarity
threshold for identifying whether two samples are from the same "type"
or not. Given the bashing that ROC curves get whenever anyone asks about
them on this list (and having implemented the ROC methodology in my
analogue package) I wanted to try directly modelling the probability
that two sites are analogues for one another for given dissimilarity
using glm().
The data I have then are a logical vector ('analogs') indicating whether
the two sites come from the same vegetation and a vector of the
dissimilarity between the two sites ('Dij'). These are in a csv file
currently in my university web space. Each 'row' in this file
corresponds to a single comparison between two sites.
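For reference, glm(analogs ~ Dij, family = binomial) fits the standard logistic regression below, where p_ij is the probability that sites i and j are analogues. If a single dissimilarity threshold is wanted, as with the ROC approach, one option (an assumption about how the fit might be used, not something proposed in this thread) is to invert the fitted curve at a chosen probability p_0; MASS::dose.p() computes this quantity, with a delta-method standard error, from a fitted binomial glm.

```latex
\operatorname{logit}(p_{ij}) = \log\frac{p_{ij}}{1 - p_{ij}} = \beta_0 + \beta_1 D_{ij},
\qquad
D^{*}(p_0) = \frac{\operatorname{logit}(p_0) - \beta_0}{\beta_1}
```

With p_0 = 0.5 this reduces to D* = -beta_0 / beta_1.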
When I analyse these data using glm() I get the familiar "fitted
probabilities numerically 0 or 1 occurred" warning. The data do not look
linearly separable when plotted (code for which is below). I have read
Venables and Ripley's discussion of this in MASS4 and other sources that
discuss this warning and R (Faraway's Extending the Linear Model with R
and John Fox's new Applied Regression, Generalized Linear Models, and
Related Methods, 2nd Ed) as well as some of the literature on Firth's
bias reduction method. But I am still somewhat unsure what
(quasi-)separation is and if this is the reason for the warnings in this
case.
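It may help that the warning is not, by itself, a diagnosis of separation. In recent versions of R, glm.fit() issues it after fitting whenever any fitted probability lies within 10 * .Machine$double.eps of 0 or 1, which can happen with overlapping classes when the predictor spans a wide range relative to the fitted slope. A minimal sketch on invented data (x and y are hypothetical, not the analogue data): the classes overlap near zero, so the maximum-likelihood slope is finite, yet the warning still appears.

```r
## Invented data: y = 1 for x > 0, except two flipped points near zero,
## so the classes overlap and the slope MLE is finite
x <- -100:100
y <- as.integer(x > 0)
y[x == -1] <- 1L
y[x == 2]  <- 0L

## Record warnings without interrupting the fit
warns <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial),
  warning = function(w) {
    warns <<- c(warns, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

## (Quasi-)separated would mean all 0s sit on one side of all 1s, ties allowed
separated <- max(x[y == 0]) <= min(x[y == 1]) || max(x[y == 1]) <= min(x[y == 0])
print(separated)
print(coef(fit))
print(warns)
print(range(fitted(fit)))
```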
My question then is: is this a complete separation issue with my data, or is it
the quasi-complete separation that I have read a bit about whilst researching this
problem? Or is this something completely different?
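With a single continuous predictor both definitions can be checked directly. Complete separation means some cut-point c puts every "yes" strictly on one side and every "no" strictly on the other; quasi-complete separation is the same except that the two groups may touch, i.e. share observations at c. A minimal sketch, assuming a 0/1 or logical response with both classes present; separation_1d() is a hypothetical helper, not a function from any package.

```r
## Hypothetical helper: classify single-predictor binary data as
## "complete", "quasi-complete" or "none" (overlapping)
separation_1d <- function(x, y) {
  y <- as.logical(y)
  stopifnot(length(x) == length(y), any(y), any(!y))
  x1 <- x[y]
  x0 <- x[!y]
  ## Gap between the classes in both orientations (1s below 0s, 1s above 0s)
  gaps <- c(min(x0) - max(x1), min(x1) - max(x0))
  if (any(gaps > 0)) {
    "complete"
  } else if (any(gaps == 0)) {
    "quasi-complete"
  } else {
    "none"
  }
}

separation_1d(1:6, c(1, 1, 1, 0, 0, 0))                  # "complete"
separation_1d(c(1, 2, 3, 3, 4, 5), c(1, 1, 1, 0, 0, 0))  # "quasi-complete" (tie at 3)
separation_1d(1:6, c(1, 1, 0, 1, 0, 0))                  # "none"
```

For the data in this thread the corresponding call would be with(dat, separation_1d(Dij, analogs)).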
Code to reproduce my problem with the actual data is given below. I'd
appreciate any comments or thoughts on this.
#### Begin code snippet ################################################
## note data file is ~93Kb in size
dat <- read.csv(url("http://www.homepages.ucl.ac.uk/~ucfagls/dat.csv"))
head(dat)
## fit model --- produces warning
mod <- glm(analogs ~ Dij, data = dat, family = binomial)
## plot the data
plot(analogs ~ Dij, data = dat)
fit.mod <- fitted(mod)
ord <- with(dat, order(Dij))
with(dat, lines(Dij[ord], fit.mod[ord], col = "red", lwd = 2))
#### End code snippet ##################################################
Thanks in advance
Gavin
Dr. Gavin Simpson
ECRC, UCL Geography, Pearson Building, Gower Street, London, UK. WC1E 6BT.
[t] +44 (0)20 7679 0522
[f] +44 (0)20 7679 0565
[e] gavin.simpsonATNOSPAMucl.ac.uk
[w] http://www.ucl.ac.uk/~ucfagls/
[w] http://www.freshwaters.org.uk