terminology for binomial regression
3 messages · Matthew Forister, Ben Bolker
On 11-03-05 02:59 PM, Matthew Forister wrote:
Hi all, I have been frustrated by what seems to me like inconsistent terminology associated with binomial regression. There are two questions I'd love to have answered, below. For context, I have been using glm with binomial error, logit link. The response variable is "successes and failures" -- the successes are the days on which a species is observed in a year, and the failures are days in which it is not observed. So the code is glm(cbind(DaysPresent,DaysAbsent)~years,binomial). I'm interested in the coefficient associated with years as a way to express the decline in the number of days a species is observed over time. Question: (1) This probably seems silly, but is "logistic regression" the same as a glm with binomial error? This is where I have found some frustrating inconsistency in the ecological literature.
In my opinion, it would be reasonable to use 'logistic regression' to mean any GLM (generalized linear model) with a logit link, most probably with the binomial family. My impression is that people most commonly use 'logistic regression' to mean a GLM with *binary* data and a logit link and 'binomial regression' to denote non-binary data, but I don't have any references.
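To make the terminology point concrete, here is a minimal sketch (simulated data, not the poster's; the intercept of -1, the 100 days per year, and the object names are assumptions made up for illustration). It fits the same logit-link model once to grouped successes/failures and once to the equivalent one-row-per-day binary data; the two fits should agree on the coefficients.

```r
## simulated counts: 10 years, 100 observation days per year (assumed values)
set.seed(1)
years <- 0:9
n_days <- 100
p_true <- plogis(-1 - 0.14 * years)
DaysPresent <- rbinom(length(years), n_days, p_true)
DaysAbsent <- n_days - DaysPresent

## grouped ("binomial regression") form, as in the original question
fit_binom <- glm(cbind(DaysPresent, DaysAbsent) ~ years, family = binomial)

## same data expanded to one 0/1 row per day ("logistic regression" form)
binary <- data.frame(
  years = rep(rep(years, 2), times = c(DaysPresent, DaysAbsent)),
  present = rep(c(1, 0), times = c(sum(DaysPresent), sum(DaysAbsent)))
)
fit_binary <- glm(present ~ years, family = binomial, data = binary)

## coefficients side by side; they should match up to numerical tolerance
coef_compare <- cbind(grouped = coef(fit_binom), binary = coef(fit_binary))
print(coef_compare)
```

The residual deviance and degrees of freedom differ between the two fits because the data are grouped differently, but the estimated slope for years is the same quantity in both.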
(2) What's the most straightforward way to interpret the coefficients from a predictor variable in a model like the one specified above? For example, a species in decline (observed in fewer days over time) will have a years coefficient of -0.14. I'd like a verbal interpretation of that number. Rather than give you my understanding, I'll just ask and hope someone can help me out!
I would suggest Gelman and Hill for this, but briefly: the coefficients
are statements of changes on the logit scale ("log-odds" is a synonym).
Unfortunately, the interpretation in terms of probability outcomes
depends on the baseline probability. Rules of thumb are:
(1) for small (near zero) baseline probabilities, the logistic
resembles an exponential and so the interpretations of logit-scale and
log-scale coefficients are similar, i.e. for small changes they can be
interpreted as proportional changes. For your example above, this would
correspond to a PROPORTIONAL decline of approximately 14% per year for a
species that was already fairly rare. (More precisely, a decline of
1-exp(-0.14)=0.13, i.e. 13% per year.) (I want to emphasize that this is
a change relative to the original frequency of the species.)
(2) for baseline probabilities near 0.5, the rule of thumb is that the
change in probability of occurrence is about r/4 (where r is the
logit-scale coefficient), so if your species were originally present in
about half of the samples a coefficient of -0.14 would correspond to a
decline of about 3.5 percentage points per year (this is absolute rather
than proportional).
(3) For baseline probabilities near 1.0 (common species), rule (1)
applies, but this time to the probability of non-occurrence. For example,
suppose we have a species that occurs 95% of the time:
## transform to logit scale
qlogis(0.95) ## 2.944, call it approx 2.95
plogis(2.95-0.14) ## 0.943
## compare this with the change in the original probability of
## non-occurrence (0.05), which *increases* by 14%
1-0.05*1.14 ## 0.943
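The three rules of thumb can be checked numerically in one go. The following is only a sketch: the baseline probabilities 0.02, 0.5 and 0.95 are illustrative choices, and the object names are made up here. It compares the exact one-year change implied by a logit-scale coefficient of -0.14 with each approximation.

```r
b <- -0.14  ## logit-scale coefficient from the example

## exact probability after one year, starting from baseline p0
next_p <- function(p0, b) plogis(qlogis(p0) + b)

## (1) rare species: ratio of new to old probability is close to exp(b)
p0_rare <- 0.02
ratio_rare <- next_p(p0_rare, b) / p0_rare
c(exact = ratio_rare, approx = exp(b))

## (2) baseline near 0.5: absolute change is close to b/4
p0_mid <- 0.5
change_mid <- next_p(p0_mid, b) - p0_mid
c(exact = change_mid, approx = b / 4)

## (3) common species: probability of NON-occurrence grows by about exp(-b)
p0_common <- 0.95
ratio_absent <- (1 - next_p(p0_common, b)) / (1 - p0_common)
c(exact = ratio_absent, approx = exp(-b))
```

Moving the baselines further from 0 or 1 makes approximations (1) and (3) progressively worse, which is the sense in which the probability-scale interpretation depends on the baseline.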
Ben, thank you. I did not realize the interpretation was dependent on the baseline probabilities, but I think I get it now.
One follow-up question... Assume for a minute that I'm not interested in converting those values into statements of probability. Rather, I'm interested in making comparisons among species. For example, a species with a value of -0.25 (for the coefficient associated with years) is in more severe decline than a species with a value of -0.14. Empirically, this seems to work out just fine. If you take a look at the attached pdf, you'll see examples of the fit of the binomial regression models. The numbers on the outside are the years-coefficients. It seems to me that those numbers do a good job of indicating the rate of decline, even though the starting frequencies are different for different species. Am I making any mistake in thinking about comparisons among species based on the years-coefficient like this?
thanks! Matt
Matthew L Forister
Assistant Professor
Dept. of Biology / MS 314
1664 N. Virginia St.
University of Nevada, Reno
Reno, Nevada 89557
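As a rough illustration of what a between-species comparison of years-coefficients is comparing (a sketch only, not an answer taken from the thread; the baselines and the labels "slow" and "fast" are hypothetical): on the odds scale a coefficient implies the same multiplicative change per year whatever the starting frequency, while the implied change in probability still varies with the baseline.

```r
b <- c(slow = -0.14, fast = -0.25)  ## the two coefficients from the question
baseline <- c(0.05, 0.5, 0.95)      ## hypothetical starting frequencies

odds <- function(p) p / (1 - p)

## per-year odds ratio for each coefficient at each baseline
odds_ratio <- sapply(b, function(bi) {
  odds(plogis(qlogis(baseline) + bi)) / odds(baseline)
})
## per-year absolute change in probability at each baseline
abs_change <- sapply(b, function(bi) plogis(qlogis(baseline) + bi) - baseline)

rownames(odds_ratio) <- rownames(abs_change) <- paste0("p0=", baseline)
print(odds_ratio)  ## each column is constant and equals exp(b)
print(abs_change)  ## varies by row, i.e. with the baseline
```

So ranking species by coefficient ranks them by proportional change in odds per year; whether that is the right notion of "rate of decline" for a given comparison is a judgment this sketch does not settle.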