terminology for binomial regression - R-SIG-ecology

Sat, Mar 5, 2011 11:59 AM #

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-ecology/attachments/20110305/d97827f0/attachment.pl>

Ben Bolker

Sat, Mar 5, 2011 12:31 PM #

On 11-03-05 02:59 PM, Matthew Forister wrote:

In my opinion, it would be reasonable to use 'logistic regression' to
mean any GLM (generalized linear model) with a logit link, although
most probably with the binomial family. My impression is that people
most commonly use 'logistic regression' to mean a GLM with
*binary* data and a logit link and 'binomial regression' to denote
non-binary data, but I don't have any references.
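
As a rough illustration of why the two terms overlap so much, here is a minimal sketch on simulated data (the data, the 100 survey days per year, and the names fit_agg, fit_bin and long are all made up for illustration; only DaysPresent, DaysAbsent and years come from the thread). It fits the same trend once as successes/failures per year and once as one 0/1 row per survey day. Both are binomial GLMs with a logit link, so the coefficients should agree.

```r
## Hypothetical simulated data: 20 years, 100 survey days per year
set.seed(1)
years  <- 0:19
n_days <- 100
p      <- plogis(1 - 0.14 * years)   # assumed trend on the logit scale
DaysPresent <- rbinom(length(years), size = n_days, prob = p)
DaysAbsent  <- n_days - DaysPresent

## 'binomial regression': one row per year, successes and failures
fit_agg <- glm(cbind(DaysPresent, DaysAbsent) ~ years, family = binomial)

## 'logistic regression' in the narrow sense: one 0/1 row per survey day
long <- data.frame(
  years    = rep(years, each = n_days),
  observed = unlist(lapply(DaysPresent,
                           function(k) rep(c(1, 0), c(k, n_days - k))))
)
fit_bin <- glm(observed ~ years, family = binomial, data = long)

## compare the estimates from the two formulations
coef(fit_agg)
coef(fit_bin)
```

The residual deviances of the two fits differ because the saturated models differ, but the estimated coefficients and their standard errors should match up to numerical tolerance.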

I would suggest Gelman and Hill for this, but the coefficients are
statements of changes on the logit scale ("log-odds" is a synonym).
Unfortunately, the interpretation in terms of probability outcomes
depends on the baseline probability.  Rules of thumb are:

 (1) for small (near zero) baseline probabilities, the logistic
resembles an exponential and so the interpretations of logit-scale and
log-scale coefficients are similar, i.e. for small changes they can be
interpreted as proportional changes.  For your example (a years
coefficient of -0.14), this would correspond to a PROPORTIONAL decline
of approximately 14% per year for a species that was already fairly
rare.  (More precisely, a decline of 1-exp(-0.14) = 0.13, i.e. 13% per
year.)  (I want to emphasize that this is a change relative to the
original frequency of the species.)

 (2) for baseline probabilities near 0.5, the rule of thumb is that the
change in probability of occurrence is about r/4 per unit change in the
predictor, where r is the coefficient, so if your species were
originally present in about half of the samples a coefficient of -0.14
would correspond to a decline of about 0.035 (3.5 percentage points)
per year (this is absolute rather than proportional).

 (3) For baseline probabilities near 1.0 (common species), rule (1)
applies, but this time to the probability of non-occurrence. For
example, suppose we have a species that occurs 95% of the time.

## transform to logit scale
 qlogis(0.95)  ## 2.944, call it approx 2.95
 plogis(2.95-0.14) ## 0.943

## compare this with the change in the original probability of
## non-occurrence (0.05), which *increases* by 14%
1-0.05*1.14  ## 0.943
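
To see the three rules of thumb side by side, here is a small sketch that applies the same coefficient at three baseline probabilities. The baselines 0.02, 0.5 and 0.95 and the names b, p0, p1 and res are assumptions chosen for illustration; only the coefficient -0.14 comes from the thread.

```r
b  <- -0.14                    # years coefficient from the example
p0 <- c(0.02, 0.5, 0.95)       # assumed rare / intermediate / common baselines
p1 <- plogis(qlogis(p0) + b)   # probability after one more year

res <- data.frame(
  baseline            = p0,
  next_year           = p1,
  abs_change          = p1 - p0,                  # rule (2): close to b/4 near 0.5
  prop_change         = p1 / p0 - 1,              # rule (1): close to exp(b)-1 near 0
  prop_change_absence = (1 - p1) / (1 - p0) - 1   # rule (3): close to -b near 1
)
res
```

Each rule is only an approximation near its own baseline; reading the columns across rows shows how quickly it degrades elsewhere.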

Matthew Forister

Sat, Mar 5, 2011 4:52 PM #

Ben, thank you.  I did not realize the interpretation was dependent on the
baseline probabilities, but I think I get it now.  One follow up question...

Assume for a minute that I'm not interested in converting those values into
statements of probability.  Rather, I'm interested in making comparisons
among species.  For example, a species with a value of -0.25 (for the
coefficient associated with years) is in more severe decline than a species
with a value of -0.14.

Empirically, this seems to work out just fine.  If you take a look at the
attached pdf, you'll see examples of the fit of the binomial regression
models.  The numbers on the outside are the years-coefficients.  Seems to me
that those numbers do a good job at indicating the rate of decline, even
though the starting frequencies are different for different species.

Am I making any mistake in thinking about comparisons among species based on
the years-coefficient like this?

thanks!
Matt

On Sat, Mar 5, 2011 at 12:31 PM, Ben Bolker <bbolker at gmail.com> wrote:

On 11-03-05 02:59 PM, Matthew Forister wrote:

Hi all,

I have been frustrated by what seems to me like inconsistent terminology
associated with binomial regression.  There are two questions I'd love to
have answered, below.

For context, I have been using glm with binomial error, logit link.  The
response variable is "successes and failures" -- the successes are the days
on which a species is observed in a year, and the failures are days in which
it is not observed.  So the code
is  glm(cbind(DaysPresent,DaysAbsent)~years,binomial).  I'm interested in
the coefficient associated with years as a way to express the decline in the
number of days a species is observed over time.

Question:

(1) This probably seems silly, but is "logistic regression" the same as a
glm with binomial error?  This is where I have found some frustrating
inconsistency in the ecological literature.

(2) What's the most straightforward way to interpret the coefficients from a
predictor variable in a model like the one specified above?  For example,
species in decline (observed in fewer days over time) will have a years
coefficient of -0.14.  I'd like a verbal interpretation of that number.
Rather than give you my understanding, I'll just ask and hope someone can
help me out!

Matthew L Forister
Assistant Professor
Dept. of Biology / MS 314
1664 N. Virginia St.
University of Nevada, Reno
Reno, Nevada 89557