Offsets in Poisson or Neg. Bin regression

7 messages · Highland Statistics Ltd, Scott Foster, Ivailo

Highland Statistics Ltd

Sun, Jun 16, 2013 11:35 PM

Matias,
The only problem with the offset is that you implicitly assume:

double the number of embryos per individual (FECUND)  ==> double the 
expected value of damaged embryos per individual

This simply follows from the equation that Philip wrote down. For some 
scenarios this makes sense, but not for other scenarios.
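
Written out, the implication looks like this. This is a sketch that assumes the model under discussion is a log-link count regression with log(FECUND) as the offset; Philip's equation is not reproduced in this thread, and eta_i is simply shorthand for the rest of the linear predictor.

```latex
\begin{align*}
\log \mathrm{E}[Y_i] &= \log(\mathrm{FECUND}_i) + \eta_i \\
\mathrm{E}[Y_i] &= \mathrm{FECUND}_i \, e^{\eta_i} \\
\mathrm{E}[Y_i \mid 2\,\mathrm{FECUND}_i] &= 2\,\mathrm{FECUND}_i \, e^{\eta_i}
  = 2\,\mathrm{E}[Y_i \mid \mathrm{FECUND}_i]
\end{align*}
```

The offset has no coefficient, so the proportionality between FECUND and the expected count is fixed by the model rather than estimated from the data.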

Alain

_______________________________________________
R-sig-ecology mailing list
R-sig-ecology at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology



Dr. Alain F. Zuur
First author of:

1. Analysing Ecological Data (2007).
Zuur, AF, Ieno, EN and Smith, GM. Springer. 680 p.
URL: www.springer.com/0-387-45967-7


2. Mixed effects models and extensions in ecology with R. (2009).
Zuur, AF, Ieno, EN, Walker, N, Saveliev, AA, and Smith, GM. Springer.
http://www.springer.com/life+sci/ecology/book/978-0-387-87457-9


3. A Beginner's Guide to R (2009).
Zuur, AF, Ieno, EN, Meesters, EHWG. Springer
http://www.springer.com/statistics/computational/book/978-0-387-93836-3


4. Zero Inflated Models and Generalized Linear Mixed Models with R. (2012) Zuur, Saveliev, Ieno.
http://www.highstat.com/book4.htm

Other books: http://www.highstat.com/books.htm


Statistical consultancy, courses, data analysis and software
Highland Statistics Ltd.
6 Laverock road
UK - AB41 6FN Newburgh
Tel: 0044 1358 788177
Email: highstat at highstat.com
URL: www.highstat.com
URL: www.brodgar.com

2 days later

Ivailo

Wed, Jun 19, 2013 3:53 AM

On Tue, Jun 18, 2013 at 11:10 AM, Matias Ledesma <matutetote at hotmail.com> wrote:

As I'm facing a similar problem, I'd also like to know whether a
variable should be passed to the formula as an offset only when it
influences the outcome in some (linear) way. Does it make sense to
first include the exposure variable in the model as a regular covariate
and, if its coefficient is close to 1, take that as an indication that
the variable would be better included as an offset?

Cheers,
Ivailo
--
UBUNTU: a person is a person through other persons.

6 days later

Scott Foster

Tue, Jun 25, 2013 4:02 AM

Hi Ivailo,

Good question.  Difficult to answer, which is probably why you haven't 
had any responses yet (that the list has seen).

If you include an offset term with a log link function then you are 
assuming that the random variable (counts, say) depends on the offset 
through a known relationship.  Generally, this is precisely what you want 
to do -- for example, standardising counts for the sampling effort taken 
to obtain those counts.

However, in some situations it is conceivable that the sampling effort 
itself affects the count random variable.  An example may be fish in a 
trawl net -- as the net gets full it becomes less and less effective.  
In this case you may expect a one-unit change in effort to have a 
different effect when there has already been a lot of effort than when 
there hasn't.

If I thought that I was in the latter case, I may fit a model like

log( E( count)) = log( effort) + f(effort) + other stuff.

The function f(effort) can take any form, including beta*log(effort).  
In such a case a test of beta==0 is equivalent to testing if the effect 
of effort is purely scaling or if it is something else/sinister.  
General forms of f(effort) may tell you much more but may also be much 
more confusing.
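
A minimal sketch of the beta*log(effort) case, in Python with only the standard library. Everything in it is illustrative rather than taken from the thread: the effort values, alpha_true, beta_true and the helper names solve and fit_poisson are made up, and the responses are set to their expected values instead of being sampled, so that the fit is deterministic. With real counts one would normally use an existing GLM routine (for example glm() in R with an offset term) rather than a hand-written fitter.

```python
import math

def solve(A, b):
    """Solve the small linear system A x = b by Gaussian elimination."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_poisson(X, y, offset, iters=100, tol=1e-10):
    """Newton-Raphson for a log-link Poisson GLM with a fixed offset.
    Returns (coefficients, standard errors)."""
    p = len(X[0])
    coef = [0.0] * p
    for _ in range(iters):
        mu = [math.exp(o + sum(c * x for c, x in zip(coef, row)))
              for row, o in zip(X, offset)]
        score = [sum((yi - mi) * row[j] for row, yi, mi in zip(X, y, mu))
                 for j in range(p)]
        info = [[sum(mi * row[j] * row[k] for row, mi in zip(X, mu))
                 for k in range(p)] for j in range(p)]
        step = solve(info, score)
        coef = [c + s for c, s in zip(coef, step)]
        if max(abs(s) for s in step) < tol:
            break
    # Standard errors from the inverse Fisher information, evaluated just
    # before the final (negligible) Newton step.
    se = [math.sqrt(solve(info, [float(i == j) for i in range(p)])[j])
          for j in range(p)]
    return coef, se

# Hypothetical data: log E(count) = log(effort) + alpha + beta * log(effort)
effort = [float(e) for e in range(1, 21)]
alpha_true, beta_true = 0.5, -0.3
y = [math.exp(alpha_true + (1.0 + beta_true) * math.log(e)) for e in effort]

X = [[1.0, math.log(e)] for e in effort]   # intercept and log(effort) covariate
offset = [math.log(e) for e in effort]     # log(effort) again, coefficient fixed at 1

(alpha_hat, beta_hat), (se_alpha, se_beta) = fit_poisson(X, y, offset)
z = beta_hat / se_beta                     # Wald statistic for beta == 0
print(f"beta_hat = {beta_hat:.4f}, se = {se_beta:.4f}, z = {z:.2f}")
```

Here log(effort) appears twice: once as the offset and once as a column of X. A fitted beta near 0 would be consistent with effort acting purely as a scaling term; a negative beta, as in this made-up example, means the expected count grows less than proportionally with effort.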

To choose between the two cases above (offset versus offset+covariate), 
I would base my choice largely on prior knowledge of the system under 
study.  This is especially so if I don't have much data.

I hope that this has helped,

Scott

PS Is it just me or did the original question (damaged embryos with 
offset of number of embryos) sound more like a binomial problem than a 
Poisson/NB one?  Note though that they will start to coincide if the 
number of embryos is large and the probability of damage is small 
(Binomial -> Poisson in the limit).
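
A small numerical check of that limit, as a sketch only. The embryo numbers and damage probabilities are hypothetical, chosen so that both cases have the same mean of 3, and the function names are made up for this example.

```python
import math

def binom_pmf(k, n, p):
    """Binomial(n, p) probability of k, computed on the log scale."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)

def pois_pmf(k, lam):
    """Poisson(lam) probability of k, computed on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def max_gap(n, p):
    """Largest pointwise gap between Binomial(n, p) and Poisson(n * p) over k = 0..n."""
    return max(abs(binom_pmf(k, n, p) - pois_pmf(k, n * p)) for k in range(n + 1))

gap_few_embryos = max_gap(10, 0.3)        # few embryos, damage is common
gap_many_embryos = max_gap(1000, 0.003)   # many embryos, damage is rare
print(gap_few_embryos, gap_many_embryos)
```

The gap is much smaller in the second case, which is the situation where a Poisson model with log(number of embryos) as offset and a binomial model would be expected to give similar answers.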

Scott Foster
CSIRO Mathematics, Informatics and Statistics
GPO Box 1538
Castray Esplanade
Hobart 7001
Tasmania
Australia

Phone:     (03) 6232 5178
Fax:       (03) 6232 5000
Email:     scott.foster at csiro.au

Ivailo

Wed, Jun 26, 2013 1:14 AM

On Tue, Jun 25, 2013 at 2:02 PM, Scott Foster <scott.foster at csiro.au> wrote:

Thanks for commenting on that, Scott!

Both alternatives you mention above assume that the RV depends on
either the "offset" or the "sampling effort", but aren't these
essentially the same thing?

My approach to modeling counts was primarily based on the widespread
advice that varying effort should be accounted for by adding an offset
to the model, but when I consulted the book by McCullagh and Nelder
(1989), I found on pp. 206-207 that they actually estimated the
coefficient of the log(effort) term as being ~ 1. That is where my
confusion on the topic "to offset or to estimate" started ;-)

It never occurred to me, though, that the effort could be entered both
as an offset *and* as a covariate into the model. As these two terms
are very likely to be collinear, I wonder how one can then separate
their influence on the RV. I do not fully understand your idea
regarding the form of the function "f(effort)", but I get that if the
coefficient of effort is estimated to be about 0, then one should
conclude that effort should be retained *only* as an offset
to account for the "scaling". Am I right?

Thanks again for your elucidating comment,
Ivailo

Scott Foster

Wed, Jun 26, 2013 2:42 AM

Hi again Ivailo,

Yes, the `offset' and the covariate are the same thing.  Including them 
both simply alters the functional form of the linear predictor in your 
model.  No, they are not collinear in the typical sense as there is only 
one parameter (linear form) between them -- the offset term does not 
have a parameter that will be estimated associated with it.  For 
example, with log( effort) added as a linear covariate the log-link GLM is

log( E(y)) = offset + beta * log( effort) + other_stuff
           = log( effort) + beta * log( effort) + other_stuff
           = beta_1 * log( effort) + other_stuff,

where beta_1 = 1 + beta.

If you test that beta==0 (which is not beta_1) then you are testing that 
the effect of effort is purely scaling (as per nomenclature before).  
This is the same as McCullagh and Nelder's testing to see if beta_1==1.  
Thanks for the pointer to McCullagh and Nelder -- I didn't know that 
they suggested that.
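
The equivalence of the two tests can be sketched numerically. The estimate and standard error below are hypothetical numbers, not results from any data set in this thread, and the function names are made up. The sketch relies on the fact that adding the offset log(effort) only reparameterises the model, so the slope estimate shifts by exactly 1 and its standard error is unchanged.

```python
import math

def wald_z(estimate, null_value, se):
    """Wald statistic for H0: parameter == null_value."""
    return (estimate - null_value) / se

def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical fit WITHOUT an offset: slope on log(effort) and its standard error.
beta1_hat, se = 0.82, 0.09

# The same model WITH log(effort) as offset: beta = beta_1 - 1, same standard error.
beta_hat = beta1_hat - 1.0

z_no_offset = wald_z(beta1_hat, 1.0, se)    # tests beta_1 == 1
z_with_offset = wald_z(beta_hat, 0.0, se)   # tests beta == 0
p_value = two_sided_p(z_with_offset)
print(z_no_offset, z_with_offset, p_value)
```

Both statistics are the same number, so it does not matter which parameterisation is fitted as long as the null value matches it (1 without the offset, 0 with it).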

My depiction of the effect of effort as f( effort) is to allow for the 
possibility that the effect of effort may be non-linear on the link 
scale.  A simple example is when f(effort) is a low-order polynomial.  
Departures from effort being a purely scaling term may extend beyond 
linearity.  One may even want to consider regression splines or even 
more flexible GAMs.

Having said all this though, it is my practice to be quite conservative 
with including effort as anything but a scaling variable (offset).  It 
seems to me that there needs to be good reason before jumping to strong 
conclusions that may have no basis in the phenomenon under study.

Hope this helps,

Scott

Ivailo

Wed, Jun 26, 2013 11:57 PM

On Wed, Jun 26, 2013 at 12:42 PM, Scott Foster <scott.foster at csiro.au> wrote:

Thanks a lot for the brilliant explanation, Scott! Now things make
sense to me, and I'm interested in what the modeling strategy would be
if beta_1 turns out to be significantly different from 1. Would the
option you mention (a more general, possibly non-linear f(effort)) be a
viable alternative in that case?

I imagine that the fishing-net example you mentioned earlier could be
a case of a non-linear effect of effort -- wouldn't this warrant
modeling the effort as being non-linear on the link scale?

Cheers,
Ivailo

Scott Foster

Thu, Jun 27, 2013 2:02 AM

Hi Ivailo,

If the effort term is not present in the model purely for the purpose of 
scaling the outcome random variable, then I think that it should just be 
treated as a regression-type problem.  All the questions you raised 
seem(?) to be standard in that setting too: Is the covariate acting 
linearly (on the link scale)?  Are any non-linearities (on the link 
scale) important enough to warrant using some curvilinear or 
basis-expanded function of the effort variable?  And so on...

Yes, the fishing net example *may* be one where the (scaling) effort 
variable acts non-linearly.  I have not thought about this though. I 
typically use effort as a scaling factor only as I have a strong a 
priori belief that effort will be multiplicatively related to expected 
outcome (log offset with log-link).  I am sure that I will need to 
revise this belief sometime;-)

Scott
