
Offsets in Poisson or Neg. Bin regression

7 messages · Highland Statistics Ltd, Scott Foster, Ivailo

#
Matias,
The only problem with the offset is that you implicitly assume:

double the number of embryos per individual (FECUND)  ==> double the 
expected value of damaged embryos per individual

This simply follows from the equation that Philip wrote down. For some 
scenarios this makes sense, but not for other scenarios.
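
A minimal sketch of that implication, assuming a log link with log(FECUND) as the offset. The value of eta (the part of the linear predictor coming from the other covariates) and the function name are hypothetical, chosen only for illustration:

```python
import math

def expected_damaged(fecund, eta):
    """Expected damaged embryos under a log link with offset log(fecund).

    log(E[y]) = log(fecund) + eta  =>  E[y] = fecund * exp(eta)
    """
    return math.exp(math.log(fecund) + eta)

eta = -2.0  # hypothetical contribution of the other covariates

for fecund in (50, 100, 200):
    mu = expected_damaged(fecund, eta)
    # The expected damage per embryo (mu / fecund) is the same at every fecundity
    print(fecund, round(mu, 3), round(mu / fecund, 4))

# Doubling FECUND doubles the expected number of damaged embryos
ratio = expected_damaged(200, eta) / expected_damaged(100, eta)
print(ratio)
```

Whatever value eta takes, the ratio is 2: the offset fixes the coefficient of log(FECUND) at exactly 1.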

Alain

  
    
2 days later
#
On Tue, Jun 18, 2013 at 11:10 AM, Matias Ledesma <matutetote at hotmail.com> wrote:
As I'm facing a similar problem, I'd also like to know whether a
variable should be passed as an offset to the formula only when it
influences the outcome in some (linear) way. Does it make sense to
first include the exposure variable in the model as a regular
covariate and, if its coefficient is close to 1, to take that as an
indication that the variable is better included as an offset?

Cheers,
Ivailo
--
UBUNTU: a person is a person through other persons.
6 days later
#
Hi Ivailo,

Good question.  Difficult to answer, which is probably why you haven't 
had any responses yet (that the list has seen).

If you include an offset term with a log link function then you are 
assuming that the random variable (counts, say) depends on the offset 
through a known relationship.  Generally, this is precisely what you 
want to do -- for example standardising counts for the sampling effort 
taken to obtain those counts.

However, in some situations it is conceivable that the sampling effort 
itself affects the count random variable.  An example may be fish in a 
trawl net -- as the net gets full it becomes less and less efficacious.  
In this case you may expect a single unit change in effort to have a 
different effect when there has already been a lot of effort than when 
there hasn't.

If I thought that I was in the latter case, I may fit a model like

log( E( count)) = log( effort) + f(effort) + other stuff.

The function f(effort) can take any form, including beta*log(effort).  
In such a case a test of beta==0 is equivalent to testing if the effect 
of effort is purely scaling or if it is something else/sinister.  
General forms of f(effort) may tell you much more but may also be much 
more confusing.

To choose between the two cases above (offset versus offset+covariate), 
I would base my choice largely on prior knowledge of the system under 
study.  This is especially so if I don't have much data.

I hope that this has helped,

Scott

PS Is it just me or did the original question (damaged embryos with 
offset of number of embryos) sound more like a binomial problem than a 
Poisson/NB one?  Note though that they will start to coincide if the 
number of embryos is large and the probability of damage is small 
(Binomial -> Poisson in the limit).
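
A minimal sketch of that limit, comparing the Binomial(n, p) and Poisson(n*p) probabilities directly. The values of n and p are hypothetical and chosen only so that both cases have the same mean of 2 damaged embryos:

```python
import math

def binom_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(K = k) for K ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def max_abs_diff(n, p, kmax=20):
    """Largest pointwise gap between Binomial(n, p) and Poisson(n*p) for k <= kmax."""
    lam = n * p
    return max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(min(n, kmax) + 1))

# Few embryos, high damage probability: the two distributions differ visibly
small_n = max_abs_diff(n=10, p=0.2)
# Many embryos, low damage probability: the two nearly coincide
large_n = max_abs_diff(n=1000, p=0.002)

print(small_n, large_n)
```

When n is small or p is not small, the binomial model also respects the upper bound (damaged embryos cannot exceed the number of embryos), which the Poisson/NB model with an offset does not.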
On 19/06/13 20:53, Ivailo wrote:

  
    
#
On Tue, Jun 25, 2013 at 2:02 PM, Scott Foster <scott.foster at csiro.au> wrote:
Thanks for commenting on that, Scott!

Both alternatives you mention above assume that the RV depends on
either the "offset" or the "sampling effort", but aren't these
essentially the same thing?
My approach to modeling counts was based primarily on the widespread
advice that varying effort should be accounted for by adding an offset
to the model. But when I consulted the book by McCullagh and Nelder
(1989), I found on pp. 206-207 that they actually estimated the
coefficient of the log(effort) term as being ~ 1. So started my
confusion on the topic "to offset or to estimate" ;-)

It never occurred to me, though, that the effort could be entered both
as an offset *and* as a covariate into the model. As these two terms
have a good chance of being collinear, I wonder how one can then
separate their influence on the RV. I do not fully understand your idea
regarding the form of the function "f(effort)", but I get that if the
coefficient of effort is estimated as == 0, then it should be
concluded that the effect of effort should be retained *only* as an
offset to account for the "scaling". Am I right?

Thanks again for your elucidating comment,
Ivailo
--
UBUNTU: a person is a person through other persons.
#
Hi again Ivailo,

Yes, the `offset' and the covariate are the same thing.  Including them 
both simply alters the functional form of the linear predictor in your 
model.  No, they are not collinear in the typical sense as there is only 
one parameter (linear form) between them -- the offset term does not 
have a parameter that will be estimated associated with it.  For 
example, with log( effort) added as a linear covariate the log-link GLM is

log( E(y)) = offset + beta * log( effort) + other_stuff
           = log( effort) + beta * log( effort) + other_stuff
           = beta_1 * log( effort) + other_stuff,

where beta_1 = 1 + beta.

If you test that beta==0 (which is not beta_1) then you are testing that 
the effect of effort is purely scaling (as per nomenclature before).  
This is the same as McCullagh and Nelder's testing to see if beta_1==1.  
Thanks for the pointer to McCullagh and Nelder -- I didn't know that 
they suggested that.
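
A minimal sketch of the two equivalent parameterisations, assuming simulated data in which effort really is purely scaling (true beta = 0). Everything here is illustrative: the helper names fit_poisson and rpois, the sample size, the effort range and the rate 0.5 are all hypothetical, and the fit is a hand-rolled Newton-Raphson for a two-parameter Poisson GLM rather than any particular package's routine:

```python
import math
import random

def rpois(mean, rng):
    """Draw one Poisson variate (Knuth's method; fine for the small means here)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def fit_poisson(x, y, offset, n_iter=50, tol=1e-10):
    """Fit log(E[y]) = offset + a + b*x by Newton-Raphson with step-halving.

    Returns (a, b, se_b), where se_b is the Wald standard error of b.
    """
    def loglik(a, b):
        # Poisson log-likelihood, dropping the term that is constant in (a, b)
        return sum(yi * (oi + a + b * xi) - math.exp(oi + a + b * xi)
                   for xi, yi, oi in zip(x, y, offset))

    # Start at the intercept-only estimate with zero slope
    a = math.log(sum(y) / sum(math.exp(o) for o in offset))
    b = 0.0
    for _ in range(n_iter):
        mu = [math.exp(oi + a + b * xi) for xi, oi in zip(x, offset)]
        # Score vector
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # 2x2 Fisher information and its determinant
        i00 = sum(mu)
        i01 = sum(mi * xi for xi, mi in zip(x, mu))
        i11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton direction: inverse information times score
        da = (i11 * u0 - i01 * u1) / det
        db = (i00 * u1 - i01 * u0) / det
        # Halve the step while it would lower the log-likelihood
        step, ll_old = 1.0, loglik(a, b)
        while loglik(a + step * da, b + step * db) < ll_old and step > 1e-8:
            step /= 2
        a, b = a + step * da, b + step * db
        if abs(step * da) + abs(step * db) < tol:
            break
    # Variance of b from the information at the last evaluated iterate
    se_b = math.sqrt(i00 / det)
    return a, b, se_b

rng = random.Random(1)
effort = [rng.uniform(1.0, 20.0) for _ in range(200)]
log_effort = [math.log(e) for e in effort]
# Pure scaling: E[y] = 0.5 * effort, i.e. beta = 0 and beta_1 = 1
y = [rpois(0.5 * e, rng) for e in effort]

# Offset log(effort) plus log(effort) as a covariate: the slope is beta
a0, beta, se_beta = fit_poisson(log_effort, y, offset=log_effort)
# No offset, log(effort) as a covariate only: the slope is beta_1
a1, beta_1, se_beta_1 = fit_poisson(log_effort, y, offset=[0.0] * len(y))

# Wald statistics: beta against 0 and beta_1 against 1 are the same test
z_beta = beta / se_beta
z_beta_1 = (beta_1 - 1.0) / se_beta_1
print(beta, beta_1, z_beta, z_beta_1)
```

The two fits describe the same model, so beta_1 equals 1 + beta and the two Wald statistics agree; the only difference is which null value (0 or 1) the software's default test refers to.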

My depiction of the effect of effort as f( effort) is to allow for the 
possibility that the effect of effort may be non-linear on the link 
scale.  A simple example is when f(effort) is a low-order polynomial.  
Departures from effort being a purely scaling term may extend beyond 
linearity.  One may even want to consider regression splines or even 
more flexible GAMs.

Having said all this though, it is my practice to be quite conservative 
with including effort as anything but a scaling variable (offset).  It 
seems to me that there needs to be good reason before jumping to strong 
conclusions that may have no basis in the phenomenon under study.

Hope this helps,

Scott
On 26/06/13 18:14, Ivailo wrote:

  
    
#
On Wed, Jun 26, 2013 at 12:42 PM, Scott Foster <scott.foster at csiro.au> wrote:
Thanks a lot for the brilliant explanation, Scott! Now things make
sense to me, and I'm interested in what the modeling strategy would be
if beta_1 turns out to be significantly different from 1. Would the
option you mention below be a viable alternative in that case?
I imagine that the fishing-net example you mentioned earlier could be
a case of a non-linear effect of effort -- wouldn't this warrant
modeling the effort as being non-linear on the link scale?

Cheers,
Ivailo
--
UBUNTU: a person is a person through other persons.
#
Hi Ivailo,

If the effort term is not just present in the model for the purpose of 
scaling the outcome random variable, then I think that it should just be 
treated as a regression-type problem.  All the questions you raised 
seem(?) to be standard in that setting too: Is the covariate acting 
linearly (on the link scale)?  Are any non-linearities (on the link 
scale) important enough to warrant using some curvi-linear or 
basis-expanded function of the effort variable?  And so on...

Yes, the fishing net example *may* be one where the (scaling) effort 
variable acts non-linearly.  I have not thought about this though. I 
typically use effort as a scaling factor only as I have a strong a 
priori belief that effort will be multiplicatively related to expected 
outcome (log offset with log-link).  I am sure that I will need to 
revise this belief sometime;-)

Scott
On 27/06/13 16:57, Ivailo wrote: