On Tue, Jun 25, 2013 at 2:02 PM, Scott Foster <scott.foster at csiro.au> wrote:
Hi Ivailo,
Good question. Difficult to answer, which is probably why you haven't had
any responses yet (that the list has seen).
If you include an offset term with a log link function then you are assuming
that the random variable (counts, say) depends on the offset through a known
relationship. Generally, this is precisely what you want to do -- for
example standardising counts for the sampling effort taken to obtain those
counts.
However, in some situations it is conceivable that the sampling effort
itself affects the count random variable. An example may be fish in a trawl
net -- as the net gets full it becomes less and less efficacious. In this
case you may expect that a single extra unit of effort will have a different
effect when there has already been a lot of effort than when there hasn't.
Thanks for commenting on that, Scott!
Both alternatives you mention above assume that the RV depends on
either the "offset" or the "sampling effort", but aren't these
essentially the same?
If I thought that I was in the latter case, I may fit a model like
log(E(count)) = log(effort) + f(effort) + other stuff.
The function f(effort) can take any form, including beta*log(effort). In
such a case a test of beta==0 is equivalent to testing if the effect of
effort is purely scaling or if it is something else/sinister. General forms
of f(effort) may tell you much more but may also be much more confusing.
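
As a rough illustration of the beta*log(effort) case, here is a minimal
sketch in Python with numpy. Everything in it is an assumption made for
the example: the simulated data, the true beta of -0.3 (a net that gets
less efficient as it fills), the extra covariate x, and the small
hand-written IRLS helper fit_poisson. It is not a fit to any real data
set.

```python
import numpy as np

def fit_poisson(X, y, offset, n_iter=100, tol=1e-10):
    """Poisson GLM with log link, fitted by IRLS.

    Returns (coefficients, standard errors). Hypothetical helper
    written for this example only.
    """
    mu = y + 0.5                      # standard starting values
    eta = np.log(mu)
    coef = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = eta - offset + (y - mu) / mu   # working response, offset removed
        XtW = X.T * mu                     # Poisson working weights are mu
        new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ new + offset
        mu = np.exp(eta)
        converged = np.max(np.abs(new - coef)) < tol
        coef = new
        if converged:
            break
    cov = np.linalg.inv((X.T * mu) @ X)    # inverse Fisher information
    return coef, np.sqrt(np.diag(cov))

# Simulated data (assumed): effort has a scaling effect plus an extra
# effect beta * log(effort) with beta = -0.3.
rng = np.random.default_rng(1)
n = 2000
effort = rng.uniform(0.5, 10.0, n)
log_effort = np.log(effort)
x = rng.normal(size=n)                     # stands in for "other stuff"
true_beta = -0.3
y = rng.poisson(np.exp(0.2 + log_effort + true_beta * log_effort + 0.5 * x))

# log(effort) enters twice: as an offset (coefficient fixed at 1) and as
# a covariate (coefficient beta, estimated).
X = np.column_stack([np.ones(n), log_effort, x])
coef, se = fit_poisson(X, y, offset=log_effort)

beta_hat = coef[1]
wald_z = beta_hat / se[1]                  # Wald statistic for beta == 0
print("beta_hat:", beta_hat, "se:", se[1], "Wald z:", wald_z)
```

A Wald statistic far from zero would suggest that effort does more than
scale the counts; a value near zero is consistent with the offset-only
model.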
To choose between the two cases above (offset versus offset+covariate), I
would base my choice largely on prior knowledge of the system under study.
This is especially so if I don't have much data.
My approach to modeling counts was primarily based on the widespread
advice that varying effort should be accounted for by adding an offset
to the model, but when I consulted the book by McCullagh and Nelder
(1989), I found on pp. 206-207 that they actually estimated the
coefficient of the log(effort) term as being ~ 1. So started my
confusion on the topic "to offset or to estimate" ;-)
It never occurred to me, though, that the effort could be entered both
as an offset *and* as a covariate into the model. As these two terms
are likely to be collinear, I wonder how one can then separate
their influence on the RV. I do not fully understand your idea
regarding the form of the function "f(effort)", but I get that if the
coefficient of log(effort) is estimated as ~ 0, then it should be
concluded that the effect of effort should be retained *only* as an
offset to account for the "scaling". Am I right?
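
On the collinearity worry, a small numerical check may help. The
coefficient of an offset is fixed at 1 rather than estimated, so only
one coefficient of log(effort) is ever estimated, and it measures the
departure from pure scaling. The sketch below (Python with numpy;
simulated data and the IRLS helper fit_poisson are assumptions made up
for the example) fits the same counts with log(effort) as offset plus
covariate, and as a covariate only. The two estimates should differ by
exactly 1, so a covariate-only estimate close to 1 corresponds to an
offset-plus-covariate estimate close to 0.

```python
import numpy as np

def fit_poisson(X, y, offset, n_iter=100, tol=1e-10):
    """Poisson GLM with log link, fitted by IRLS; returns coefficients.

    Hypothetical helper written for this example only.
    """
    mu = y + 0.5
    eta = np.log(mu)
    coef = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = eta - offset + (y - mu) / mu   # working response, offset removed
        XtW = X.T * mu                     # Poisson working weights are mu
        new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ new + offset
        mu = np.exp(eta)
        converged = np.max(np.abs(new - coef)) < tol
        coef = new
        if converged:
            break
    return coef

# Simulated data (assumed): effort acts purely as a scaling factor,
# i.e. E(count) = exp(0.2) * effort.
rng = np.random.default_rng(42)
n = 2000
effort = rng.uniform(0.5, 10.0, n)
log_effort = np.log(effort)
y = rng.poisson(np.exp(0.2 + log_effort))

X = np.column_stack([np.ones(n), log_effort])

# (a) offset AND covariate: the estimated coefficient is the departure
#     from scaling, expected near 0 here.
b_both = fit_poisson(X, y, offset=log_effort)[1]

# (b) covariate only, no offset: the estimated coefficient is the full
#     effect of log(effort), expected near 1 here.
b_free = fit_poisson(X, y, offset=np.zeros(n))[1]

print("offset + covariate:", b_both)
print("covariate only:    ", b_free)
print("difference:        ", b_free - b_both)
```

The two fits give the same fitted values; they are just two
parameterisations of one model.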
Thanks again for your elucidating comment,
Ivailo