hurdle model

I'd like to try the same way to my dataset, hurdle but estimated by
'quasi-likelihood', but it's not in the standard 'pscl' package I
think, right?
Please keep discussion on list; just because I replied doesn't give you
a direct line to my inbox...

Why would you want a quasi-likelihood when you could have the real
thing? Seriously, if there is no likelihood you can't do likelihood
ratio tests, compare models using AIC/BIC etc.

Just use hurdle() if it fits the form of model you are after. don't
worry about likelihoods, quasi or otherwise. Check you are happy with
the range of models you could fit with hurdle() and use it. If you
aren't happy then you'd need to look elsewhere, but don't get hung up on
the quasi-likelihood bit.

My Tuppence,

G

On 19 Aug 2010, at 10:17, Gavin Simpson wrote:

On Thu, 2010-08-19 at 09:52 +0200, Yingjie Zhang wrote:
Hello everyone,

Does anyone of you using hurdle model? I am reading a paper which said
" Hurdle model removes effect of zero-inflation and over-dispersion in
the non-zero observations using a quasi-likelihood", I've checked the
help file from hurdle in R, which said differently that"for non-zero
obs normally a truncated poisson/NB is used" ... just want to make
sure, does it really estimated the parameter by "quasi-likelihood"

Thanks,
Yingjie Zhang
Biostatistician
The authors of that paper might have fitted their hurdle model using a
quasi likelihood but that is not, AFAICT, what is used in the hurdle()
function in package 'pscl', which maximises a proper log likelihood.

But hard to say from what you have provided.

G
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Hi,

There is a reason why am I addict to Quasi likelihood, since Hurdle  from 'pscl' use Zero Truncated Poisson regression for the non-zero part, which incapable of handling the over-disperson comes from the positive part of the data. Apparently, Quasi likelihood is at least a better choice. I've noticed the hurdle they used for the paper comes from package 'stats' instead of 'pscl', I didn't find this version of hurdle in r...

On Thu, 2010-08-19 at 10:30 +0200, Yingjie Zhang wrote:
I'd like to try the same way to my dataset, hurdle but estimated by
'quasi-likelihood', but it's not in the standard 'pscl' package I
think, right?
Please keep discussion on list; just because I replied doesn't give you
a direct line to my inbox...

Why would you want a quasi-likelihood when you could have the real
thing? Seriously, if there is no likelihood you can't do likelihood
ratio tests, compare models using AIC/BIC etc.

Just use hurdle() if it fits the form of model you are after. don't
worry about likelihoods, quasi or otherwise. Check you are happy with
the range of models you could fit with hurdle() and use it. If you
aren't happy then you'd need to look elsewhere, but don't get hung up on
the quasi-likelihood bit.

My Tuppence,

G

On 19 Aug 2010, at 10:17, Gavin Simpson wrote:

On Thu, 2010-08-19 at 09:52 +0200, Yingjie Zhang wrote:
Hello everyone,

Does anyone of you using hurdle model? I am reading a paper which said
" Hurdle model removes effect of zero-inflation and over-dispersion in
the non-zero observations using a quasi-likelihood", I've checked the
help file from hurdle in R, which said differently that"for non-zero
obs normally a truncated poisson/NB is used" ... just want to make
sure, does it really estimated the parameter by "quasi-likelihood"

Thanks,
Yingjie Zhang
Biostatistician
The authors of that paper might have fitted their hurdle model using a
quasi likelihood but that is not, AFAICT, what is used in the hurdle()
function in package 'pscl', which maximises a proper log likelihood.

But hard to say from what you have provided.

G
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

Hi,

There is a reason why am I addict to Quasi likelihood, since Hurdle
from 'pscl' use Zero Truncated Poisson regression for the non-zero
part, which incapable of handling the over-disperson comes from the
positive part of the data. Apparently, Quasi likelihood is at least a
better choice. I've noticed the hurdle they used for the paper comes
from package 'stats' instead of 'pscl', I didn't find this version of
hurdle in r...
Quasi-likelihood isn't solving the "over-dispersion comes from positive
part". It is a means of fitting models, just like maximum likelihood
etc. It will be the authors model that does the accounting for over
dispersion. They solve the parameters of this model using
quasi-likelihood.

Your claim about hurdle in stats is incorrect:
getAnywhere(hurdle)
no object named ?hurdle? was found
getAnywhere("hurdle")
no object named ?hurdle? was found

So they must be using something else. Here's a thought; why not give us
the reference/citation for the paper you are reading --- it is difficult
to speculate further without more details like the actual paper?

Hurdle models fit a point mass at zero, whilst the count part of the
model is truncated to not allow any further zeros be produced from it.

A zeroinflated (zeroinfl() in pscl) model fits a point mass at zero and
has an untruncated count model which will allow extra zeros be produced.

In both cases a negative binomial model may be fitted to the count part,
which may be sufficient to cope with remaining overdispersion in the
count part of your model.

I think you would be better off thinking where the overdispersion is
coming from and choosing an appropriate means to model it. You are being
blinded by this talk of quasi-likelihoods. There may well be a way of
fitting the model you want in R without resorting to quasi-likelihood
tricks. But as you haven't told us what model you want to fit or a
citation for the paper you want to replicate, there isn't much further
we can do.

HTH

G
On 19 Aug 2010, at 10:55, Gavin Simpson wrote:

On Thu, 2010-08-19 at 10:30 +0200, Yingjie Zhang wrote:
I'd like to try the same way to my dataset, hurdle but estimated by
'quasi-likelihood', but it's not in the standard 'pscl' package I
think, right?
Please keep discussion on list; just because I replied doesn't give you
a direct line to my inbox...

Why would you want a quasi-likelihood when you could have the real
thing? Seriously, if there is no likelihood you can't do likelihood
ratio tests, compare models using AIC/BIC etc.

Just use hurdle() if it fits the form of model you are after. don't
worry about likelihoods, quasi or otherwise. Check you are happy with
the range of models you could fit with hurdle() and use it. If you
aren't happy then you'd need to look elsewhere, but don't get hung up on
the quasi-likelihood bit.

My Tuppence,

G

On 19 Aug 2010, at 10:17, Gavin Simpson wrote:

On Thu, 2010-08-19 at 09:52 +0200, Yingjie Zhang wrote:
Hello everyone,

Does anyone of you using hurdle model? I am reading a paper which said
" Hurdle model removes effect of zero-inflation and over-dispersion in
the non-zero observations using a quasi-likelihood", I've checked the
help file from hurdle in R, which said differently that"for non-zero
obs normally a truncated poisson/NB is used" ... just want to make
sure, does it really estimated the parameter by "quasi-likelihood"

Thanks,
Yingjie Zhang
Biostatistician
The authors of that paper might have fitted their hurdle model using a
quasi likelihood but that is not, AFAICT, what is used in the hurdle()
function in package 'pscl', which maximises a proper log likelihood.

But hard to say from what you have provided.

G
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Thanks for the details, the paper is 'Comparing species abundance models' by Joanne M.Potts, Jane Elith.  Click the link... on page 158, in the table, they compare 5 models, both Quasi-likelihood and Hurdle are mentioned.

http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6VBS-4KD5C2N-1&_user=794998&_coverDate=11%2F16%2F2006&_rdoc=1&_fmt=high&_orig=search&_sort=d&_docanchor=&view=c&_searchStrId=1435498227&_rerunOrigin=google&_acct=C000043466&_version=1&_urlVersion=0&_userid=794998&md5=fc0c4ebc77917948c90f8f0ee3bbe141

Maybe we went too far, when I read the paper above, I just thought it would be interesting to try the method they mentioned. My data have both characteristics: excess 0s and over dispersion of  positive part. And I am quite  convinced that the 0s have a single source ... that's why I didn't use ZIP/ZINB. 

Maybe for the excess 0s, over-dispersion and one source of 0s, the best model is Hurdle with truncated negative binomial, but my motive is to make sure that which ML method that Hurdle use.

Cheers,
Yingjie

On Thu, 2010-08-19 at 11:14 +0200, Yingjie Zhang wrote:
Hi,

There is a reason why am I addict to Quasi likelihood, since Hurdle
from 'pscl' use Zero Truncated Poisson regression for the non-zero
part, which incapable of handling the over-disperson comes from the
positive part of the data. Apparently, Quasi likelihood is at least a
better choice. I've noticed the hurdle they used for the paper comes
from package 'stats' instead of 'pscl', I didn't find this version of
hurdle in r...
Quasi-likelihood isn't solving the "over-dispersion comes from positive
part". It is a means of fitting models, just like maximum likelihood
etc. It will be the authors model that does the accounting for over
dispersion. They solve the parameters of this model using
quasi-likelihood.

Your claim about hurdle in stats is incorrect:

getAnywhere(hurdle)
no object named ?hurdle? was found
getAnywhere("hurdle")
no object named ?hurdle? was found

So they must be using something else. Here's a thought; why not give us
the reference/citation for the paper you are reading --- it is difficult
to speculate further without more details like the actual paper?

Hurdle models fit a point mass at zero, whilst the count part of the
model is truncated to not allow any further zeros be produced from it.

A zeroinflated (zeroinfl() in pscl) model fits a point mass at zero and
has an untruncated count model which will allow extra zeros be produced.

In both cases a negative binomial model may be fitted to the count part,
which may be sufficient to cope with remaining overdispersion in the
count part of your model.

I think you would be better off thinking where the overdispersion is
coming from and choosing an appropriate means to model it. You are being
blinded by this talk of quasi-likelihoods. There may well be a way of
fitting the model you want in R without resorting to quasi-likelihood
tricks. But as you haven't told us what model you want to fit or a
citation for the paper you want to replicate, there isn't much further
we can do.

HTH

G

On 19 Aug 2010, at 10:55, Gavin Simpson wrote:

On Thu, 2010-08-19 at 10:30 +0200, Yingjie Zhang wrote:
I'd like to try the same way to my dataset, hurdle but estimated by
'quasi-likelihood', but it's not in the standard 'pscl' package I
think, right?
Please keep discussion on list; just because I replied doesn't give you
a direct line to my inbox...

Why would you want a quasi-likelihood when you could have the real
thing? Seriously, if there is no likelihood you can't do likelihood
ratio tests, compare models using AIC/BIC etc.

Just use hurdle() if it fits the form of model you are after. don't
worry about likelihoods, quasi or otherwise. Check you are happy with
the range of models you could fit with hurdle() and use it. If you
aren't happy then you'd need to look elsewhere, but don't get hung up on
the quasi-likelihood bit.

My Tuppence,

G

On 19 Aug 2010, at 10:17, Gavin Simpson wrote:

On Thu, 2010-08-19 at 09:52 +0200, Yingjie Zhang wrote:
Hello everyone,

Does anyone of you using hurdle model? I am reading a paper which said
" Hurdle model removes effect of zero-inflation and over-dispersion in
the non-zero observations using a quasi-likelihood", I've checked the
help file from hurdle in R, which said differently that"for non-zero
obs normally a truncated poisson/NB is used" ... just want to make
sure, does it really estimated the parameter by "quasi-likelihood"

Thanks,
Yingjie Zhang
Biostatistician
The authors of that paper might have fitted their hurdle model using a
quasi likelihood but that is not, AFAICT, what is used in the hurdle()
function in package 'pscl', which maximises a proper log likelihood.

But hard to say from what you have provided.

G
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

Thanks for the details, the paper is 'Comparing species abundance
models' by Joanne M.Potts, Jane Elith.  Click the link... on page 158,
in the table, they compare 5 models, both Quasi-likelihood and Hurdle
are mentioned.

http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6VBS-4KD5C2N-1&_user=794998&_coverDate=11%2F16%2F2006&_rdoc=1&_fmt=high&_orig=search&_sort=d&_docanchor=&view=c&_searchStrId=1435498227&_rerunOrigin=google&_acct=C000043466&_version=1&_urlVersion=0&_userid=794998&md5=fc0c4ebc77917948c90f8f0ee3bbe141

Maybe we went too far, when I read the paper above, I just thought it
would be interesting to try the method they mentioned. My data have
both characteristics: excess 0s and over dispersion of  positive part.
And I am quite  convinced that the 0s have a single source ... that's
why I didn't use ZIP/ZINB. 

Maybe for the excess 0s, over-dispersion and one source of 0s, the
best model is Hurdle with truncated negative binomial, but my motive
is to make sure that which ML method that Hurdle use.
They fit several models and compare them:

     I. Poisson
    II. Negative Binomial
   III. Quasi-likelihood
    IV. Hurdle model
     V. zero-inflated model

III should be a quasi-poisson model, i.e. you fit the Poisson GLM using
quasi-likelihood and model the dispersion parameter \phi alongside the
usual Poisson GLM parameters.

Section 2.3 of their paper on the hurdle model doesn't even mention
"quasi". Though they do mention this in Table2.

Reading this, I think they cooked this model themselves - you can fit a
binomial model yourself for the presence absence and then fit a count
model for the samples predicted to be present from the binomial part. To
make things simple I suspect they fitted the count part as quasi-Poisson
but no-where does it say exactly what they did.

You would be better off fitting the hurdle as I mentioned using hurdle()
in pscl; fitting things using quasi-likelihood is just asking for
trouble if there are proper likelihood options available.

Read the vignette that accompanies the pscl package for details of how
it fits the various models including the hurdle. This includes the
likelihood functions that are optimised as part of the fitting.

HTH

G
Cheers,
Yingjie  

On 19 Aug 2010, at 11:49, Gavin Simpson wrote:

On Thu, 2010-08-19 at 11:14 +0200, Yingjie Zhang wrote:
Hi,

There is a reason why am I addict to Quasi likelihood, since Hurdle
from 'pscl' use Zero Truncated Poisson regression for the non-zero
part, which incapable of handling the over-disperson comes from the
positive part of the data. Apparently, Quasi likelihood is at least a
better choice. I've noticed the hurdle they used for the paper comes
from package 'stats' instead of 'pscl', I didn't find this version of
hurdle in r...
Quasi-likelihood isn't solving the "over-dispersion comes from positive
part". It is a means of fitting models, just like maximum likelihood
etc. It will be the authors model that does the accounting for over
dispersion. They solve the parameters of this model using
quasi-likelihood.

Your claim about hurdle in stats is incorrect:

getAnywhere(hurdle)
no object named ?hurdle? was found
getAnywhere("hurdle")
no object named ?hurdle? was found

So they must be using something else. Here's a thought; why not give us
the reference/citation for the paper you are reading --- it is difficult
to speculate further without more details like the actual paper?

Hurdle models fit a point mass at zero, whilst the count part of the
model is truncated to not allow any further zeros be produced from it.

A zeroinflated (zeroinfl() in pscl) model fits a point mass at zero and
has an untruncated count model which will allow extra zeros be produced.

In both cases a negative binomial model may be fitted to the count part,
which may be sufficient to cope with remaining overdispersion in the
count part of your model.

I think you would be better off thinking where the overdispersion is
coming from and choosing an appropriate means to model it. You are being
blinded by this talk of quasi-likelihoods. There may well be a way of
fitting the model you want in R without resorting to quasi-likelihood
tricks. But as you haven't told us what model you want to fit or a
citation for the paper you want to replicate, there isn't much further
we can do.

HTH

G

On 19 Aug 2010, at 10:55, Gavin Simpson wrote:

On Thu, 2010-08-19 at 10:30 +0200, Yingjie Zhang wrote:
I'd like to try the same way to my dataset, hurdle but estimated by
'quasi-likelihood', but it's not in the standard 'pscl' package I
think, right?
Please keep discussion on list; just because I replied doesn't give you
a direct line to my inbox...

Why would you want a quasi-likelihood when you could have the real
thing? Seriously, if there is no likelihood you can't do likelihood
ratio tests, compare models using AIC/BIC etc.

Just use hurdle() if it fits the form of model you are after. don't
worry about likelihoods, quasi or otherwise. Check you are happy with
the range of models you could fit with hurdle() and use it. If you
aren't happy then you'd need to look elsewhere, but don't get hung up on
the quasi-likelihood bit.

My Tuppence,

G

On 19 Aug 2010, at 10:17, Gavin Simpson wrote:

On Thu, 2010-08-19 at 09:52 +0200, Yingjie Zhang wrote:
Hello everyone,

Does anyone of you using hurdle model? I am reading a paper which said
" Hurdle model removes effect of zero-inflation and over-dispersion in
the non-zero observations using a quasi-likelihood", I've checked the
help file from hurdle in R, which said differently that"for non-zero
obs normally a truncated poisson/NB is used" ... just want to make
sure, does it really estimated the parameter by "quasi-likelihood"

Thanks,
Yingjie Zhang
Biostatistician
The authors of that paper might have fitted their hurdle model using a
quasi likelihood but that is not, AFAICT, what is used in the hurdle()
function in package 'pscl', which maximises a proper log likelihood.

But hard to say from what you have provided.

G
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
On Thu, 2010-08-19 at 13:20 +0200, Yingjie Zhang wrote:
They fit several models and compare them:

     I. Poisson
    II. Negative Binomial
   III. Quasi-likelihood
    IV. Hurdle model
     V. zero-inflated model

III should be a quasi-poisson model, i.e. you fit the Poisson GLM
using
quasi-likelihood and model the dispersion parameter \phi alongside the
usual Poisson GLM parameters.

Section 2.3 of their paper on the hurdle model doesn't even mention
"quasi". Though they do mention this in Table2.

Reading this, I think they cooked this model themselves - you can fit
a
binomial model yourself for the presence absence and then fit a count
model for the samples predicted to be present from the binomial part.
To
make things simple I suspect they fitted the count part as
quasi-Poisson
but no-where does it say exactly what they did.
I know that at least Jane Elith has an email address (I have used it
years ago), so you could ask her. However, it may be  that their hurdle
model uses just Poisson, and there is a minor mistake in their Table 2. 

You can use quasipoisson() or poisson() in glm() in a very natural way:
the fitting happens via iteratively reweighted least squares, and all
you need to define is the relationship between fitted values and
variance. If you look at poisson() and quasipoisson() functions in R
(these provide the backbone of the glm(..., family=)), you see that the
differences are that quasipoissoin()$aic() always returns NA, and
quasipoisson() lacks item simulate(). Otherwise they work in a similar
way. Except in poisson() you take the scale (\phi) to be 1, and in
quasipoisson() you estimate the scale from the fitted model. Then you
just multiply standard errors with the scale, use F tests instead of
Chisq in anova() etc.

I am not sure (or actually, I don't think) that this fitting parallelism
extends to *truncated* Poisson that is used in pscl::hurdle(). Although
you can do fitting by stages, and fit quasipoisson() glm for above-zero
values, I don't think this is the correct thing to do when you are not
allowed to have new zeros. However, the truncated poisson likelihood
model is a huge improvement over hand-fitting glm with iteratively
reweighted least squares and assuming constant variance/fit
relationship.

If you are worried about the overdispersion of the above-zero count
data, use the truncated negative binomial model offerred by
pscl::hurdle(). It is designed for the purpose (and has a more exciting
narrative for ecologists).

Cheers, Jari Oksanen
Dear All,

I had a quick look at the internal functions used by pscl::hurdle to
do the numerical optimization by optim. It clearly corresponds to the
hurdle model defined in the paper/vignette, where the zero component
is based on a right censored random variable, that is 0 if the
original count data is 0 and 1 otherwise. The likelihood function for
the zero model corresponds to a censored Poisson model. The count
estimation part is based on left truncated Poisson. This is a
conditional inference thinking, the zero model tells us what
determines if the data is 0 or >0, and once the observations are >0,
than what determines the exact count. If estimates from the 2 models
are identical, it means that 0s can arise from the same Poisson
distribution as the counts. So this is not really a mixture as it is
the case with the zeroinfl() model. The resulting log-likelihood is
still valid, and table 1 clearly states that the hurdle model is based
on ML (maximum likelihood).

It is not the same estimating procedure as for the quasipoisson, where
a likelihood-like function is used to get estimates (note that
parameter estimates are the same as for Poisson, but SEs and the
dispaersion parameter are different).

To handle other overdispersion than zero inflation, one can choose NB
instead of Poisson in hurdle. The quasipoisson family is not allowed
there.

Cheers,

Peter

P?ter S?lymos
Alberta Biodiversity Monitoring Institute
and Boreal Avian Modelling project
Department of Biological Sciences
CW 405, Biological Sciences Bldg
University of Alberta
Edmonton, Alberta, T6G 2E9, Canada
Phone: 780.492.8534
Fax: 780.492.7635
email <- paste("solymos", "ualberta.ca", sep = "@")
http://www.abmi.ca
http://sites.google.com/site/psolymos
On Thu, 2010-08-19 at 14:54 +0300, Gavin Simpson wrote:
On Thu, 2010-08-19 at 13:20 +0200, Yingjie Zhang wrote:

They fit several models and compare them:

? ? ?I. Poisson
? ? II. Negative Binomial
? ?III. Quasi-likelihood
? ? IV. Hurdle model
? ? ?V. zero-inflated model

III should be a quasi-poisson model, i.e. you fit the Poisson GLM
using
quasi-likelihood and model the dispersion parameter \phi alongside the
usual Poisson GLM parameters.

Section 2.3 of their paper on the hurdle model doesn't even mention
"quasi". Though they do mention this in Table2.

Reading this, I think they cooked this model themselves - you can fit
a
binomial model yourself for the presence absence and then fit a count
model for the samples predicted to be present from the binomial part.
To
make things simple I suspect they fitted the count part as
quasi-Poisson
but no-where does it say exactly what they did.
I know that at least Jane Elith has an email address (I have used it
years ago), so you could ask her. However, it may be ?that their hurdle
model uses just Poisson, and there is a minor mistake in their Table 2.

You can use quasipoisson() or poisson() in glm() in a very natural way:
the fitting happens via iteratively reweighted least squares, and all
you need to define is the relationship between fitted values and
variance. If you look at poisson() and quasipoisson() functions in R
(these provide the backbone of the glm(..., family=)), you see that the
differences are that quasipoissoin()$aic() always returns NA, and
quasipoisson() lacks item simulate(). Otherwise they work in a similar
way. Except in poisson() you take the scale (\phi) to be 1, and in
quasipoisson() you estimate the scale from the fitted model. Then you
just multiply standard errors with the scale, use F tests instead of
Chisq in anova() etc.

I am not sure (or actually, I don't think) that this fitting parallelism
extends to *truncated* Poisson that is used in pscl::hurdle(). Although
you can do fitting by stages, and fit quasipoisson() glm for above-zero
values, I don't think this is the correct thing to do when you are not
allowed to have new zeros. However, the truncated poisson likelihood
model is a huge improvement over hand-fitting glm with iteratively
reweighted least squares and assuming constant variance/fit
relationship.

If you are worried about the overdispersion of the above-zero count
data, use the truncated negative binomial model offerred by
pscl::hurdle(). It is designed for the purpose (and has a more exciting
narrative for ecologists).

Cheers, Jari Oksanen

_______________________________________________
R-sig-ecology mailing list
R-sig-ecology at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-ecology/attachments/20100819/8c0eeb08/attachment.pl>
Dear all,

Thanks for all your perspectives, I agree with Dr. Gavin Simpson's opinion that the author cooked the model by themselves, and there is no Hurdle function in  package 'stats'.

I got a data set of the abundance of microbial community, I think some of you will know how it looks like, it's from 1000 different marine microbial species and selected from 45 locations, for each species, it comes up with many 0s, besides, the above zero-part is quite spars...and with some extreme values like 2000, and all the rest are abounded around some certain small positive value, for instance, 10.  I don't think the extreme values are outliers, because it says somethings, and for further considerations, data-transformation is not my choice either. Any suggestion from the models to fit this kind of data set? or what's the classical model for analyzing microbial abundance?

Thanks and Regards,
Yingjie

I don't know what the "classical" model is.  I would be tempted to fit
this as a three-category ordinal model (i.e. estimating the probability
of (0, 1-20, >20) [putting the breakpoint wherever it seems to make
sense].  It doesn't make much sense to me to try to fit numbers in the
range 1-20 and numbers in the range of thousands together into the same
distribution -- presumably (??) three different processes are occurring.
(How informative do you think the range of variation in the middle
(1-20) category is?  In the extreme (100-) category?) Having 1000
species adds another complexity --it is less than optimal to fit 1000
separate responses. Random effects model, possibly with phylogenetic or
functional groupings included?  Unfortunately, putting these pieces
together -- ordinal models, random effects, large data sets -- makes the
problem challenging.  I would check out the ordinal and MCMCglmm packages.