Thanks Liz, Brian and Mollie. Your replies have been very helpful.
I now realize how complex the analysis of count data is, and that the method has to be chosen according to the structure of my data. But if the explanatory data table is composed of several variables, and I want to quantify the relative contribution of each explanatory variable to the variation in the response variable, can I do that with Poisson, negative binomial, zero-inflated Poisson, zero-inflated negative binomial or other models (other than the simple linear regression model)?
Thanks for your attention and best wishes to you!
Lin
Aug 1st, 2012
At 2012-08-02 06:17:03,"Liz Pryde" <elizabethpryde at gmail.com> wrote:
Hello Lin,
It's a bit difficult to give you an exact answer without knowing what the data set is.
That said, in general, count data do usually follow a Poisson or negative binomial distribution. When you log transform such data, you are trying to normalise the distribution so that you can apply OLS regression (which assumes normally distributed errors to give you 'correct' significance values). One of the problems with this is that when you have data that can't be negative (e.g. counts), OLS regression can give you predicted values that are negative (and hence nonsensical). Also, when mean values are very low (zero or near zero), transforming the data becomes ineffective. GLMs overcome this problem because they explicitly model the X-Y relationship through the link function and ALSO model the mean-variance relationship (which is what is happening when you choose 'family='). So, rather than transforming the data, they 'linearise' the relationship and 'model' the error directly. For Poisson data the variance equals the mean, so it increases with increasing mean.
On occasion this variance may be higher than the Poisson distribution expects (overdispersion), and so the negative binomial becomes appropriate.
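As a rough illustration of the overdispersion point, here is a minimal sketch on simulated data (not Lin's data; the sample size, coefficients and the `size` parameter are made-up assumptions). It fits a Poisson GLM to counts that are really negative binomial, computes the Pearson dispersion statistic (which should be near 1 if the Poisson mean-variance assumption holds), and then refits with `glm.nb()` from the MASS package:

```r
library(MASS)  # recommended package shipped with R; provides glm.nb()

set.seed(42)
n  <- 400
x  <- runif(n)
mu <- exp(1 + 1.5 * x)  # hypothetical true mean on the log-link scale

# Negative binomial counts: variance = mu + mu^2 / size, i.e. larger than the mean
y <- rnbinom(n, mu = mu, size = 1.5)

# Poisson GLM assumes variance = mean
fit_pois <- glm(y ~ x, family = poisson())

# Pearson dispersion statistic: values well above 1 suggest overdispersion
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
print(dispersion)

# Negative binomial GLM estimates an extra dispersion parameter (theta)
fit_nb <- glm.nb(y ~ x)
print(AIC(fit_pois, fit_nb))
```

On real data the same two checks (dispersion statistic, then an AIC comparison) are only a starting point; they do not by themselves rule out zero inflation or a misspecified mean model.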
I hope this helps.
The O'Hara paper that was mentioned below gives an excellent explanation of this.
Liz
On 02/08/2012, at 3:42 AM, lgj200306 <lgj200306 at 163.com> wrote:
Hi, everyone,
I used two methods to analyse the relationship between y (count data) and x.
1) log transformed simple linear regression:
lm(log(y+1)~x1+x2+x3,data=data)
2) Poisson regression:
glm(y~x1+x2+x3,family=poisson(),data=data)
Someone told me that these two approaches are very similar, but someone else told me that Poisson regression does not fit y itself but lambda (the parameter of the Poisson distribution). I am not sure which one is right. Can anybody help me?
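To make the difference between the two calls concrete, here is a minimal sketch on simulated data (the data frame `dat`, the coefficients and the sample size are all made-up assumptions, not the real data set). The linear model describes the mean of log(y+1), whereas the Poisson GLM describes log(lambda), where lambda is the mean of y itself:

```r
set.seed(1)
n   <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))

# Hypothetical true model: log(lambda) = 0.5 + 0.8*x1 - 0.4*x2 (x3 has no effect)
lambda <- exp(0.5 + 0.8 * dat$x1 - 0.4 * dat$x2)
dat$y  <- rpois(n, lambda)

fit_lm  <- lm(log(y + 1) ~ x1 + x2 + x3, data = dat)               # models E[log(y+1)]
fit_glm <- glm(y ~ x1 + x2 + x3, family = poisson(), data = dat)   # models log(E[y])

# Coefficients side by side; only the GLM ones are on the log(lambda) scale
print(round(cbind(lm_log = coef(fit_lm), glm_poisson = coef(fit_glm)), 3))

# Fitted values of the GLM on the response scale are estimates of lambda
lambda_hat <- predict(fit_glm, type = "response")

# Back-transforming the lm fit gives exp(E[log(y+1)]) - 1, which is not E[y]
y_back <- exp(fitted(fit_lm)) - 1

print(c(mean_y = mean(dat$y),
        mean_glm = mean(lambda_hat),
        mean_lm_back = mean(y_back)))
```

So both statements have some truth in them: the two models have a similar linear predictor, but they target different quantities, and only the GLM estimates lambda directly.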
Thanks for your attention and best wishes to you!
Lin
Aug 1st, 2012