Thank you very much for clarifying this point. My algorithm certainly performs badly because, as you say, I am basically looking at zeros. One thing I don't really understand: for one pollen type I have a lot of pollen collected at time 1, some at time 2, little at time 3 and none at all at time 4. I get a significant difference between times 1 and 2, but none between times 1 and 3 or between times 1 and 4. That seems illogical. Maybe it is a problem with the residuals after all: they are fairly well balanced for time points with fitted values > 0, but for time points with no pollen collected there is no variance at all.

I think that if I had a very large data set, such that the non-zero part of my data looked nicely continuous, I could use a zero-inflated model. But with only 4 time points and a positive part that does not fit a continuous distribution well, it is difficult. I would certainly do better to present the data for the sparse pollen types descriptively.

Best wishes,
Valérie
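One possible reading of the time 1 vs. time 4 result, sketched here under assumptions that are not from the thread: if the model uses a log link, a group made only of zeros has a fitted mean close to 0, its coefficient runs toward minus infinity and its standard error grows even faster, so the Wald z-statistic shrinks toward 0 instead of growing. The sketch below uses a Poisson-type model with `n` observations per group as a stand-in for the actual model. The function name `wald_z` and all numbers are hypothetical, and the sketch does not explain the time 1 vs. time 3 result.

```python
import math


def wald_z(mean_ref, mean_grp, n):
    """Wald z for log(mean_grp / mean_ref) in a Poisson log-link model.

    Assumes n observations in each group and uses the fitted group means
    to build the standard error (hypothetical stand-in, not the thread's model).
    """
    estimate = math.log(mean_grp) - math.log(mean_ref)
    std_error = math.sqrt(1.0 / (n * mean_ref) + 1.0 / (n * mean_grp))
    return estimate / std_error


# Hypothetical fitted means: abundant reference group vs. a moderately lower group.
z_moderate = wald_z(mean_ref=50.0, mean_grp=20.0, n=10)

# Hypothetical all-zero group: the fitting algorithm stops at a tiny positive mean.
z_all_zero = wald_z(mean_ref=50.0, mean_grp=1e-8, n=10)

print("moderate group: |z| > 1.96 ?", abs(z_moderate) > 1.96)
print("all-zero group: |z| > 1.96 ?", abs(z_all_zero) > 1.96)
```

Under these assumptions, a likelihood-ratio comparison or a plain descriptive summary is less affected than the Wald test, which fits the conclusion already reached for the sparse pollen types.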
Message of 04/02/13 at 13:15
From: "Liz Pryde"
To: "v_coudrain at voila.fr"
Cc:
Subject: Re: [R-sig-eco] proportion data with many zeros

Hi,

If you're using a categorical predictor, those QQ plots etc. are pretty useless. Just do a residuals vs. fits plot and make sure the residuals look randomly scattered.
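A minimal sketch of what such a check looks at, assuming a model with a single categorical predictor (collection date), in which case the fitted value for each level is that level's mean. The proportions below are hypothetical and are not data from this thread; `residual_summary` is an illustrative helper, not a library function.

```python
from statistics import mean, pvariance

# Hypothetical pollen proportions at four collection dates (not data from the thread).
samples = {
    "time1": [0.42, 0.35, 0.51, 0.38],
    "time2": [0.20, 0.12, 0.25, 0.15],
    "time3": [0.03, 0.00, 0.05, 0.00],
    "time4": [0.00, 0.00, 0.00, 0.00],
}


def residual_summary(groups):
    """Return {level: (fitted value, raw residuals, residual variance)}."""
    summary = {}
    for level, values in groups.items():
        fitted = mean(values)  # one categorical predictor: the fit is the group mean
        residuals = [value - fitted for value in values]
        summary[level] = (fitted, residuals, pvariance(values))
    return summary


summary = residual_summary(samples)
for level, (fitted, residuals, variance) in summary.items():
    print(f"{level}: fitted={fitted:.3f} residual variance={variance:.5f}")
```

In a residuals vs. fits plot each level shows up as one vertical strip of points. A level with only zeros collapses to a single point with no spread, which is why it carries no information about the variance.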
Is the problem with the smaller pollen types just that they're very low across all time points? The algorithm won't fit because you're basically looking at zero data, or a vector of zeroes. So you can assume that this is significantly different from the abundant types. This has to do with the way ML estimation works; it's a bit complicated. Some people suggest using Bayesian methods for this (and it works well), but it's far too complicated for what you're trying to answer.

The mean-variance relationship is specified by the 'family' part of the GLM formula. It is essentially the error structure of your data.

Liz

On 04/02/2013, at 7:55 PM, v_coudrain at voila.fr wrote:
I tried to use Tweedie and it again worked very well for the most abundant pollen types, but when trying to fit the less abundant ones I got the error: "glm.fit: algorithm did not converge". I have the impression that it is hopeless to try fitting a model... But anyway, thank you very much for making me aware of Tweedie. I should still go a bit more into the theoretical background.

I just wonder about the residuals. For the pollen types that can be modelled, the QQ-plots don't look very nice, but the residuals are relatively homogeneously distributed. It is difficult to judge how good the fit is, but the results make sense in regard to the raw data.

Valérie
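As background on the Tweedie family and the mean-variance relationship discussed in this thread, here is a minimal sketch. The variance of a Tweedie distribution is phi * mu^p, and for 1 < p < 2 it is a compound Poisson-gamma distribution with a point mass at zero. The function names and the values of `phi` and `p` are hypothetical choices for illustration, not settings used in the thread.

```python
import math


def tweedie_variance(mu, phi, p):
    """Variance implied by the Tweedie family: phi * mu**p."""
    return phi * mu ** p


def tweedie_zero_probability(mu, phi, p):
    """P(Y = 0) for a compound Poisson-gamma Tweedie distribution (1 < p < 2)."""
    if not 1.0 < p < 2.0:
        raise ValueError("exact zeros only have positive probability for 1 < p < 2")
    poisson_rate = mu ** (2.0 - p) / (phi * (2.0 - p))
    return math.exp(-poisson_rate)


# Hypothetical dispersion and power parameters.
phi, p = 0.1, 1.5

for mu in (0.4, 0.05, 0.001):
    print(
        f"mu={mu}: variance={tweedie_variance(mu, phi, p):.6f} "
        f"P(Y=0)={tweedie_zero_probability(mu, phi, p):.4f}"
    )
```

The probability of an exact zero rises as the mean falls, so the family can absorb some zeros. A level where every observation is zero still pushes the fitted mean toward the boundary at 0, which is consistent with the non-convergence reported for the sparse pollen types.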