Very odd parameter estimates using GEE with AR-1 correlation structure
I can't comment much on GEEs, but I believe you can use mixed models for population inference if they are correctly specified and interpreted. When predicting, though, I think you need to use only the population-level parameters and not the respondent-level (i.e. subject-specific) ones. Others will know more about this than I do and can likely comment or suggest relevant papers.

Chris Howden
Founding Partner, Tricky Solutions
chris at trickysolutions.com.au
On 19/05/2012, at 4:22, Anne Bjorkman <annebj at gmail.com> wrote:
Hello mixed modelers,

I am having problems with some GEE models I am trying to run using geepack. I have species abundance data for 52 different species in 154 sites over 47 years, and I am trying to extract slope parameter estimates so that I can look at whether these species have increased or decreased in abundance over time, while taking into account the repeated measurements at each site over time. I originally started doing this with mixed models, but have been advised that GEE would be more appropriate for my data as it gives population-averaged responses.

However, when I try to run GEEs on my data I get really bizarre parameter estimates for some of my species. As my dataset is huge I unfortunately cannot provide the whole thing, but I have uploaded a subset of the data for one species with a particularly bizarre slope parameter estimate here:

http://dl.dropbox.com/u/4481861/Example_for_GEE_one_species.csv

The data look like this:

      Site Year Species Value_Pres Value_Abs
    1    1 1961       1          0      2089
    2    1 1962       1          0      2120
    3    1 1963       1          0      2089
    4    1 1964       1          0      2225
    5    1 1965       1          0      2197
    6    1 1966       1          0      2208

I have been using the following model specification (I have been running a loop to calculate estimates for all 52 species separately, but this is for just one species):

    library(doBy)     # provides orderBy()
    library(geepack)  # provides geeglm()

    # Order by subject (Site), then time (Year)
    speciesA <- orderBy(~ Site + Year, data = speciesA)
    speciesA.mod <- geeglm(cbind(Value_Pres, Value_Abs) ~ I(Year - 1961),
                           data = speciesA, family = binomial,
                           id = Site, corstr = "ar1")

    Call:
    geeglm(formula = cbind(Value_Pres, Value_Abs) ~ I(Year - 1961),
        family = binomial, data = speciesA, id = Site, corstr = "ar1")

    Coefficients:
                    Estimate   Std.err    Wald Pr(>|W|)
    (Intercept)    -2.99e+14  9.10e+11  107705   <2e-16 ***
    I(Year - 1961) -9.62e+13  3.88e+10 6155147   <2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Estimated Scale Parameters:
                Estimate  Std.err
    (Intercept) 6.57e+10 5.62e+30

    Correlation: Structure = ar1  Link = identity

    Estimated Correlation Parameters:
          Estimate Std.err
    alpha     0.98 4.4e+18
    Number of clusters:   154   Maximum cluster size: 47

I suspect the problem might have something to do with the correlation structure, as species abundances in subsequent years are often very highly correlated, even if there is substantial change over the 47 years overall. If I use the corstr="independence" argument I get parameter estimates that are very similar to those I got using mixed effects models (at least, the slopes for species responses relative to each other are similar).

Furthermore, if I use corstr="ar1" but subset my data to every 5 years instead of every year, I get much more reasonable slope estimates for this particular species as well as most of the other species (slope values are very similar to the corstr="independence" value), but a few different species' slopes then get very weird. (By "get weird" I mean that they have abnormally large positive or negative slopes that don't reflect what's happening in the raw data at all.)

I would really appreciate some insight into what the problem with my data could be, or, more particularly, how to fix it! My head and my wall would be very grateful! Perhaps I should just give up on GEEs and go back to mixed models?

Thanks very much,
Anne
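The two comparisons described in the message above (a fit equivalent to an independence working correlation, and thinning the series to every fifth year) can be sketched in base R. This is only an illustrative sketch and not part of the original post: the data frame is simulated as a hypothetical stand-in for the real file, and the number of sites, the trial count `n_trials`, and the trend used to simulate it are all assumptions. It relies on the fact that a GEE with an independence working correlation solves the same estimating equations as an ordinary GLM, so the point estimates should agree up to numerical tolerance and only the standard errors differ.

```r
# Hypothetical stand-in for the real data: 5 sites, 1961-2007,
# using the same column names as in the post.
set.seed(1)
speciesA <- expand.grid(Year = 1961:2007, Site = 1:5)
n_trials <- 2000                                 # assumed sampling units per site-year
p <- plogis(-4 + 0.03 * (speciesA$Year - 1961))  # assumed trend on the logit scale
speciesA$Value_Pres <- rbinom(nrow(speciesA), n_trials, p)
speciesA$Value_Abs  <- n_trials - speciesA$Value_Pres

# Order by site, then year (base R equivalent of the orderBy() call).
speciesA <- speciesA[order(speciesA$Site, speciesA$Year), ]

# Baseline: a plain binomial GLM. Its coefficients are what a GEE with
# corstr = "independence" should reproduce; a slope far from this one
# under corstr = "ar1" points at the working correlation, not the data.
base.mod <- glm(cbind(Value_Pres, Value_Abs) ~ I(Year - 1961),
                data = speciesA, family = binomial)
coef(base.mod)

# Thin to every fifth year, as described in the post.
speciesA5 <- subset(speciesA, (Year - 1961) %% 5 == 0)
unique(speciesA5$Year)
```

With the real data, `speciesA5` could be passed to the same `geeglm()` call in place of `speciesA`, and `coef(base.mod)` gives a reference slope against which the AR-1 estimates can be compared.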
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models