Greetings all,

I'm using logistic regression to investigate mortality of trees. I'm using AIC to compare models, and I'm wondering if I should use AICc instead of AIC. Burnham and Anderson [1] recommend using AICc when n/K < 40. But what do I consider for n?

The logistic regression is based on 2811 observations (334 trees observed annually for <= 10 yr), but I've only observed 32 deaths. Harrell [2] would consider 32 to be the "limiting sample size" for determining the feasible number of predictor variables. Is AIC the same? Should I use 2811, 334, or 32 to figure out AICc?

Thanks for any help.

Sincerely,
Matt

[1] Burnham, K. and D. Anderson. 2002. Model selection and multi-model inference: a practical information-theoretic approach. Springer.
[2] Harrell, F. 2001. Regression modeling strategies. Springer.
Corrected AIC for binary response variables?
4 messages · Matthew Landis, Ben Bolker, Mark_Wotawa at nps.gov +1 more
Great question. I think I would probably go with 32, but I'd like to hear other opinions. Ben Bolker
Good question. Burnham and Anderson 2002 (p. 332) mention the issue but don't elaborate on it. Intuitively, it seems it would not be 2811, as you'd have something along the lines of pseudo-replication. B&A also mention, in a capture-recapture context, that 334 could be n for survival and 2811 could be n for recapture. Applied to your situation, your recapture rate is 1, which leads to my (again intuitive) guess that 334 would be appropriate. That allows only about 8 parameters (334/40 ≈ 8.4) before the n/K < 40 rule says you should be using AICc, so I'd use AICc regardless. If you come up with something concrete on this, it would be great to know more.

Regards,
Mark

Mark A. Wotawa
Quantitative Ecologist
National Park Service, Biological Resources Management Division
I'm using logistic regression to investigate mortality of trees. I'm using AIC to compare models, and I'm wondering if I should use AICc instead of AIC. Burnham and Anderson [1] recommend using AICc when n/K < 40.
B&A 2002 is not as forceful on this as they are in person or in later work. Basically, it's a non-issue. AICc approaches AIC with large sample sizes (relative to the number of parameters), so just calculate AICc. Always.
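To make the "AICc approaches AIC" point concrete, here is a minimal Python sketch of the usual small-sample correction, AICc = AIC + 2K(K+1)/(n - K - 1). The function names and the choice of K = 4 are illustrative assumptions, not anything from the thread; only the three candidate values of n come from the original question.

```python
def aicc_correction(k: int, n: int) -> float:
    """Extra penalty that AICc adds on top of AIC: 2K(K+1)/(n-K-1)."""
    if n <= k + 1:
        raise ValueError("AICc is undefined unless n > K + 1")
    return 2.0 * k * (k + 1) / (n - k - 1)


def aicc(log_lik: float, k: int, n: int) -> float:
    """AICc from a maximized log-likelihood, K parameters, and sample size n."""
    aic = -2.0 * log_lik + 2.0 * k
    return aic + aicc_correction(k, n)


# Hypothetical model with K = 4 estimated parameters (an assumption for
# illustration); n takes the three candidate values from the question.
K = 4
penalties = {n: aicc_correction(K, n) for n in (32, 334, 2811)}
for n, extra in penalties.items():
    print(f"n = {n:4d}: AICc - AIC = {extra:.4f}")
```

The correction shrinks toward zero as n grows relative to K, which is why always reporting AICc costs nothing when n is large.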
But what do I consider for n?
I agree with Ben that this is a great question, and I doubt there is a single right answer. The capture-recapture world has converged on some general agreement in many of those models, but disagreement still exists and may never be resolved. The lazy, conservative approach in this case would be to take, among the sensible candidate values of n, the one that generates the smallest n/K ratio. Or, calculate the model selection statistics across the range of sensible values and see if it changes inference. It might not, and then you have nothing to worry about.
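The sensitivity check suggested above could be sketched roughly as follows in Python. The model names, log-likelihoods, and parameter counts are invented purely for illustration (they are not results from the tree data); only the candidate n values of 32, 334, and 2811 come from the original question.

```python
CANDIDATE_N = (32, 334, 2811)  # deaths, trees, tree-years (from the question)

# Hypothetical (maximized log-likelihood, K) pairs, made up for illustration.
MODELS = {
    "null": (-170.0, 1),
    "size": (-165.0, 2),
    "size+site": (-161.2, 5),
}


def aicc(log_lik: float, k: int, n: int) -> float:
    """AICc = -2 logL + 2K + 2K(K+1)/(n-K-1)."""
    if n <= k + 1:
        raise ValueError("AICc is undefined unless n > K + 1")
    return -2.0 * log_lik + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)


def rank_models(models: dict, n: int) -> list:
    """Return model names ordered from lowest (best) to highest AICc."""
    scores = {name: aicc(ll, k, n) for name, (ll, k) in models.items()}
    return sorted(scores, key=scores.get)


# Rank the candidate models under each plausible definition of n.
rankings = {n: rank_models(MODELS, n) for n in CANDIDATE_N}
for n, order in rankings.items():
    print(f"n = {n:4d}: {' < '.join(order)}")

# If every n gives the same ordering, the choice of n does not affect inference.
stable = len({tuple(order) for order in rankings.values()}) == 1
print("ranking stable across n:", stable)
```

With these made-up numbers the top-ranked model differs between n = 32 and the two larger values, which is the situation where the choice of n would actually need defending. If the ordering came out the same for all three, the question would be moot for that model set.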