Corrected AIC for binary response variables?

4 messages · Matthew Landis, Ben Bolker, Mark_Wotawa at nps.gov +1 more

#
Greetings all,

I'm using logistic regression to investigate mortality of trees.  I'm using AIC to compare models, and I'm wondering if I should use AICc instead of  AIC.  Burnham and Anderson [1] recommend using AICc when n/K < 40.  But what do I consider for n?  The logistic regression is based on 2811 observations (334 trees observed annually for <= 10 yr), but I've only observed 32 deaths.  Harrell [2] would consider 32 to be the "limiting sample size" for determining the feasible number of predictor variables.  Is AIC the same?  Should I use 2811, 334, or 32 to figure out AICc?

Thanks for any help.

Sincerely,

Matt

[1] Burnham K. and D. Anderson.  2002.  Model selection and multi-model inference:  a practical information-theoretic approach.  Springer.
[2] Harrell, F. 2001.  Regression modeling strategies.  Springer.
#
Landis, R Matthew wrote:
Great question.  I think I would probably go with 32, but I'd like to 
hear other opinions.

  Ben Bolker
#
Good question,
Burnham and Anderson 2002 (p. 332) mention the issue but don't elaborate on it.
Intuitively it seems it would -not- be 2811, as you'd have something along
the lines of pseudo-replication.  B&A also mention in a capture-recapture
context that 334 could be n for survival and 2811 could be n for recapture.
So, applied to your situation, your recapture rate is 1, which leads to
my (again intuitive) guess that 334 would be appropriate.  This gives you
only about 8 parameters before you should be using AICc according to
the n/K < 40 rule, so I'd use AICc regardless.  If you come up with
something concrete on this, it would be great to know more.
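
The "about 8 parameters" arithmetic can be checked with a minimal Python sketch. It is not part of the original message, and the function name max_k_before_aicc is hypothetical. It assumes the n/K < 40 rule of thumb quoted above and reports, for each candidate n from the thread, the largest K that still keeps n/K at or above 40.

```python
def max_k_before_aicc(n, threshold=40):
    """Largest parameter count K with n / K >= threshold (0 if there is none).

    n / K >= threshold is equivalent to K <= n / threshold, so the
    largest whole-number K is the floor of n / threshold.
    """
    return n // threshold


# Candidate sample sizes from the thread: observations, trees, deaths.
for n in (2811, 334, 32):
    print(f"n = {n}: AIC alone is within the rule up to K = {max_k_before_aicc(n)}")
```

With n = 32 the result is 0, meaning that even a one-parameter model already falls under the n/K < 40 threshold.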
Regards,
Mark


***************************************************************
Mark A. Wotawa
Quantitative Ecologist
National Park Service
Biological Resources Management Division
1201 Oak Ridge Drive, Suite 200
Fort Collins, CO  80525-5589
Office: 970-225-3567
FAX: 970-225-3585
Email: mark_wotawa at nps.gov
***************************************************************




_______________________________________________
R-sig-ecology mailing list
R-sig-ecology at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
#
B&A 2002 is not as forceful on this as they are in person or in later work.
Basically, it's a non-issue. AICc approaches AIC with large sample sizes
(relative to the number of parameters), so just calculate AICc. Always.
I agree with Ben that this is a great question, and I doubt there
is a single right answer. The capture-recapture world has converged
on some general agreement in many of those models, but disagreement
still exists and may never be resolved.

The lazy, conservative approach in this case would be to take the
value amongst the sensible ones that generates the smallest
N:K ratio. Or, calculate the model selection stats for the range of
the sensible numbers and see if it changes inference. It might not,
and then you have nothing to worry about.
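
The second suggestion above (recompute the model selection statistics for each sensible n and see whether inference changes) could be sketched in Python as follows. This is only an illustration under stated assumptions. It uses the standard small-sample correction AICc = AIC + 2K(K+1)/(n - K - 1). The function names are hypothetical, and the log-likelihoods and parameter counts are made-up placeholders, not results from the tree mortality data.

```python
def aicc(log_lik, k, n):
    """Small-sample corrected AIC: AIC + 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    aic = -2.0 * log_lik + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


def ranking(models, n):
    """Model names ordered from lowest (best) to highest AICc for a given n."""
    return sorted(models, key=lambda name: aicc(*models[name], n))


# Placeholder (log-likelihood, K) pairs, invented purely for illustration.
models = {
    "small": (-150.0, 3),
    "medium": (-147.5, 5),
    "large": (-145.5, 9),
}

# Candidate sample sizes from the thread: observations, trees, deaths.
for n in (2811, 334, 32):
    print(n, ranking(models, n))
```

With these placeholder values the ranking is the same for n = 2811 and n = 334 but changes at n = 32, where the correction term penalises the larger models much more heavily. If the ranking for the real models is the same under all three values, the choice of n does not affect inference.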