Hi everyone,

Although not strictly an R issue, there often seem to be discussions along these lines on this list, so I hope no one minds me posting this. If you do, please let me know. (And just for the record, I am applying this in R.)

I'm trying to get my head around AIC and sample size.

Now if AIC = -2ln(L) + 2K = Deviance + 2K:

Am I right in thinking that, as the likelihood is the product of probabilities, then (all else being equal) the larger the sample size the smaller the likelihood? Which means that if we have very large sample sizes we expect the -2ln(L) term to be a very large number? Which would reduce the effect of the parameter correction term 2K?

Chris Howden B.Sc. (Hons) GStat.
Founding Partner
Evidence Based Strategic Development, IP Commercialisation and Innovation, Data Analysis, Modelling and Training
(mobile) 0410 689 945
(fax) +612 4782 9023
chris at trickysolutions.com.au
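The intuition above can be checked numerically. A minimal sketch (in Python, with toy data, purely illustrative): for i.i.d. data the log-likelihood is a sum over observations, so -2ln(L) grows roughly linearly with n while the 2K penalty stays fixed.

```python
import math

def neg2_loglik_normal(xs, mu, sigma):
    """-2 * log-likelihood of xs under Normal(mu, sigma)."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return n * math.log(2 * math.pi * sigma ** 2) + ss / sigma ** 2

def deviance_at_mle(xs):
    # evaluate -2 ln(L) at the maximum-likelihood estimates of mu and sigma
    mu = sum(xs) / len(xs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
    return neg2_loglik_normal(xs, mu, sigma)

K = 2  # parameters: mean and variance
small = [((i * 37) % 100) / 100 for i in range(100)]  # toy data
large = small * 1000                                  # same pattern, 1000x the n

d_small, d_large = deviance_at_mle(small), deviance_at_mle(large)
print(d_small, d_large, 2 * K)  # -2 ln(L) scales with n; 2K does not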
Can AIC be approximated by -2ln(L) i.e. the deviance, at very large sample size?
11 messages · Ben Bolker, Steve Taylor, Chris Howden +2 more
Hi,

I may be wrong, but I understood that AIC in itself is not as important as changes in AIC between models, and some authors say that changes in AIC of more than about 10 are enough to favor one model over another. And changes in the 2*k term should be of this order of magnitude when comparing different models. So my guess would be that it remains important.

On the other hand, if a set of parameters will remain in all models, it can probably be safely ignored in the 2*k term for all models.

Hope this helps,
Emmanuel CURIS
emmanuel.curis at parisdescartes.fr
Page WWW: http://emmanuel.curis.online.fr/index.html
Emmanuel Curis <emmanuel.curis at ...> writes:
You are exactly right. This is exactly equivalent to the initially surprising result that the maximum (log-)likelihood *decreases* when the sample size increases: the probability of any *particular* outcome goes down. Generally, in likelihood-based statistical approaches (including AIC) we only look at the differences in (log-)likelihood/AIC, not the absolute number.

I've started a campaign to try to get people _never_ to produce tables of raw AIC values; only the delta-AIC values should be presented. (If necessary, the minimum AIC value can be put in a footnote somewhere so people can check for reproducibility of the results, but that's the only reason one should ever care about the raw value.)

That's not to downplay the issues with AIC in the mixed model context: http://glmm.wikidot.com/faq#aic
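The delta-AIC convention is easy to apply mechanically. A short sketch (the raw AIC values and model names below are made up for illustration):

```python
# hypothetical raw AIC values from three fits to the same data set
raw_aic = {"m1": 412034.6, "m2": 412010.2, "m3": 412011.9}

# report differences from the best (minimum) AIC, not the raw values
best = min(raw_aic.values())
delta_aic = {m: round(a - best, 1) for m, a in raw_aic.items()}
print(delta_aic)  # {'m1': 24.4, 'm2': 0.0, 'm3': 1.7}
```

The raw minimum (here 412010.2) can go in a footnote for reproducibility; the delta table carries all the information used for model comparison.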
2 days later
I agree that it is changes in AIC that matter, not its absolute value.
My understanding is that AIC is only useful for comparing two models fitted on the same data set, i.e. with the same sample size. So the question of how AIC changes with sample size is of little use beyond curiosity.
The change in AIC caused by adding a term to the model formula would be of interest. But the change in AIC caused by adding cases to the sample size is pretty meaningless.
The 2K part is important because it provides a penalty for the change in the number of parameters between a simpler model and a more complex model.
I would advise against making any approximations when calculating AIC, especially considering its main use is in taking the difference between two close large numbers.
cheers,
Steve
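Steve's point about the 2K penalty can be put in miniature. A toy sketch (deviances and parameter counts are made up): when the deviance gap between models is small, the penalty can reverse the ordering.

```python
def aic(deviance, k):
    # AIC = deviance + 2K, i.e. -2 ln(L) plus the parameter penalty
    return deviance + 2 * k

# hypothetical fits to the same data set
aic_simple  = aic(1000.0, k=3)  # deviance 1000.0, 3 parameters -> 1006.0
aic_complex = aic(997.5, k=5)   # deviance  997.5, 5 parameters -> 1007.5

# the complex model wins on raw deviance but loses on AIC
print(aic_simple < aic_complex)  # True
```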
Thanks for the responses everyone,

I agree that it's changes in goodness-of-fit likelihood measures such as AIC and deviance that matter, not their absolute size.

However, I think the impact of sample size may be something we need to consider, particularly when analysing "Big Data" sets.

I recently did some analysis on "Big Data"; the number of rows was over 300,000. What I found was that the full model was always selected using AIC, deviance and LRT. However, when I had a look at the effects of the predictors I found some of them were negligible, to the point of not really being worth including in the model, despite what the AIC and LRT say.

This seems to be the same sample-size issue faced with simple univariate tests such as ANOVA, i.e. large sample sizes give so much power that statistically significant results may be of little or no practical value.

The reason I asked about the convergence of deviance and AIC at large sample sizes was this: the LRT tests between the full model and one less predictor all had exceptionally small p-values, which meant that the difference in ln(L) was very large. So large that it appears the difference in deviance and AIC was essentially the same.

So although it's the difference that matters, if they converge at large sample sizes then a large difference in deviance means there will also be a large difference in AIC, and they will come to the same conclusion?? However, as they don't converge at small sample sizes, this effect is not as relevant there.
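The convergence observed here is just the identity delta-AIC = delta-deviance + 2*delta-K. A toy sketch with made-up numbers (not from the actual fits) shows why the two criteria agree when the deviance gap is large:

```python
# hypothetical fits: full model vs. the same model with one predictor dropped
dev_full,    k_full    = 412000.0, 12
dev_reduced, k_reduced = 415600.0, 11

d_dev = dev_reduced - dev_full                                   # 3600.0
d_aic = (dev_reduced + 2 * k_reduced) - (dev_full + 2 * k_full)  # 3598.0

# when the deviance gap is large, the 2*delta-K correction is negligible
print(d_dev, d_aic)
```

With a deviance gap of only a few units, by contrast, the 2*delta-K term can change the conclusion, which is exactly the small-sample (or small-effect) regime where AIC and deviance disagree.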
Disclaimer: The information in this email and any attachments to it are confidential and may contain legally privileged information. If you are not the named or intended recipient, please delete this communication and contact us immediately. Please note you are not authorised to copy, use or disclose this communication or any attachments without our consent. Although this email has been checked by anti-virus software, there is a risk that email messages may be corrupted or infected by viruses or other interferences. No responsibility is accepted for such interference. Unless expressly stated, the views of the writer are not those of the company. Tricky Solutions always does our best to provide accurate forecasts and analyses based on the data supplied, however it is possible that some important predictors were not included in the data sent to us. Information provided by us should not be solely relied upon when making decisions and clients should use their own judgement.
On 13-03-03 07:04 PM, Chris Howden wrote:
> I recently did some analysis on "Big Data", the number of rows was over 300 000. What I found was that the Full Model was always selected using AIC, Deviance and LRT. However when I had a look at the effects of the predictors I found some of them were negligible, to the point of not really being worth including in the model. Despite what the AIC and LRT say.
Well, what do you mean by "not really worth including in the model"? The AIC is telling you that they improve the expected predictive accuracy. "Too small to be interesting" is certainly possible, but it's impossible for us to know (without the context of the question and without knowing what question you're trying to answer with the model) whether the effects are or aren't.
> This seems to be the same sample size issue faced with simple univariate tests such as ANOVA i.e. large sample sizes give so much power that statistically significant results may be of no/little practical value. The reason I asked about the convergence of deviance and AIC at large sample sizes was thus. The LRT tests between the Full model and 1 less predictor all had exceptionally small p-values, which meant that the difference in ln(L) was very large.
(I would put this the other way around: deviance/log-likelihood difference is more fundamental than p-value.)
> So large that it appears that the difference in deviance and AIC was essentially the same.
Yes, it's true that for a fixed range of model sizes, model complexity matters less and less for large samples.
> So although it's the difference that matters, if they converge at large sample sizes then a large difference in deviance means there will also be a large difference in AIC and they will come to the same conclusion?? However as they don't converge at small sample sizes this effect is not as relevant.
It's fairly well known, I think, that "everything is significant" for
sufficiently large sample sizes. Arguably (e.g. according to Andrew
Gelman) we should be using hierarchical models to include more and more
structure in our models, so that we are always extracting as much
information as is in the data ...
I'm not really clear on what your question is any more. (And, these
are really general stats/modelling questions, not so much mixed modeling
questions ...)
cheers
Ben Bolker
Thanks for the reply Ben. In response to your specific points:
> Well, what do you mean by "not really worth including in the model"?
1) In terms of "too small to be interesting" I mean an Odds Ratio of 1.02 (from a GLMM with binomial family and random intercept based on the individuals, each individual having 100's of data points)
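The "everything is significant" effect behind an odds ratio of 1.02 can be sketched with a back-of-the-envelope Wald calculation. (The standard-error scaling below is a hypothetical illustration, not taken from the actual fit.) The coefficient's standard error shrinks roughly as 1/sqrt(n), so a fixed tiny effect eventually crosses any z threshold:

```python
import math

beta = math.log(1.02)  # log-odds effect behind an odds ratio of 1.02

def se(n, se_at_1000=0.05):
    # hypothetical standard error scaling ~ 1/sqrt(n)
    return se_at_1000 * math.sqrt(1000 / n)

for n in (1_000, 300_000):
    z = beta / se(n)
    print(n, round(z, 2), "significant" if abs(z) > 1.96 else "not significant")
```

The effect size (beta, or the odds ratio) does not change with n; only its detectability does, which is why effect-size interpretation has to be separated from the significance test.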
> It's fairly well known, I think, that "everything is significant" for
> sufficiently large sample sizes. Arguably (e.g. according to Andrew
> Gelman) we should be using hierarchical models to include more and more
> structure in our models, so that we are always extracting as much
> information as is in the data ...

I wish it were well known, but I'm not so sure. One of the reasons I'm asking this is that we are trying to publish our results, and the reviewers keep saying things like "If you had used AIC you wouldn't have this problem, so go back and reanalyse it using AIC". We have stated that we have a "ubiquitous significance" problem partially due to sample size, and have suggested a way around it that involves interpreting the effect sizes at different spatial scales in order to find those that are useful. But I'm having a hard time convincing people, so I'm trying to put together a mathematical reason, which leads into your next point.
> I'm not really clear on what your question is any more. (And, these
> are really general stats/modelling questions, not so much mixed modeling
> questions ...)

Yes, I suppose it was more of a general stats/modelling question. I was trying to get my head around the maths of ln(L) functions such as AIC and deviance and how they behave at large sample sizes. I posted it here since I fit a GLMM in R, and also because, of all the lists I'm on, this one seemed to be the only one where such matters are discussed at any level of expertise or interest. I hope it's OK that I did.

I work from home in a remote area, and unfortunately don't have the benefit of statistical colleagues I can discuss these things with. I wish I did!!! It gets rather lonely being the only statistician in town; I tend to get a lot of good-natured "there he goes again" looks whenever I get a bit too excited about something statistical and try to share :(
I probably should have pointed out that we had other ORs that ranged from approx. 0.5 to 1.5. So in that context 1.02 isn't very strong!!
Well, what do you mean by "not really worth including in the model"?
1) By "too small to be interesting" I mean an odds ratio of 1.02 (from a GLMM with a binomial family and a random intercept for individual, each individual having hundreds of data points)
It's fairly well known, I think, that "everything is significant" for sufficiently large sample sizes. Arguably (e.g. according to Andrew Gelman) we should be using hierarchical models to include more and more structure in our models, so that we are always extracting as much information as is in the data ...

I wish it were well known, but I'm not so sure. One of the reasons I'm asking this is that we are trying to publish our results and the reviewers keep saying things like "If you had used AIC you wouldn't have this problem, so go back and reanalyse it using AIC". We have stated that we have a "ubiquitous significance" problem partially due to sample size, and have suggested a way around it that involves interpreting the effect sizes at different spatial scales in order to find those that are useful. But I'm having a hard time convincing people, so I'm trying to put together a mathematical reason, which leads into your next point.
I'm not really clear on what your question is any more. (And, these are really general stats/modelling questions, not so much mixed modeling questions ...)
Yes, I suppose it was more of a general stats/modelling question; I was trying to get my head around the maths of ln(L)-based measures such as AIC and deviance and how they behave at large sample sizes. I posted it here since I fit a GLMM in R, and also because of all the lists I'm on this one seemed to be the only one where such matters are discussed at any level of expertise or interest. I hope it's OK that I did.

I work from home in a remote area, and unfortunately don't have the benefit of statistical colleagues I can discuss these things with. I wish I did!!! It gets rather lonely being the only statistician in town; I tend to get a lot of good-natured "there he goes again" looks whenever I get a bit too excited about something statistical and try to share :(

Chris Howden B.Sc. (Hons) GStat.
Founding Partner
Evidence Based Strategic Development, IP Commercialisation and Innovation, Data Analysis, Modelling and Training
(mobile) 0410 689 945
(fax) +612 4782 9023
chris at trickysolutions.com.au
-----Original Message-----
From: Ben Bolker [mailto:bbolker at gmail.com]
Sent: Monday, 4 March 2013 11:37 AM
To: Chris Howden
Cc: Steve Taylor; Emmanuel Curis; r-sig-mixed-models
Subject: Re: [R-sig-ME] Can AIC be approximated by -2ln(L) i.e. the deviance, at very large sample size?
On 13-03-03 07:04 PM, Chris Howden wrote:
Thanks for the responses everyone,

I agree that it's changes in "goodness of fit" likelihood measures such as AIC and deviance that matter, not their absolute size. However I think the impact of sample size may be something we need to consider, particularly when analysing "Big Data" sets.

I recently did some analysis on "Big Data"; the number of rows was over 300 000. What I found was that the full model was always selected using AIC, deviance and the LRT. However when I had a look at the effects of the predictors I found some of them were negligible, to the point of not really being worth including in the model. Despite what the AIC and LRT say.
Well, what do you mean by "not really worth including in the model"? The AIC is telling you that they improve the expected predictive accuracy. "Too small to be interesting" is certainly possible, but it's impossible for us to know (without the context of the question and without knowing what question you're trying to answer with the model) whether the effects are or aren't.
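[A purely hypothetical illustration with simulated data, not the original analysis: at n = 300,000 even an effect with an odds ratio of roughly 1.02 tends to be decisively favoured by both AIC and the LRT.]

```r
## Simulated sketch (assumed setup, not the original data): a binary
## response with a tiny true effect, beta = 0.02, i.e. OR = exp(0.02) ~ 1.02.
set.seed(1)
n <- 3e5
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 0.02 * x))

m0 <- glm(y ~ 1, family = binomial)  # null model
m1 <- glm(y ~ x, family = binomial)  # "full" model with the tiny effect

AIC(m0, m1)                    # the full model typically wins by many units
anova(m0, m1, test = "Chisq")  # LRT p-value is typically tiny at this n
exp(coef(m1)["x"])             # estimated OR, close to 1.02
```

(The fitted OR is negligible in practical terms, yet the information criteria still prefer the larger model, which is exactly the situation described in the thread.)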
This seems to be the same sample size issue faced with simple univariate tests such as ANOVA, i.e. large sample sizes give so much power that statistically significant results may be of little or no practical value.
The reason I asked about the convergence of deviance and AIC at large sample sizes was this: the LRTs between the full model and models with one less predictor all had exceptionally small p-values, which meant that the difference in ln(L) was very large.
(I would put this the other way around: deviance/log-likelihood difference is more fundamental than p-value.)
So large that it appears that the difference in deviance and AIC was essentially the same.
Yes, it's true that for a fixed range of model sizes, model complexity matters less and less for large samples.
So although it's the difference that matters, if they converge at large sample sizes then a large difference in deviance means there will also be a large difference in AIC, and they will come to the same conclusion??
However as they don't converge at small sample sizes this effect is not as relevant.
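[The algebra behind this convergence can be sketched directly: for two models differing by dK parameters, the AIC improvement from adding a term is (deviance drop) − 2·dK, and the BIC improvement is (deviance drop) − dK·log(n). Once the deviance drop is large, both penalties are relatively negligible. Illustrative numbers only, not from the original analysis:]

```r
## Hypothetical deviance drops from adding one extra parameter (dK = 1),
## compared against the AIC penalty (2) and the BIC penalty (log(n)).
delta_dev <- c(4, 40, 400)   # made-up deviance drops
dK        <- 1               # one extra parameter
n         <- 3e5             # "Big Data" sample size from the thread

data.frame(delta_dev,
           AIC_improvement = delta_dev - 2 * dK,
           BIC_improvement = delta_dev - dK * log(n))
## With delta_dev = 400, neither the AIC penalty (2) nor even the BIC
## penalty (log(3e5) ~ 12.6) changes the conclusion; with delta_dev = 4,
## both penalties matter.
```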
It's fairly well known, I think, that "everything is significant" for sufficiently large sample sizes. Arguably (e.g. according to Andrew Gelman) we should be using hierarchical models to include more and more structure in our models, so that we are always extracting as much information as is in the data ...

I'm not really clear on what your question is any more. (And, these are really general stats/modelling questions, not so much mixed modeling questions ...)
cheers
Ben Bolker
Chris Howden B.Sc. (Hons) GStat.
Founding Partner
Evidence Based Strategic Development, IP Commercialisation and Innovation, Data Analysis, Modelling and Training
(mobile) 0410 689 945
(fax) +612 4782 9023
chris at trickysolutions.com.au

-----Original Message-----
From: Steve Taylor [mailto:steve.taylor at aut.ac.nz]
Sent: Monday, 4 March 2013 10:10 AM
To: Emmanuel Curis; Chris Howden
Cc: r-sig-mixed-models
Subject: RE: [R-sig-ME] Can AIC be approximated by -2ln(L) i.e. the deviance, at very large sample size?

I agree that it is changes in AIC that matter, not its absolute value. My understanding is that AIC is only useful for comparing two models fitted on the same data set, i.e. with the same sample size, so the question of how AIC changes with sample size is of little use beyond curiosity. The change in AIC caused by adding a term to the model formula would be of interest, but the change in AIC caused by adding cases to the sample is pretty meaningless. The 2K part is important because it provides a penalty for the change in the number of parameters between a simpler model and a more complex model.
I would advise against making any approximations when calculating AIC, especially considering its main use is in taking the difference between two close large numbers.
cheers,
Steve
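[A made-up numerical sketch of Steve's caution: the two deviances are huge and nearly equal, so the AIC difference lives in the trailing digits and is destroyed by premature rounding or approximation.]

```r
## Hypothetical deviances: huge, nearly equal numbers.
dev_full    <- 412339.123  # full model, 5 parameters
dev_reduced <- 412345.678  # reduced model, 4 parameters

(dev_reduced + 2 * 4) - (dev_full + 2 * 5)  # exact delta-AIC: 4.555
round(dev_reduced) - round(dev_full)        # rounding first gives 7; the
                                            # penalty term is lost entirely
```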
-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org
[mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of
Emmanuel Curis
Sent: Friday, 1 March 2013 9:18p
To: Chris Howden
Cc: r-sig-mixed-models
Subject: Re: [R-sig-ME] Can AIC be approximated by -2ln(L) i.e. the
deviance, at very large sample size?
Hi,
I may be wrong, but I understood that AIC in itself is not as important as changes in AIC between models, and some authors say that changes in AIC of more than about 10 are enough to favor one model over another. And changes in the 2*k term should be of this order of magnitude when comparing different models, so my guess would be that it remains important. On the other hand, if a set of parameters will remain in all models, it probably can be safely ignored in the 2*k term for all models.

Hope this helps,

On Fri, Mar 01, 2013 at 06:30:53PM +1100, Chris Howden wrote:
> Hi everyone,
>
> Although not strictly an R issue there often seems to be discussions along these lines on this list, so I hope no one minds me posting this. If you do please let me know. (And just for the record, I am applying this in R.)
>
> I'm trying to get my head around AIC and sample size.
>
> Now if AIC = -2ln(L) + 2K = Deviance + 2K
>
> Am I right in thinking that as the Likelihood is the product of probabilities then (all else being equal) the larger the sample size the smaller the Likelihood?
> Which means that if we have very large sample sizes we expect the -2ln(L) term to be a very large number?
> Which would reduce the effect of the parameter correction term 2K?
>
> Chris Howden B.Sc. (Hons) GStat.
> Founding Partner
> Evidence Based Strategic Development, IP Commercialisation and Innovation, Data Analysis, Modelling and Training
> (mobile) 0410 689 945
> (fax) +612 4782 9023
> chris at trickysolutions.com.au

--
Emmanuel CURIS
emmanuel.curis at parisdescartes.fr
Page WWW: http://emmanuel.curis.online.fr/index.html

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Thanks Luca,

A quick look at some of my results suggests that the same "large sample size" effects carry through to BIC, with it still selecting the full model.

Chris Howden B.Sc. (Hons) GStat.
Founding Partner
Evidence Based Strategic Development, IP Commercialisation and Innovation, Data Analysis, Modelling and Training
(mobile) 0410 689 945
(fax) +612 4782 9023
chris at trickysolutions.com.au

From: lborger [mailto:lborger at cebc.cnrs.fr]
Sent: Monday, 4 March 2013 11:30 AM
To: Chris Howden; Steve Taylor; Emmanuel Curis; Ben Bolker
Cc: r-sig-mixed-models
Subject: Re: [R-sig-ME] Can AIC be approximated by -2ln(L) i.e. the deviance, at very large sample size?

Hello,
I recently did some analysis on "Big Data", the number of rows was over 300 000. What I found was that the Full Model was always selected using AIC, Deviance and LRT. However when I had a look at the effects of the predictors I found some of them were negligible, to the point of not really being worth including in the model.
You might find this one interesting: Link, W. A., and R. J. Barker. 2006. Model weights and the foundations of multimodel inference. Ecology 87:2626-2635.

Cheers,
Luca

------------------------------------------------------------------
Luca Borger (PhD, MSc, BMus)
Centre d'Etudes Biologiques de Chize
CNRS (U.P.R. 1934) & INRA (USC 1339)
79360 Villiers-en-Bois, France
email: lborger at cebc.cnrs.fr
Skype: luca.borger | Tel: +33 (0)549 099613
http://cnrs.academia.edu/LucaBorger
http://www.researcherid.com/rid/C-6003-2008
http://www.cebc.cnrs.fr/Fidentite/borger/borger.htm
------------------------------------------------------------------
Hi,

It is probably a pointless remark, but if for instance your 1.02 OR is for age expressed in years, then a 10-year older individual will have an OR of 1.02^10 = 1.2, and a 20-year older one an OR of 1.49, which is not so negligible compared to your 1.5 OR for, let's say, sex... In other words, it is difficult to judge on the values alone without knowing the context and the variables... But that certainly does not help with your matter.

Maybe, by trying to generalize "equivalence tests", you could construct a kind of test to select an OR only if it is proven to be of higher importance than a given cutoff, based on clinical/practical considerations (or, conversely, to prove that the OR is of no practical importance) --- though that may also be methodologically difficult if the cutoff is selected after the analysis. In other words, maybe the question is not well translated as a question on significance/difference tests?

Hope these hints may help...
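[Emmanuel's per-unit scaling point in two lines of R, assuming 1.02 is a per-year odds ratio: a per-unit OR compounds over a k-unit difference as OR^k.]

```r
or_per_year <- 1.02
round(or_per_year ^ c(1, 10, 20), 2)   # 1.02 1.22 1.49
```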
On Mon, Mar 04, 2013 at 12:59:05PM +1100, Chris Howden wrote:
> I probably should have pointed out that we had other ORs that ranged from approx. 0.5 to 1.5. So in that context 1.02 isn't very strong!!
Emmanuel CURIS
emmanuel.curis at parisdescartes.fr
Page WWW: http://emmanuel.curis.online.fr/index.html