Hello,
I'm trying to use AIC to choose between 2 models with
positive, continuous response variables and different error
distributions (specifically a Gamma GLM with log link and a
normal linear model for log(y)). I understand that in some
cases it may not be possible (or necessary) to discriminate
between these two distributions. However, for the normal
linear model I noticed a discrepancy between the output of
the AIC() function and my calculations done "by hand."
This is due to the output from the function logLik.lm(),
which does not match my results using the dnorm() function
(see simple regression example below).
x <- c(43.22,41.11,76.97,77.67,124.77,110.71,144.46,188.05,171.18,
204.92,221.09,178.21,224.61,286.47,249.92,313.19,332.17,374.35)
y <- c(5.18,12.47,15.65,23.42,27.07,34.84,31.03,30.87,40.07,57.36,
47.68,43.40,51.81,55.77,62.59,66.56,74.65,73.54)
test.lm <- lm(y~x)
y.hat <- fitted(test.lm)
sigma <- summary(test.lm)$sigma
logLik(test.lm)
# `log Lik.' -57.20699 (df=3)
sum(dnorm(y, y.hat, sigma, log=TRUE))
# [1] -57.26704
The difference in this simple example is slight, but
it is magnified when using my data. My understanding is
that it is necessary to use the complete likelihood functions
for both the Gamma model and the 'lognormal' model (no constants
removed, etc.) in order to accurately compare using AIC.
Can someone point out my error, or explain this discrepancy?
Thanks in advance!
logLik.lm()
5 messages: Edward Dick, Achim Zeileis, Spencer Graves, and one more
On Wednesday 25 June 2003 20:23, Edward Dick wrote:
> [...]
That is because you are not using the ML estimate of the variance.

sigmaML <- sqrt(mean(residuals(test.lm)^2))
sum(dnorm(y, y.hat, sigmaML, log=TRUE))
# [1] -57.20699

Best,
Z
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
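To spell out the reconciliation: summary(test.lm)$sigma is the unbiased estimate sqrt(RSS/(n - p)), whereas logLik.lm() evaluates the normal likelihood at the ML estimate sqrt(RSS/n). A self-contained sketch using the data from the example (my illustration, not code from the original thread):

```r
# The two estimates of sigma, and why logLik.lm() matches only the ML one.
x <- c(43.22,41.11,76.97,77.67,124.77,110.71,144.46,188.05,171.18,
       204.92,221.09,178.21,224.61,286.47,249.92,313.19,332.17,374.35)
y <- c(5.18,12.47,15.65,23.42,27.07,34.84,31.03,30.87,40.07,57.36,
       47.68,43.40,51.81,55.77,62.59,66.56,74.65,73.54)
test.lm <- lm(y ~ x)
n   <- length(y)
p   <- length(coef(test.lm))     # 2 parameters: intercept and slope
rss <- sum(residuals(test.lm)^2)

sigma.unbiased <- sqrt(rss / (n - p))   # what summary(test.lm)$sigma reports
sigma.ml       <- sqrt(rss / n)         # what logLik.lm() uses

ll.hand <- sum(dnorm(y, fitted(test.lm), sigma.ml, log = TRUE))
all.equal(ll.hand, as.numeric(logLik(test.lm)))   # TRUE
```

The factor sqrt(16/18) that appears later in the thread is just sqrt((n - p)/n) with n = 18 observations and p = 2 coefficients.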
Your by-hand calculation is wrong -- you have to use the MLE of sigma^2.

sum(dnorm(y, y.hat, sigma * sqrt(16/18), log=TRUE))

Also, this is an inappropriate use of AIC: the models are not nested, and
Akaike only proposed it for nested models. Next, the gamma GLM is not a
maximum-likelihood fit unless the shape parameter is known, so you can't
use AIC with such a model using the dispersion estimate of shape. The AIC
output from glm() is incorrect (even in that case, since the shape is
always estimated by the dispersion).
On Wed, 25 Jun 2003, Edward Dick wrote:
> [...]
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
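The shape-parameter point can be made concrete with the same data. The sketch below is my illustration (not code from the thread); it fits the Gamma GLM and compares the full gamma log-likelihood evaluated at the dispersion-based shape estimate with the log-likelihood at the ML shape estimate from MASS::gamma.shape():

```r
# Gamma GLM: glm() estimates the mean structure by ML, but the shape
# (1/dispersion) reported by summary() is a moment-type estimate, not the MLE.
library(MASS)   # for gamma.shape()
x <- c(43.22,41.11,76.97,77.67,124.77,110.71,144.46,188.05,171.18,
       204.92,221.09,178.21,224.61,286.47,249.92,313.19,332.17,374.35)
y <- c(5.18,12.47,15.65,23.42,27.07,34.84,31.03,30.87,40.07,57.36,
       47.68,43.40,51.81,55.77,62.59,66.56,74.65,73.54)
fit <- glm(y ~ x, family = Gamma(link = "log"))
mu  <- fitted(fit)

shape.disp <- 1 / summary(fit)$dispersion   # dispersion-based shape estimate
shape.ml   <- gamma.shape(fit)$alpha        # ML estimate of the shape

# Full gamma log-likelihood at a given shape (mean mu, so rate = shape/mu)
loglik.at <- function(shape)
  sum(dgamma(y, shape = shape, rate = shape / mu, log = TRUE))

loglik.at(shape.disp)   # log-likelihood at the dispersion-based shape
loglik.at(shape.ml)     # at least as large: this shape maximizes it
```

The AIC printed for a glm() fit corresponds to neither of these maximized likelihoods in general, which is the caution raised in the thread about using it for this comparison.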
Dear Prof. Ripley:
I gather you disagree with the observation in Burnham and Anderson
(2002, ch. 2) that the "complexity penalty" in the Akaike Information
Criterion is a bias correction, and that with this correction one can use
"density = exp(-AIC/2)" to compute approximate posterior probabilities
comparing even different distributions?
They use this even to compare discrete and continuous distributions,
which makes no sense to me. However, with a common dominating measure,
it seems sensible to me. They cite a growing literature on "Bayesian
model averaging". What I've seen of this claims that Bayesian model
averaging produces better predictions than predictions based on any
single model, even using these approximate posteriors ("Akaike weights")
in place of full Bayesian posteriors.
I don't have much experience with this, but so far, I seem to have
gotten great, informative answers to my clients' questions. If there
are serious deficiencies with this kind of procedure, I'd like to know.
Comments?
Best Wishes,
Spencer Graves
###############
REFERENCE: Burnham and Anderson (2002) Model Selection and Multimodel
Inference, 2nd ed. (Springer)
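For readers unfamiliar with the term: the "Akaike weights" mentioned above are the normalized values exp(-Delta_i/2), with Delta_i = AIC_i - min(AIC). A minimal sketch; the AIC values here are made up purely for illustration:

```r
# Akaike weights (Burnham & Anderson 2002, ch. 2):
#   w_i = exp(-Delta_i/2) / sum_j exp(-Delta_j/2),  Delta_i = AIC_i - min(AIC)
# The AIC values below are hypothetical, for illustration only.
aic   <- c(m1 = 120.3, m2 = 122.1, m3 = 127.8)
delta <- aic - min(aic)
w     <- exp(-delta / 2) / sum(exp(-delta / 2))
round(w, 3)   # weights sum to 1; the smallest AIC gets the largest weight
```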
Prof Brian Ripley wrote:
> [...]
On Wed, 25 Jun 2003, Spencer Graves wrote:
> Dear Prof. Ripley: I gather you disagree with the observation in Burnham
> and Anderson (2002, ch. 2) that the "complexity penalty" in the Akaike
> Information Criterion is a bias correction, and with this correction, they
> can use "density = exp(-AIC/2)" to compute approximate posterior
> probabilities comparing even different distributions?
That's the derivation of BIC and similar, not AIC.
> They use this even to compare discrete and continuous distributions,
> which makes no sense to me. However, with a common dominating measure,
> it seems sensible to me. They cite a growing literature on "Bayesian
> model averaging". What I've seen of this claims that Bayesian model
> averaging produces better predictions than predictions based on any
> single model, even using these approximate posteriors ("Akaike weights")
> in place of full Bayesian posteriors.
> I don't have much experience with this, but so far, I seem to have
> gotten great, informative answers to my clients' questions. If there
> are serious deficiencies with this kind of procedure, I'd like to know.
Yes, model averaging is useful, but it has nothing to do with AIC or with
Burnham & Anderson. See e.g. my PRNN book for better ways to do it.

Burnham & Anderson (2002) is a book I would recommend people NOT to read
until they have read the primary literature. I see no evidence that the
authors have actually read Akaike's papers.