logLik.lm()

5 messages · Edward Dick, Achim Zeileis, Spencer Graves +1 more

#
Hello,

I'm trying to use AIC to choose between 2 models with
positive, continuous response variables and different error
distributions (specifically a Gamma GLM with log link and a
normal linear model for log(y)). I understand that in some
cases it may not be possible (or necessary) to discriminate
between these two distributions. However, for the normal
linear model I noticed a discrepancy between the output of
the AIC() function and my calculations done "by hand."
This is due to the output from the function logLik.lm(),
which does not match my results using the dnorm() function
(see simple regression example below).

x <- c(43.22,41.11,76.97,77.67,124.77,110.71,144.46,188.05,171.18,
       204.92,221.09,178.21,224.61,286.47,249.92,313.19,332.17,374.35)
y <- c(5.18,12.47,15.65,23.42,27.07,34.84,31.03,30.87,40.07,57.36,
       47.68,43.40,51.81,55.77,62.59,66.56,74.65,73.54)
test.lm <- lm(y~x)
y.hat <- fitted(test.lm)
sigma <- summary(test.lm)$sigma
logLik(test.lm)
# `log Lik.' -57.20699 (df=3)
sum(dnorm(y, y.hat, sigma, log = TRUE))
# [1] -57.26704

The difference in this simple example is slight, but
it is magnified when using my data. My understanding is
that it is necessary to use the complete likelihood functions
for both the Gamma model and the 'lognormal' model (no constants
removed, etc.) in order to accurately compare using AIC.
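[For the cross-distribution comparison itself, one common device (a sketch, not part of the original post) is to express both fits as densities of y itself: a normal model for log(y) implies a lognormal density for y, which adds the change-of-variables Jacobian term -sum(log(y)) to the log-likelihood. Equivalently, dlnorm() can be used directly:]

```r
# Sketch: log-likelihood of a normal model fitted to log(y), expressed on
# the scale of y itself via the Jacobian term -sum(log(y)).
x <- c(43.22, 41.11, 76.97, 77.67, 124.77, 110.71, 144.46, 188.05, 171.18,
       204.92, 221.09, 178.21, 224.61, 286.47, 249.92, 313.19, 332.17, 374.35)
y <- c(5.18, 12.47, 15.65, 23.42, 27.07, 34.84, 31.03, 30.87, 40.07, 57.36,
       47.68, 43.40, 51.81, 55.77, 62.59, 66.56, 74.65, 73.54)
ln.lm <- lm(log(y) ~ x)
sigmaML <- sqrt(mean(residuals(ln.lm)^2))  # ML estimate of sigma
# density of y under the lognormal model = density of log(y) times 1/y
ll.y  <- sum(dnorm(log(y), fitted(ln.lm), sigmaML, log = TRUE)) - sum(log(y))
# equivalently, via dlnorm():
ll.y2 <- sum(dlnorm(y, fitted(ln.lm), sigmaML, log = TRUE))
all.equal(ll.y, ll.y2)   # TRUE
```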

Can someone point out my error, or explain this discrepancy?

Thanks in advance!
#
On Wednesday 25 June 2003 20:23, Edward Dick wrote:

That is because you are not using the ML estimate of the variance.

sigmaML <- sqrt(mean(residuals(test.lm)^2))
sum(dnorm(y, y.hat, sigmaML, log = TRUE))
# [1] -57.20699
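[For completeness (a quick check, not part of the original reply), the same ML estimate of sigma also reproduces AIC() by hand, with logLik()'s df = 3 counting intercept, slope, and sigma:]

```r
# Reproduce logLik() and AIC() by hand using the ML estimate of sigma.
x <- c(43.22, 41.11, 76.97, 77.67, 124.77, 110.71, 144.46, 188.05, 171.18,
       204.92, 221.09, 178.21, 224.61, 286.47, 249.92, 313.19, 332.17, 374.35)
y <- c(5.18, 12.47, 15.65, 23.42, 27.07, 34.84, 31.03, 30.87, 40.07, 57.36,
       47.68, 43.40, 51.81, 55.77, 62.59, 66.56, 74.65, 73.54)
test.lm <- lm(y ~ x)
sigmaML <- sqrt(mean(residuals(test.lm)^2))   # RSS/n, not RSS/(n - 2)
ll <- sum(dnorm(y, fitted(test.lm), sigmaML, log = TRUE))
aic.hand <- -2 * ll + 2 * 3                   # df = 3: intercept, slope, sigma
all.equal(ll, as.numeric(logLik(test.lm)))    # TRUE
all.equal(aic.hand, AIC(test.lm))             # TRUE
```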

Best,
Z
#
Your by-hand calculation is wrong -- you have to use the MLE of sigma^2.

sum(dnorm(y, y.hat, sigma * sqrt(16/18), log = TRUE))  # 16 = n - 2, 18 = n

Also, this is an inappropriate use of AIC: the models are not nested, and
Akaike only proposed it for nested models.  Next, the gamma GLM is not a
maximum-likelihood fit unless the shape parameter is known, so you can't
use AIC with such a model using the dispersion estimate of the shape.

The AIC output from glm() is incorrect (even in that case, since the
shape is always estimated by the dispersion).
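[To illustrate the point (a sketch with simulated data, not from the thread): MASS::gamma.shape() gives an ML estimate of the Gamma shape, which generally differs from the dispersion-based estimate underlying glm()'s reported AIC:]

```r
library(MASS)  # for gamma.shape()

set.seed(1)  # hypothetical data, purely for illustration
x <- runif(50, 1, 10)
mu <- exp(1 + 0.2 * x)
y <- rgamma(50, shape = 3, rate = 3 / mu)  # Gamma with mean mu, shape 3

fit <- glm(y ~ x, family = Gamma(link = "log"))
shape.ml   <- gamma.shape(fit)$alpha        # ML estimate of the shape
shape.disp <- 1 / summary(fit)$dispersion   # dispersion-based estimate

# log-likelihood and AIC with the shape treated as an ML-fitted parameter
ll <- sum(dgamma(y, shape = shape.ml, rate = shape.ml / fitted(fit), log = TRUE))
aic.ml <- -2 * ll + 2 * 3                   # intercept, slope, shape
c(aic.ml, AIC(fit))                         # close, but generally not identical
```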
On Wed, 25 Jun 2003, Edward Dick wrote:

#
Dear Prof. Ripley:

	  I gather you disagree with the observation in Burnham and Anderson 
(2002, ch. 2) that the "complexity penalty" in the Akaike Information 
Criterion is a bias correction, and that with this correction one can use 
"density = exp(-AIC/2)" to compute approximate posterior probabilities 
comparing even different distributions?

	  They use this even to compare discrete and continuous distributions, 
which makes no sense to me.  However, with a common dominating measure, 
it seems sensible to me.  They cite a growing literature on "Bayesian 
model averaging".  What I've seen of this claims that Bayesian model 
averaging produces better predictions than predictions based on any 
single model, even using these approximate posteriors ("Akaike weights") 
in place of full Bayesian posteriors.
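[The "Akaike weights" referred to above are just the normalised exp(-delta AIC/2) values across the candidate models; a minimal sketch with hypothetical AIC values:]

```r
# Akaike weights from a set of candidate-model AIC values (hypothetical numbers)
aic <- c(120.4, 122.1, 125.9)
delta <- aic - min(aic)                      # AIC differences from the best model
w <- exp(-delta / 2) / sum(exp(-delta / 2))  # Akaike weights; they sum to 1
round(w, 3)
```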

	  I don't have much experience with this, but so far, I seem to have 
gotten great, informative answers to my clients' questions.  If there 
are serious deficiencies with this kind of procedure, I'd like to know.

Comments?
Best Wishes,
Spencer Graves
###############
REFERENCE:  Burnham and Anderson (2002) Model Selection and Multimodel 
Inference, 2nd ed. (Springer)
Prof Brian Ripley wrote:
#
On Wed, 25 Jun 2003, Spencer Graves wrote:

That's the derivation of BIC and similar, not AIC.
Yes, model averaging is useful, but is nothing to do with AIC nor Burnham
& Anderson.  See e.g. my PRNN book for better ways to do it.

Burnham & Anderson (2002) is a book I would recommend people NOT to read
until they have read the primary literature.  I see no evidence that the 
authors have actually read Akaike's papers.