understanding log-likelihood/model fit
On Thu, Aug 21, 2008 at 7:35 AM, Daniel Ezra Johnson
<danielezrajohnson at gmail.com> wrote:
Thank you all for your help. I'm now referring back to the discussion in Chapter 2 of Pinheiro and Bates and understanding this much better. Well, a little better. In the figures on pp. 73-74, the middle panels (log-residual norm) seem to illustrate what Douglas Bates has described here as
"the penalty depend[ing] on the (unconditional) variance covariance matrix of the random effects. When the variances are small there is a large penalty. When the variances are large there is a small penalty on the size of the random effects."
And the bottom panels (log-determinant ratio) seem to illustrate
"The measure of model complexity, which is related to the determinant of the conditional variance of the random effects, given the data, [which] has the opposite behavior. When the variance of the random effects is small the model is considered simpler. The simplest possible model on this scale is one without any random effects at all, corresponding to a variance of zero."
In these charts, as you move all the way to the right, the values of Delta and theta are maximized in the limit, which I believe means the random-effect variance goes to zero (relative to the residual variance). As you move to the left, the model becomes more complex, but its fidelity to the data improves for a time, and that is where the log-likelihood (top panel) attains its maximum. If theta going to infinity represents zero random effects, could you say that theta going to zero represents random effects that are no longer distinguishable from fixed effects?
Yes, in the sense that the residual sum of squares is what you would obtain from a fixed-effects fit where the model matrix is [X, Z], where X is the model matrix for the fixed-effects parameters and Z is the model matrix for the random effects. The parameter estimates for this model may not be well defined because the model could be overparameterized. However, the residual sum of squares for this model is well-defined, as are the predictions and the residuals.

You could produce a similar plot for an lmer model fit. Consider the Dyestuff data in current versions of the lme4 package. The REML criterion is similar to the maximum likelihood criterion but involves an additional term, so let's consider only the ML deviance, not the REML deviance.
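To make that limiting case concrete, here is a minimal sketch (in Python rather than R, with made-up numbers, not the Dyestuff data) of the fixed-effects fit with model matrix [X, Z]. Because Z spans the group indicators, the fitted values of that fit are just the group means (the intercept/group split is not identifiable, but the fitted values and residuals are), so its residual sum of squares is the within-group sum of squares, which can only be smaller than the RSS of the intercept-only fit.

```python
# Sketch of the limiting fixed-effects fit with model matrix [X, Z]:
# X is an intercept column, Z the indicator matrix of a grouping factor.
# Synthetic response, two groups of three observations (made-up numbers).
groups = {"A": [10.0, 12.0, 11.0], "B": [20.0, 19.0, 21.0]}

ys = [y for g in groups.values() for y in g]
grand_mean = sum(ys) / len(ys)

# RSS of the intercept-only fit (X alone): deviations from the grand mean
rss_x = sum((y - grand_mean) ** 2 for y in ys)

# RSS of the [X, Z] fit: fitted values are the group means, so the RSS is
# the within-group sum of squares
rss_xz = 0.0
for g in groups.values():
    m = sum(g) / len(g)
    rss_xz += sum((y - m) ** 2 for y in g)

print(rss_x, rss_xz)  # adding Z can only lower the residual sum of squares
```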
options(show.signif.stars = FALSE)
(fm1 <- lmer(Yield ~ 1 + (1|Batch), Dyestuff, REML = FALSE, verbose = TRUE))
0: 327.33176: 0.730297
1: 327.33082: 0.772944
2: 327.32706: 0.753271
3: 327.32706: 0.752556
4: 327.32706: 0.752581
Linear mixed model fit by maximum likelihood
Formula: Yield ~ 1 + (1 | Batch)
Data: Dyestuff
AIC BIC logLik deviance REMLdev
333.3 337.5 -163.7 327.3 319.7
Random effects:
Groups Name Variance Std.Dev.
Batch (Intercept) 1388.1 37.258
Residual 2451.3 49.511
Number of obs: 30, groups: Batch, 6
Fixed effects:
Estimate Std. Error t value
(Intercept) 1527.50 17.69 86.33
The verbose output gives the profiled deviance at each iteration as a
function of a single parameter, which is the ratio of the standard
deviation of the random effects to the residual standard deviation.
37.258/49.511
[1] 0.7525196

This parameter is stored in the ST slot. The slot is a list with elements corresponding to the random effects terms in the model. In this model there is only one random effects term, (1|Batch). In general the elements of the list are square matrices where the number of rows and columns corresponds to the number of random effects from that term for each level of the grouping factor. In this case there is only one random effect per level of the factor.
fm1@ST
[[1]]
(Intercept)
(Intercept) 0.7525135
The deviance slot contains the deviance itself and several of its components.
fm1@deviance
ML REML ldL2 ldRX2 sigmaML sigmaREML
327.327060 319.725920 8.059354 2.057972 49.510754 50.357153
pwrss disc usqr wrss
73539.442307 62669.199627 10870.242680 62669.199627
For a linear mixed model, the discrepancy (disc) at the current parameter values is equal to the weighted residual sum of squares (wrss). The random effects, b, are linearly transformed to u, whose unconditional distribution is a "spherical" Gaussian distribution (i.e. the variance-covariance matrix is \sigma^2 times the q by q identity matrix, where \sigma^2 is the same scalar variance as in the definition of the conditional distribution of Y|B). The penalty on the size of the random effects is therefore the squared length of u (usqr). The penalized, weighted residual sum of squares (pwrss) is the sum of wrss and usqr. The conditional estimate of \sigma^2 is pwrss/n, and sigmaML is its square root.
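To illustrate how this penalty behaves, here is a small sketch (in Python, with synthetic balanced one-way data, not the Dyestuff example) of the conditional modes for a single random-intercept term. With n observations per group and relative variance theta^2 = sigma_b^2/sigma^2, minimizing the penalized criterion ||y - mu - Zb||^2 + ||u||^2 (with b = theta * u) shrinks each group-mean deviation by the factor n*theta^2/(n*theta^2 + 1): the modes collapse to zero as theta -> 0 (heavy penalty) and approach the raw deviations, i.e. the fixed-effects [X, Z] fit, as theta grows.

```python
# Shrinkage of the conditional modes of the random effects as a function of
# theta, for a balanced one-way layout (synthetic, made-up numbers).
groups = [[10.0, 12.0, 11.0], [20.0, 19.0, 21.0]]
n = len(groups[0])                                   # observations per group
grand = sum(y for g in groups for y in g) / (n * len(groups))
devs = [sum(g) / n - grand for g in groups]          # raw group-mean deviations

def cond_modes(theta):
    # closed-form minimizer of n*(ybar_i - mu - b_i)^2 + b_i^2/theta^2
    shrink = n * theta**2 / (n * theta**2 + 1.0)
    return [shrink * d for d in devs]

print(cond_modes(0.01))   # theta near 0: heavy penalty, modes near 0
print(cond_modes(100.0))  # theta large: modes near the raw deviations
```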
d <- fm1@deviance
all.equal(unname(d['pwrss']), unname(d['wrss'] + d['usqr']))
[1] TRUE
fm1@dims['n']
 n
30
all.equal(unname(d['sigmaML']), unname(sqrt(d['pwrss']/30)))
[1] TRUE

The log-determinant of the conditional variance-covariance matrix of the random effects, given the data (i.e. B|Y), is, up to the scale factor of \sigma, ldL2. The profiled deviance is calculated as shown on slide 132 of http://www.stat.wisc.edu/~bates/PotsdamGLMM/LMMD.pdf

The actual code in the C function lmm_update_fixef_u in lme4/src/lmer.c is

d[ML_POS] = d[ldL2_POS] + dn * (1 + log(d[pwrss_POS]) + log(2 * PI / dn));

or, in R,
unname(d['ldL2'] + 30 * (1 + log(2 * pi * d['pwrss']/30)))
[1] 327.3271
all.equal(unname(d['ML']), unname(d['ldL2'] + 30 * (1 + log(2 * pi * d['pwrss']/30))))
[1] TRUE

Next I was going to show how to change the value of the ST parameter and update the deviance slot, but in the process I discovered that one of the C functions (the aforementioned lmm_update_fixef_u) cannot be called from R. After changing that and making a new release I can show how to create a similar figure in the new formulation.
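In the meantime, the profiled-deviance formula can also be checked by hand outside R. A sketch in Python, plugging in the component values printed in the deviance slot above (ldL2, pwrss, and n = 30 for this fit):

```python
import math

# Profiled deviance: deviance = ldL2 + n * (1 + log(2 * pi * pwrss / n)),
# evaluated at the component values from the deviance slot shown earlier.
ldL2 = 8.059354
pwrss = 73539.442307
n = 30

deviance = ldL2 + n * (1 + math.log(2 * math.pi * pwrss / n))
print(deviance)  # agrees with the printed ML deviance, 327.3271, to ~1e-4
```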