Message-ID: <5.1.0.14.2.20030612162725.01fbccf8@mcmail.cis.mcmaster.ca>
Date: 2003-06-12T20:58:54Z
From: John Fox
Subject: What PRECISELY is the dfbetas() or lm.influence()$coef ?
In-Reply-To: <9D7EF737FA4C6F4FBBFC52FC30B83690D49F55@nihexchange7.nih.gov>
Dear Hormuzd,
At 01:24 PM 6/12/2003 -0400, Katki, Hormuzd (NIH/NCI) wrote:
> Hello. I want to get the proper influence function for the glm
>coefficients in R. This is supposed to be inv(information)*(y-yhat)*x. So
>I am wondering what is the exact mathematical formula for the output that
>the functions:
>
>dfbeta() OR lm.influence()$coefficients
>
>return for a glm model. I am confused because:
>
>1. Their columns don't sum to zero as influences should.
Even in a linear model, where the computation is exact, the columns do not
sum to zero if influence is defined as the change in the coefficients upon
deleting each observation in turn (i.e., as dfbeta).
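
A minimal sketch in R of this point, assuming the built-in cars data as an arbitrary example (the object names are mine, for illustration only). It compares dfbeta() with brute-force case deletion, and contrasts the deletion-based changes with the first-order quantities (X'X)^{-1} x_i e_i, which do sum to zero because X'e = 0. The two differ row by row by the factor 1/(1 - h_i), where h_i is the hat-value. Agreement is checked on absolute values so that nothing depends on a sign convention.

```r
# Linear model on an arbitrary built-in data set (illustrative choice).
fit <- lm(dist ~ speed, data = cars)
n <- nrow(cars)

# Refit with each observation deleted in turn; record the coefficient change.
manual <- t(sapply(seq_len(n), function(i) {
  coef(fit) - coef(lm(dist ~ speed, data = cars[-i, ]))
}))

db <- dfbeta(fit)

# dfbeta() versus brute-force deletion (magnitudes only).
max(abs(abs(manual) - abs(db)))

# The deletion-based changes do not sum to zero over observations ...
colSums(db)

# ... whereas the first-order quantities (X'X)^{-1} x_i e_i do.
X <- model.matrix(fit)
infl <- (X * residuals(fit)) %*% solve(crossprod(X))
colSums(infl)

# Dividing row i by (1 - h_i) recovers the deletion-based change.
h <- hatvalues(fit)
max(abs(abs(infl / (1 - h)) - abs(db)))
```
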
>2. They return different "influences", so the 2 functions are doing
>something different.
That's odd. I believe that dfbeta() for a GLM simply uses influence.glm,
which has the same $coefficients component as lm.influence. As such, for a
GLM, both are based on the last step of the IRLS fit -- i.e., a
linearization of the model.
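
A rough way to check both claims in R, assuming simulated logistic-regression data (the data, seed, and object names are hypothetical, chosen only for illustration). The sketch compares dfbeta() with influence()$coefficients, and then compares the linearized values with the coefficient changes obtained by actually refitting the GLM to convergence without each observation. The latter comparison should be close but, in general, not exact.

```r
# Simulated binary-response data (purely illustrative).
set.seed(1)
d <- data.frame(x = rnorm(40))
d$y <- rbinom(40, 1, plogis(0.5 + d$x))
gfit <- glm(y ~ x, family = binomial, data = d)

# Linearized coefficient changes from the two functions.
one_step <- dfbeta(gfit)
same <- all.equal(unname(one_step), unname(influence(gfit)$coefficients))
same

# Exact deletion: refit the GLM to convergence without each observation.
exact <- t(sapply(seq_len(nrow(d)), function(i) {
  coef(gfit) - coef(glm(y ~ x, family = binomial, data = d[-i, ]))
}))

# Largest discrepancy between linearized and exact changes
# (magnitudes only, so as not to depend on a sign convention).
max(abs(abs(one_step) - abs(exact)))
```
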
>3. I think they divide each element by the standard error of the
>corresponding coefficient, but that's not enough to resolve any
>discrepancies
Perhaps you meant that dfbetas() [not dfbeta()] returns different values
from lm.influence()$coef (as in your subject line)? dfbetas standardizes
the coefficient changes by coefficient standard errors, using a deleted
estimate of the dispersion parameter.
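
The standardization can be sketched for a linear model as follows, again assuming the built-in cars data as an arbitrary example (the object names are mine). Each row of the coefficient changes is divided by the deleted residual standard deviation times the square root of the corresponding diagonal element of (X'X)^{-1}. I would expect the GLM case to be analogous, with the weighted model matrix from the last IRLS step in place of X, but the sketch does not check that.

```r
fit <- lm(dist ~ speed, data = cars)
infl <- lm.influence(fit)

# Unscaled coefficient variances: diagonal of (X'X)^{-1}.
X <- model.matrix(fit)
xxi_diag <- diag(solve(crossprod(X)))

# Row i is scaled by sigma_(i) * sqrt(diag((X'X)^{-1})), where sigma_(i) is
# the residual standard deviation estimated with observation i deleted.
manual <- infl$coefficients / outer(infl$sigma, sqrt(xxi_diag))
all.equal(unname(manual), unname(dfbetas(fit)))

# The deleted sigma for case 1 matches a refit without that case.
sigma_refit <- summary(lm(dist ~ speed, data = cars[-1, ]))$sigma
all.equal(unname(infl$sigma[1]), sigma_refit)
```
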
>The documentation doesn't provide any details. Any help would be greatly
>appreciated.
I hope that this helps,
John
-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox