
Standardized Pearson residuals

10 messages · John Maindonald, Martin Maechler, Peter Dalgaard +2 more

#
On Mar 15, 2011, at 13:42 , John Maindonald wrote:

Umm, that was Brett, actually.
It could be somewhere between useless and misleading in cases like binary logistic regression though. (Same thing goes for the test against the saturated model: Sometimes it makes sense and sometimes not.)

1 day later
#
One can easily test for the binary case and not give the statistic in that case.

A general point is that if one gave no output that was not open to abuse,
there'd be nothing given at all!  One would not be giving any output at all
from poisson or binomial models, given that data that really calls for 
quasi links (or a glmm with observation level random effects) is in my
experience the rule rather than the exception!

At the very least, why not a function dispersion() or pearsonchisquare()
that gives this information?
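Such a helper would be easy to write on top of residuals(); a minimal sketch, with the caveat that pearsonchisquare() is just the name suggested here, not an existing base-R function:

```r
## Hypothetical helper (not in base R): Pearson chi-square statistic
## and the corresponding dispersion estimate for a fitted glm.
pearsonchisquare <- function(model) {
  X2 <- sum(residuals(model, type = "pearson")^2)  # sum of squared Pearson residuals
  df <- df.residual(model)
  c(X2 = X2, df = df, dispersion = X2 / df)
}

## Example: the Poisson fit from ?glm (Dobson's randomized controlled trial data)
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- gl(3, 1, 9)
treatment <- gl(3, 3)
fit <- glm(counts ~ outcome + treatment, family = poisson())
pearsonchisquare(fit)
```

The dispersion component is just X2 divided by the residual degrees of freedom, i.e. the usual moment estimator that quasipoisson()/quasibinomial() fits report.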

Apologies that I misattributed this.

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
On 16/03/2011, at 12:41 AM, peter dalgaard wrote:

#
On Mar 16, 2011, at 23:34 , John Maindonald wrote:

Warning if expected cell counts < 5 would be another possibility.
Hmmm. Not sure I agree on that entirely, but that's a different discussion.
Lots of options here.... Offhand, my preference would go to something like
anova(..., test="score") and/or an extra line in summary(). It's not a computationally intensive item as far as I can see, it's more about "output real estate" -- how "SAS-like" do we want to become?
Never mind...

Back to the original question: 

The current rstandard() code reads

## FIXME ! -- make sure we are following "the literature":
rstandard.glm <- function(model, infl = lm.influence(model, do.coef=FALSE), ...)
{
    res <- infl$wt.res # = "dev.res"  really
    res <- res / sqrt(summary(model)$dispersion * (1 - infl$hat))
    res[is.infinite(res)] <- NaN
    res
}

which is "svn blame" to ripley but that is due to the 2003 code reorganization (except for the infinity check from 2005). So apparently, we have had that FIXME since forever... and finding its author appears to be awkward (Maechler, perhaps?).

I did try Brett's code in lieu of the above (with a mod to handle $dispersion) and even switched the default to use the Pearson residuals. Make check-devel sailed straight through apart from the obvious code/doc mismatch, so we don't have any checks in place nor any examples using rstandard(). I rather strongly suspect that there aren't many user codes using it either.

It is quite tempting simply to commit the change (after updating the docs). One thing holding me back though: I don't know what "the literature" refers to.
#

    > On Mar 16, 2011, at 23:34 , John Maindonald wrote:
    >> One can easily test for the binary case and not give the
    >> statistic in that case.

    > Warning if expected cell counts < 5 would be another
    > possibility.

    >> 
    >> A general point is that if one gave no output that was not open
    >> to abuse, there'd be nothing given at all!  One would not be
    >> giving any output at all from poisson or binomial models, given
    >> that data that really calls for quasi links (or a glmm with
    >> observation level random effects) is in my experience the rule
    >> rather than the exception!

    > Hmmm. Not sure I agree on that entirely, but that's a different
    > discussion.


    >> At the very least, why not a function dispersion() or
    >> pearsonchisquare() that gives this information.

    > Lots of options here.... Offhand, my preference would go to
    > something like anova(..., test="score") and/or an extra line in
    > summary(). It's not a computationally intensive item as far as I
    > can see, it's more about "output real estate" -- how "SAS-like"
    > do we want to become?

    >> Apologies that I misattributed this.

    > Never mind...

    > Back to the original question:

    > The current rstandard() code reads

## FIXME ! -- make sure we are following "the literature":
rstandard.glm <- function(model, infl = lm.influence(model, do.coef=FALSE), ...)
{
    res <- infl$wt.res # = "dev.res"  really
    res <- res / sqrt(summary(model)$dispersion * (1 - infl$hat))
    res[is.infinite(res)] <- NaN
    res
}

    > which is "svn blame" to ripley but that is due to the 2003
    > code reorganization (except for the infinity check from
    > 2005). So apparently, we have had that FIXME since
    > forever... and finding its author appears to be awkward
    > (Maechler, perhaps?).

yes, almost surely

    > I did try Brett's code in lieu of the above (with a mod to
    > handle $dispersion) and even switched the default to use
    > the Pearson residuals. Make check-devel sailed straight
    > through apart from the obvious code/doc mismatch, so we
    > don't have any checks in place nor any examples using
    > rstandard(). I rather strongly suspect that there aren't
    > many user codes using it either.

    > It is quite tempting simply to commit the change (after
    > updating the docs). One thing holding me back though: I
    > don't know what "the literature" refers to.

well, "the relevant publications on the topic" ...
and now define that (e.g. using the three 'References' on the
help page).
Really, that's what I think I meant when I (think I) wrote that FIXME.
The point then I think was that we had code "donations", and they
partly were clearly providing functionality that was (tested)
"correct" (according to e.g. McCullagh & Nelder and probably
another one or two text books I would have consulted ... no
large Wikipedia back then), 
but also provided things for which there was nothing in "the
literature", but as the author provided them with other good
code, we would have put it in, as well....
== my vague recollection from the past

Martin

    > -- 
    > Peter Dalgaard Center for Statistics, Copenhagen Business
    > School Solbjerg Plads 3, 2000 Frederiksberg, Denmark
    > Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv:
    > PDalgd at gmail.com

    > ______________________________________________
    > R-devel at r-project.org mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-devel
#
On Mar 17, 2011, at 16:14 , Martin Maechler wrote:

I count 5 actually... IIRC, the first two do not deal with glm diagnostics. The last two are by Fox, and, presumably, he is around to chime in if he wants. The middle one, by Williams, does define both standardized Pearson and standardized deviance residuals.

Or did you mean the three on ?glm.summaries? I would assume Davison and Snell to be the operative one, but I don't have it to hand.

Anyways, given that the default for residuals.glm is deviance residuals, I suppose that rstandard.glm should have the same default for consistency, and that is also the least disruptive variant. I see no reason not to make standardized Pearson residuals an option.
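Concretely, that could look something like the sketch below: keep the existing deviance-based computation as the default and add Pearson as a type option, mirroring residuals.glm. (rstandard_glm2 is a placeholder name, not the committed code.)

```r
## Sketch: rstandard() for glm objects with a type argument.
## Deviance residuals remain the default; standardized Pearson
## residuals become an option.
rstandard_glm2 <- function(model,
                           infl = lm.influence(model, do.coef = FALSE),
                           type = c("deviance", "pearson"), ...)
{
    type <- match.arg(type)
    res <- switch(type,
                  deviance = infl$wt.res,  # deviance residuals, as before
                  pearson  = residuals(model, type = "pearson"))
    res <- res / sqrt(summary(model)$dispersion * (1 - infl$hat))
    res[is.infinite(res)] <- NaN
    res
}
```

Both flavours divide the raw residual by sqrt(dispersion * (1 - h_i)), so only the residual type changes between them.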

#
Dear Peter and Martin,

On Thu, 17 Mar 2011 18:08:18 +0100
peter dalgaard <pdalgd at gmail.com> wrote:
Though I don't have it in front of me, I recall that my Applied Regression text follows Williams and defines both standardized deviance and standardized Pearson residuals. As well, there are new editions of both these sources: 

Fox, J. (2008) Applied Regression Analysis and Generalized Linear Models, Second Edition (Sage)

Fox, J. and S. Weisberg (2011) An R Companion to Applied Regression, Second Edition (Sage)

I'd take Williams as the definitive reference. I'll send a follow-up message if my memory proves faulty.

Best,
 John
------------------------------------------------
John Fox
Sen. William McMaster Prof. of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/
#
John Maindonald <john.maindonald at anu.edu.au> writes:
Thanks John.  I've been reluctant to push too hard for this on r-devel,
but this is more or less the point I made to Peter in a private email.
My example was the deviance, which is just as open to abuse as Pearson's
chi-square (except that differences of deviances for nested,
non-saturated models are usually ok), but which print.glm() and
summary.glm() proudly display at every opportunity.  If we were overly
concerned about preventing abuse, we might just provide the
(log-)likelihood so that users would at least have to take some
(trivial) action to compute the "dangerous" deviance statistic.

Obviously the question is one of balancing utility against the
likelihood of abuse, and apparently you and I would agree that the
utility of having Pearson's chi-square available as a matter of course
outweighs the dangers.  I don't think it would be too difficult to get
Peter to agree, but I suspect that there would likely be a strong
general reluctance to change glm() and/or summary.glm().  This could be
avoided by just providing a convenience function, as you suggest,
without necessarily changing any existing function.  This would put
Pearson's chi-square on a more nearly equal R-footing with the deviance
as a goodness-of-fit statistic, which I don't think anyone would argue
with, while still leaving the user with a bit to do to get a p-value,
say.

One of Peter's other posts suggests, perhaps unintentionally, a
functionality that would actually return the results of a
goodness-of-fit test, though possibly using only the deviance.  This
might indeed be useful, but I would still like to have an easy,
standardized way to just get Pearson's chi-square statistic.  Assuming a
situation in which either is appropriate, I tend to prefer Pearson's
chi-square over the deviance for testing goodness-of-fit, because I
think that in marginal cases its null distribution is usually better
approximated by the chi-square (if you think I'm wrong about this,
please let me know).  Pearson's chi-square also has the advantage of
being much easier to explain as a goodness-of-fit statistic than the
deviance, leaving everything else about the deviance to the realm of
likelihood-ratio tests, where it mostly belongs.  My original feature
request was, after all, mostly about having a simple, standardized way
in R for "naive users" (my students) to do some fairly standard things.
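For what it's worth, the "bit to do" is small either way. A sketch of both goodness-of-fit p-values for the Poisson example from ?glm, with the usual caveat that the chi-square approximation to either null distribution is rough when counts are small:

```r
## Pearson chi-square vs. residual deviance as goodness-of-fit statistics,
## each referred to a chi-square distribution on the residual d.f.
counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
fit <- glm(counts ~ gl(3, 1, 9) + gl(3, 3), family = poisson())

X2  <- sum(residuals(fit, type = "pearson")^2)  # Pearson chi-square
dev <- deviance(fit)                            # residual deviance
df  <- df.residual(fit)

pchisq(X2,  df, lower.tail = FALSE)  # Pearson-based p-value
pchisq(dev, df, lower.tail = FALSE)  # deviance-based p-value
```

For well-fitting models the two statistics, and hence the two p-values, are typically close; they diverge most in exactly the sparse situations where neither approximation is trustworthy.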
#
On Mar 17, 2011, at 20:46 , Brett Presnell wrote:

Well, r-devel is the discussion club. If you push people there, they can move away.... 

Anyways, I have committed the new rstandard(), to R-devel for now; if nothing falls on its face, we can move it to R 2.13.0 alpha before it goes to beta on March 30.

	-p
#
Thanks for putting in the rstandard() change Peter.  I'll keep my
fingers crossed that it doesn't break anything.

Meanwhile, I hope that you and all the core developers will take my
enormous appreciation for all that you do as implicit in any message
that I send.  You have changed and continue to change the world in a
very positive way.