dear R experts: I wonder whether it is possible to extend the summary method for the lm function, so that it uses an option "hccm" (well, model "hc0"). In my line of work, it is pretty much required in reporting of almost all linear regressions these days, which means that it would be very nice not to have to manually library car, then sqrt the diagonal, and recompute T-stats; instead, I would love to get everything in the same format as the current output---except errors heteroskedasticity adjusted. easy or hard? regards, /iaw
extend summary.lm for hccm?
11 messages · Dirk Eddelbuettel, John Fox, Frank E Harrell Jr +3 more
On 23 December 2006 at 20:46, ivo welch wrote:
| I wonder whether it is possible to extend the summary method for the
| lm function, so that it uses an option "hccm" (well, model "hc0"). In
| my line of work, it is pretty much required in reporting of almost all
| linear regressions these days, which means that it would be very nice
| not to have to manually library car, then sqrt the diagonal, and
| recompute T-stats; instead, I would love to get everything in the same
| format as the current output---except errors heteroskedasticity
| adjusted.
|
| easy or hard?
Did you consider the 'sandwich' package? A simple
> install.packages("sandwich")
> library(sandwich)
> ?vcovHC
> ?vcovHAC
should get you there.
Dirk
Hell, there are no rules here - we're trying to accomplish something.
-- Thomas A. Edison
Dear Dirk and Ivo,
It's true that the sandwich package has more extensive facilities for
sandwich variance estimation than the hccm function in the car package, but
I think that the thrust of Ivo's question is to get a model summary that
automatically uses the adjusted standard errors rather than having to
compute them and use them "manually." Here's one approach, which could be
modified slightly if Ivo wants "hc0" as the default; it could also be
adapted to use the sandwich package.
I hope this helps,
John
----------- snip --------------
summaryHCCM <- function(model, ...) UseMethod("summaryHCCM")
summaryHCCM.lm <- function(model, type=c("hc3", "hc0", "hc1", "hc2", "hc4"),
...){
if (!require(car)) stop("Required car package is missing.")
type <- match.arg(type)
V <- hccm(model, type=type)
sumry <- summary(model)
table <- coef(sumry)
table[,2] <- sqrt(diag(V))
table[,3] <- table[,1]/table[,2]
table[,4] <- 2*pt(abs(table[,3]), df.residual(model), lower.tail=FALSE)
sumry$coefficients <- table
print(sumry)
cat("Note: Heteroscedasticity-consistant standard errors using
adjustment",
type, "\n")
}
--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------
-----Original Message----- From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Dirk Eddelbuettel Sent: Saturday, December 23, 2006 11:16 PM To: ivo welch Cc: r-help at stat.math.ethz.ch Subject: Re: [R] extend summary.lm for hccm? On 23 December 2006 at 20:46, ivo welch wrote: | I wonder whether it is possible to extend the summary method for the | lm function, so that it uses an option "hccm" (well, model "hc0"). In | my line of work, it is pretty much required in reporting of almost all | linear regressions these days, which means that it would be very nice | not to have to manually library car, then sqrt the diagonal, and | recompute T-stats; instead, I would love to get everything in the same | format as the current output---except errors heteroskedasticity | adjusted. | | easy or hard? Did you consider the 'sandwich' package? A simple
> install.packages("sandwich")
> library(sandwich)
> ?vcovHC
> ?vcovHAC
should get you there.
Dirk
--
Hell, there are no rules here - we're trying to accomplish something.
-- Thomas A. Edison
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
John Fox wrote:
Dear Dirk and Ivo, It's true that the sandwich package has more extensive facilities for sandwich variance estimation than the hccm function in the car package, but I think that the thrust of Ivo's question is to get a model summary that automatically uses the adjusted standard errors rather than having to compute them and use them "manually." Here's one approach, which could be modified slightly if Ivo wants "hc0" as the default; it could also be adapted to use the sandwich package. I hope this helps, John
Another approach:
library(Design) # also requires library(Hmisc)
f <- ols(y ~ x1 + x2, x=TRUE, y=TRUE)
f <- robcov(f) # sandwich; also allows clustering. Also see bootcov
anova(f) # all later steps use sandwich variance matrix
summmary(f)
contrast(f, list(x1=.5), list(x1=.2))
BUT note that sandwich covariance matrix estimators can have poor mean
squared error (a paper by Bill Gould in Stata Technical Bulletin years
ago related to logistic regression showed an example with a 100-fold
increase in the variance of a variance estimate) and can give you the
right estimate of the wrong quantity (reference below).
Frank Harrell
@Article{free06so,
author = {Freedman, David A.},
title = {On the so-called ``{Huber} sandwich
estimator'' and
``robust standard errors''},
journal = The American Statistician,
year = 2006,
volume = 60,
pages = {299-302},
annote = {nice summary of derivation of sandwich
estimators;questions why we should be interested in getting the right
variance of the wrong parameters when the model doesn't fit}
}
----------- snip --------------
summaryHCCM <- function(model, ...) UseMethod("summaryHCCM")
summaryHCCM.lm <- function(model, type=c("hc3", "hc0", "hc1", "hc2", "hc4"),
...){
if (!require(car)) stop("Required car package is missing.")
type <- match.arg(type)
V <- hccm(model, type=type)
sumry <- summary(model)
table <- coef(sumry)
table[,2] <- sqrt(diag(V))
table[,3] <- table[,1]/table[,2]
table[,4] <- 2*pt(abs(table[,3]), df.residual(model), lower.tail=FALSE)
sumry$coefficients <- table
print(sumry)
cat("Note: Heteroscedasticity-consistant standard errors using
adjustment",
type, "\n")
}
--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------
-----Original Message----- From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Dirk Eddelbuettel Sent: Saturday, December 23, 2006 11:16 PM To: ivo welch Cc: r-help at stat.math.ethz.ch Subject: Re: [R] extend summary.lm for hccm? On 23 December 2006 at 20:46, ivo welch wrote: | I wonder whether it is possible to extend the summary method for the | lm function, so that it uses an option "hccm" (well, model "hc0"). In | my line of work, it is pretty much required in reporting of almost all | linear regressions these days, which means that it would be very nice | not to have to manually library car, then sqrt the diagonal, and | recompute T-stats; instead, I would love to get everything in the same | format as the current output---except errors heteroskedasticity | adjusted. | | easy or hard? Did you consider the 'sandwich' package? A simple
> install.packages("sandwich")
> library(sandwich)
> ?vcovHC
> ?vcovHAC
should get you there. Dirk --
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University
Dear Frank, If I remember Freedman's recent paper correctly, he argues that sandwich variance estimator, though problematic in general, is not problematic in the case that White described -- an otherwise correctly specified linear model with heteroscedasticity estimated by least-squares. Regards, John -------------------------------- John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox --------------------------------
-----Original Message----- From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Frank E Harrell Jr Sent: Sunday, December 24, 2006 10:29 AM To: John Fox Cc: r-help at stat.math.ethz.ch; 'ivo welch'; 'Dirk Eddelbuettel' Subject: Re: [R] extend summary.lm for hccm? John Fox wrote:
Dear Dirk and Ivo, It's true that the sandwich package has more extensive
facilities for
sandwich variance estimation than the hccm function in the car package, but I think that the thrust of Ivo's question is to get a model summary that automatically uses the adjusted standard errors rather than having to compute them and use them "manually." Here's one approach, which could be modified slightly if Ivo wants
"hc0" as
the default; it could also be adapted to use the sandwich package. I hope this helps, John
Another approach:
library(Design) # also requires library(Hmisc) f <- ols(y ~
x1 + x2, x=TRUE, y=TRUE)
f <- robcov(f) # sandwich; also allows clustering. Also see bootcov
anova(f) # all later steps use sandwich variance matrix
summmary(f)
contrast(f, list(x1=.5), list(x1=.2))
BUT note that sandwich covariance matrix estimators can have
poor mean squared error (a paper by Bill Gould in Stata
Technical Bulletin years ago related to logistic regression
showed an example with a 100-fold increase in the variance of
a variance estimate) and can give you the right estimate of
the wrong quantity (reference below).
Frank Harrell
@Article{free06so,
author = {Freedman, David A.},
title = {On the so-called ``{Huber} sandwich
estimator'' and
``robust standard errors''},
journal = The American Statistician,
year = 2006,
volume = 60,
pages = {299-302},
annote = {nice summary of derivation of sandwich
estimators;questions why we should be interested in getting
the right variance of the wrong parameters when the model
doesn't fit} }
----------- snip --------------
summaryHCCM <- function(model, ...) UseMethod("summaryHCCM")
summaryHCCM.lm <- function(model, type=c("hc3", "hc0",
"hc1", "hc2",
"hc4"),
...){
if (!require(car)) stop("Required car package is missing.")
type <- match.arg(type)
V <- hccm(model, type=type)
sumry <- summary(model)
table <- coef(sumry)
table[,2] <- sqrt(diag(V))
table[,3] <- table[,1]/table[,2]
table[,4] <- 2*pt(abs(table[,3]), df.residual(model),
lower.tail=FALSE)
sumry$coefficients <- table
print(sumry)
cat("Note: Heteroscedasticity-consistant standard errors using
adjustment",
type, "\n")
}
--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------
-----Original Message----- From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Dirk Eddelbuettel Sent: Saturday, December 23, 2006 11:16 PM To: ivo welch Cc: r-help at stat.math.ethz.ch Subject: Re: [R] extend summary.lm for hccm? On 23 December 2006 at 20:46, ivo welch wrote: | I wonder whether it is possible to extend the summary method for the | lm function, so that it uses an option "hccm" (well, model "hc0"). In | my line of work, it is pretty much required in reporting of almost all | linear regressions these days, which means that it would be very nice | not to have to manually library car, then sqrt the diagonal, and | recompute T-stats; instead, I would love to get everything in the same | format as the current output---except errors heteroskedasticity | adjusted. | | easy or hard? Did you consider the 'sandwich' package? A simple
> install.packages("sandwich")
> library(sandwich)
> ?vcovHC
> ?vcovHAC
should get you there. Dirk --
--
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics
Vanderbilt University
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
John Fox wrote:
Dear Frank, If I remember Freedman's recent paper correctly, he argues that sandwich variance estimator, though problematic in general, is not problematic in the case that White described -- an otherwise correctly specified linear model with heteroscedasticity estimated by least-squares. Regards, John
That's right John although the precision of the variance estimators is still worth studying in that case. Best regards, Frank
-------------------------------- John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox --------------------------------
-----Original Message----- From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Frank E Harrell Jr Sent: Sunday, December 24, 2006 10:29 AM To: John Fox Cc: r-help at stat.math.ethz.ch; 'ivo welch'; 'Dirk Eddelbuettel' Subject: Re: [R] extend summary.lm for hccm? John Fox wrote:
Dear Dirk and Ivo, It's true that the sandwich package has more extensive
facilities for
sandwich variance estimation than the hccm function in the car package, but I think that the thrust of Ivo's question is to get a model summary that automatically uses the adjusted standard errors rather than having to compute them and use them "manually." Here's one approach, which could be modified slightly if Ivo wants
"hc0" as
the default; it could also be adapted to use the sandwich package. I hope this helps, John
Another approach:
library(Design) # also requires library(Hmisc) f <- ols(y ~
x1 + x2, x=TRUE, y=TRUE)
f <- robcov(f) # sandwich; also allows clustering. Also see bootcov
anova(f) # all later steps use sandwich variance matrix
summmary(f)
contrast(f, list(x1=.5), list(x1=.2))
BUT note that sandwich covariance matrix estimators can have
poor mean squared error (a paper by Bill Gould in Stata
Technical Bulletin years ago related to logistic regression
showed an example with a 100-fold increase in the variance of
a variance estimate) and can give you the right estimate of
the wrong quantity (reference below).
Frank Harrell
@Article{free06so,
author = {Freedman, David A.},
title = {On the so-called ``{Huber} sandwich
estimator'' and
``robust standard errors''},
journal = The American Statistician,
year = 2006,
volume = 60,
pages = {299-302},
annote = {nice summary of derivation of sandwich
estimators;questions why we should be interested in getting
the right variance of the wrong parameters when the model
doesn't fit} }
----------- snip --------------
summaryHCCM <- function(model, ...) UseMethod("summaryHCCM")
summaryHCCM.lm <- function(model, type=c("hc3", "hc0",
"hc1", "hc2",
"hc4"),
...){
if (!require(car)) stop("Required car package is missing.")
type <- match.arg(type)
V <- hccm(model, type=type)
sumry <- summary(model)
table <- coef(sumry)
table[,2] <- sqrt(diag(V))
table[,3] <- table[,1]/table[,2]
table[,4] <- 2*pt(abs(table[,3]), df.residual(model),
lower.tail=FALSE)
sumry$coefficients <- table
print(sumry)
cat("Note: Heteroscedasticity-consistant standard errors using
adjustment",
type, "\n")
}
--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------
-----Original Message----- From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Dirk Eddelbuettel Sent: Saturday, December 23, 2006 11:16 PM To: ivo welch Cc: r-help at stat.math.ethz.ch Subject: Re: [R] extend summary.lm for hccm? On 23 December 2006 at 20:46, ivo welch wrote: | I wonder whether it is possible to extend the summary method for the | lm function, so that it uses an option "hccm" (well, model "hc0"). In | my line of work, it is pretty much required in reporting of almost all | linear regressions these days, which means that it would be very nice | not to have to manually library car, then sqrt the diagonal, and | recompute T-stats; instead, I would love to get everything in the same | format as the current output---except errors heteroskedasticity | adjusted. | | easy or hard? Did you consider the 'sandwich' package? A simple
> install.packages("sandwich")
> library(sandwich)
> ?vcovHC
> ?vcovHAC
should get you there. Dirk
thank you, everybody. It's hard to find much fault with R, but one feature that would be nice to have would be hooks to output functions that make it easy to augment to the output---e.g., to add a new statistic to summary (e.g., mean, sd, different stderrs). not a big flaw, because one can write replacement functions... /ivo
On Sun, 24 Dec 2006, John Fox wrote:
Dear Dirk and Ivo, It's true that the sandwich package has more extensive facilities for sandwich variance estimation than the hccm function in the car package, but I think that the thrust of Ivo's question is to get a model summary that automatically uses the adjusted standard errors rather than having to compute them and use them "manually."
I've written the function coeftest() in package "lmtest" for this purpose.
With this you can
coeftest(obj, vcov = sandwich)
or you can put this into a specialized summary() function as John
suggested (where you probably would want the F statistic to use the other
vcov as well). See also function waldtest() in "lmtest",
linear.hypothesis() in "car" and
vignette("sandwich", package = "sandwich")
Although this works, it is still a nuisance to use a different function
and not summary() directly. In addition, it would also be nice to plug in
different vcovs into confint() or predict() methods. Of course, one could
write different generics or overload the methods in certain packages.
However, I guess that many practitioners want to use different vcov
estimators - especially in the social and political scieneces, and
econometrics etc. - so that this might justify that the original "lm" (and
"glm") methods are extended to allow for plugging in different vcov
matrices. Maybe we could try to convince R-core to include somthing like
this?
Best wishes,
Z
On Sun, 24 Dec 2006, John Fox wrote:
If I remember Freedman's recent paper correctly, he argues that sandwich variance estimator, though problematic in general, is not problematic in the case that White described -- an otherwise correctly specified linear model with heteroscedasticity estimated by least-squares.
More generally, sandwich-type estimators are valid (i.e., estimate the right quantity, although not necessarily precisely, as Frank pointed out) in situations where the estimating functions are correctly specified but remaining aspets of the likelihood (not captured in the estimating functions) are potentially not. In linear models, it is easy to see what this means: the mean function has to be correctly specified (i.e., the errors have zero mean) but the correlation structure of the errors (i.e., their (co-)variances) might differ from the usual assumptions. In GLMs, in particular logistic regression, it is much more difficult to see against which types of misspecification sandwich-based inference is robust. Freedman's paper stresses the point that many model misspecifications also imply misspecified estimating functions (and in his example this is rather obvious) so that consequently the sandwich-type estimators estimate the wrong quantity. Best wishes, Z
Dear Achim,
-----Original Message----- From: Achim Zeileis [mailto:Achim.Zeileis at wu-wien.ac.at] Sent: Monday, December 25, 2006 8:38 AM To: John Fox Cc: 'Dirk Eddelbuettel'; 'ivo welch'; r-help at stat.math.ethz.ch Subject: Re: [R] extend summary.lm for hccm? On Sun, 24 Dec 2006, John Fox wrote:
Dear Dirk and Ivo, It's true that the sandwich package has more extensive
facilities for
sandwich variance estimation than the hccm function in the car package, but I think that the thrust of Ivo's question is to get a model summary that automatically uses the adjusted standard errors rather than having to compute them and use them "manually."
I've written the function coeftest() in package "lmtest" for this purpose. With this you can coeftest(obj, vcov = sandwich)
Since coeftest() already exists in a package, I think that this is a preferable solution.
or you can put this into a specialized summary() function as John suggested (where you probably would want the F statistic to use the other vcov as well).
Good point. Here's a modified version that also fixes up the F-test:
summaryHCCM.lm <- function(model, type=c("hc3", "hc0", "hc1", "hc2", "hc4"),
...){
if (!require(car)) stop("Required car package is missing.")
type <- match.arg(type)
V <- hccm(model, type=type)
sumry <- summary(model)
table <- coef(sumry)
table[,2] <- sqrt(diag(V))
table[,3] <- table[,1]/table[,2]
table[,4] <- 2*pt(abs(table[,3]), df.residual(model), lower.tail=FALSE)
sumry$coefficients <- table
p <- nrow(table)
hyp <- if (has.intercept(model)) cbind(0, diag(p - 1)) else diag(p)
sumry$fstatistic[1] <- linear.hypothesis(model, hyp,
white.adjust=type)[2,"F"]
print(sumry)
cat("Note: Heteroscedasticity-consistant standard errors using
adjustment",
type, "\n")
}
Regards,
John
See also function waldtest()
in "lmtest",
linear.hypothesis() in "car" and
vignette("sandwich", package = "sandwich")
Although this works, it is still a nuisance to use a
different function and not summary() directly. In addition,
it would also be nice to plug in different vcovs into
confint() or predict() methods. Of course, one could write
different generics or overload the methods in certain packages.
However, I guess that many practitioners want to use
different vcov estimators - especially in the social and
political scieneces, and econometrics etc. - so that this
might justify that the original "lm" (and
"glm") methods are extended to allow for plugging in
different vcov matrices. Maybe we could try to convince
R-core to include somthing like this?
Best wishes,
Z
1 day later
On Mon, 25 Dec 2006, Achim Zeileis wrote:
On Sun, 24 Dec 2006, John Fox wrote:
If I remember Freedman's recent paper correctly, he argues that sandwich variance estimator, though problematic in general, is not problematic in the case that White described -- an otherwise correctly specified linear model with heteroscedasticity estimated by least-squares.
More generally, sandwich-type estimators are valid (i.e., estimate the right quantity, although not necessarily precisely, as Frank pointed out) in situations where the estimating functions are correctly specified but remaining aspets of the likelihood (not captured in the estimating functions) are potentially not.
Not exactly. The asymptotic properites are good, but in samples of moderate size the properties (including both biasedness and variance) can be surprisingly bad. And if you are trying to calculate a p-value, getting a too-small variance estimate gives you spuriously small p-values. FWIW, I've seen a case in which the nominal size of a test based on the sandwich estimator was several orders of magnitude smaller than a test with correct nominal size. There is a modest literature on this. Some refs: Background Papers: Drum M, McCullagh P. Comment. Statistical Science 1993; 8:300-301. Freedman DA. On the So-Called "Huber Sandwich Estimator" and "Robust" Standard Errors. The American Statistician, Volume 60, Number 4, November 2006, pp. 299-302(4) ----- Some Proposed Corrections: Fay MP and Graubard BI. Small-Sample Adjustments for Walt-Type Tests Using Sandwich Estimators. Biometrics 2001; 57: 1198-1206. Guo X, Pan W, Connett JE, Hannan PJ and French SA. Small-sample performance of the robust score test and its modifications in generalized estimating equations Statistics in Medicine 2005; 24:3479-3495 Mancl LA and DeRouen TA, A Covariance Estimator for GEE with Improved Small-Sample Properties. Biometrics 2001; 57:126-134. Morel JG, Bokossa MC, and Neerchal NK. Small Sample Correction for the Variance of GEE Estimators Biometrical Journal 2003; 45(4): 395-409. Pan W and Wall MM. Small-sample adjustments in using the sandwich variance estimator in generalized estimating equations. Statistics in Medicine 2002; 21:1429-1441. HTH, Chuck
In linear models, it is easy to see what this means: the mean function has to be correctly specified (i.e., the errors have zero mean) but the correlation structure of the errors (i.e., their (co-)variances) might differ from the usual assumptions. In GLMs, in particular logistic regression, it is much more difficult to see against which types of misspecification sandwich-based inference is robust. Freedman's paper stresses the point that many model misspecifications also imply misspecified estimating functions (and in his example this is rather obvious) so that consequently the sandwich-type estimators estimate the wrong quantity. Best wishes, Z
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://biostat.ucsd.edu/~cberry/ La Jolla, San Diego 92093-0717