Skip to content

extend summary.lm for hccm?

11 messages · Dirk Eddelbuettel, John Fox, Frank E Harrell Jr +3 more

#
dear R experts:

I wonder whether it is possible to extend the summary method for the
lm function, so that it uses an option "hccm" (well, model "hc0").  In
my line of work, it is pretty much required in reporting of almost all
linear regressions these days, which means that it would be very nice
not to have to manually library car, then sqrt the diagonal, and
recompute T-stats; instead, I would love to get everything in the same
format as the current output---except errors heteroskedasticity
adjusted.

easy or hard?

regards,

/iaw
#
On 23 December 2006 at 20:46, ivo welch wrote:
| I wonder whether it is possible to extend the summary method for the
| lm function, so that it uses an option "hccm" (well, model "hc0").  In
| my line of work, it is pretty much required in reporting of almost all
| linear regressions these days, which means that it would be very nice
| not to have to manually library car, then sqrt the diagonal, and
| recompute T-stats; instead, I would love to get everything in the same
| format as the current output---except errors heteroskedasticity
| adjusted.
| 
| easy or hard?

Did you consider the 'sandwich' package? A simple

	> install.packages("sandwich")
	> library(sandwich)
	> ?vcovHC
	> ?vcovHAC
	
should get you there.

Dirk
#
Dear Dirk and Ivo,

It's true that the sandwich package has more extensive facilities for
sandwich variance estimation than the hccm function in the car package, but
I think that the thrust of Ivo's question is to get a model summary that
automatically uses the adjusted standard errors rather than having to
compute them and use them "manually."  Here's one approach, which could be
modified slightly if Ivo wants "hc0" as the default; it could also be
adapted to use the sandwich package.

I hope this helps,
 John

----------- snip --------------

summaryHCCM <- function(model, ...) UseMethod("summaryHCCM")

summaryHCCM.lm <- function(model, type=c("hc3", "hc0", "hc1", "hc2", "hc4"),

    ...){
    if (!require(car)) stop("Required car package is missing.")
    type <- match.arg(type)
    V <- hccm(model, type=type)
    sumry <- summary(model)
    table <- coef(sumry)
    table[,2] <- sqrt(diag(V))
    table[,3] <- table[,1]/table[,2]
    table[,4] <- 2*pt(abs(table[,3]), df.residual(model), lower.tail=FALSE)
    sumry$coefficients <- table
    print(sumry)
    cat("Note: Heteroscedasticity-consistant standard errors using
adjustment",
        type, "\n")
    }    

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
--------------------------------
#
John Fox wrote:
Another approach:

library(Design)  # also requires library(Hmisc)
f <- ols(y ~ x1 + x2, x=TRUE, y=TRUE)
f <- robcov(f)   # sandwich; also allows clustering.  Also see bootcov
anova(f)         # all later steps use sandwich variance matrix
summmary(f)
contrast(f, list(x1=.5), list(x1=.2))

BUT note that sandwich covariance matrix estimators can have poor mean 
squared error (a paper by Bill Gould in Stata Technical Bulletin years 
ago related to logistic regression showed an example with a 100-fold 
increase in the variance of a variance estimate) and can give you the 
right estimate of the wrong quantity (reference below).

Frank Harrell

@Article{free06so,
   author =               {Freedman, David A.},
   title =                {On the so-called ``{Huber} sandwich 
estimator'' and
``robust standard errors''},
   journal =      The American Statistician,
   year =                 2006,
   volume =               60,
   pages =                {299-302},
   annote =               {nice summary of derivation of sandwich
estimators;questions why we should be interested in getting the right
variance of the wrong parameters when the model doesn't fit}
}

  
    
#
Dear Frank,

If I remember Freedman's recent paper correctly, he argues that sandwich
variance estimator, though problematic in general, is not problematic in the
case that White described -- an otherwise correctly specified linear model
with heteroscedasticity estimated by least-squares. 

Regards,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
--------------------------------
#
John Fox wrote:
That's right John although the precision of the variance estimators is 
still worth studying in that case.

Best regards,
Frank
#
thank you, everybody.  It's hard to find much fault with R, but one
feature that would be nice to have would be hooks to output functions
that make it easy to augment to the output---e.g., to add a new
statistic to summary (e.g., mean, sd, different stderrs).  not a big
flaw, because one can write replacement functions...

 /ivo
#
On Sun, 24 Dec 2006, John Fox wrote:

            
I've written the function coeftest() in package "lmtest" for this purpose.
With this you can
  coeftest(obj, vcov = sandwich)
or you can put this into a specialized summary() function as John
suggested (where you probably would want the F statistic to use the other
vcov as well). See also function waldtest() in "lmtest",
linear.hypothesis() in "car" and
  vignette("sandwich", package = "sandwich")

Although this works, it is still a nuisance to use a different function
and not summary() directly. In addition, it would also be nice to plug in
different vcovs into confint() or predict() methods. Of course, one could
write different generics or overload the methods in certain packages.
However, I guess that many practitioners want to use different vcov
estimators - especially in the social and political scieneces, and
econometrics etc. - so that this might justify that the original "lm" (and
"glm") methods are extended to allow for plugging in different vcov
matrices. Maybe we could try to convince R-core to include somthing like
this?

Best wishes,
Z
#
On Sun, 24 Dec 2006, John Fox wrote:

            
More generally, sandwich-type estimators are valid (i.e., estimate
the right quantity, although not necessarily precisely, as Frank pointed
out) in situations where the estimating functions are correctly specified
but remaining aspets of the likelihood (not captured in the estimating
functions) are potentially not.

In linear models, it is easy to see what this means: the mean function has
to be correctly specified (i.e., the errors have zero mean) but the
correlation structure of the errors (i.e., their (co-)variances) might
differ from the usual assumptions. In GLMs, in particular logistic
regression, it is much more difficult to see against which types of
misspecification sandwich-based inference is robust.

Freedman's paper stresses the point that many model misspecifications also
imply misspecified estimating functions (and in his example this is rather
obvious) so that consequently the sandwich-type estimators estimate the
wrong quantity.

Best wishes,
Z
#
Dear Achim,
Since coeftest() already exists in a package, I think that this is a
preferable solution.
Good point. Here's a modified version that also fixes up the F-test:

summaryHCCM.lm <- function(model, type=c("hc3", "hc0", "hc1", "hc2", "hc4"),

    ...){
    if (!require(car)) stop("Required car package is missing.")
    type <- match.arg(type)
    V <- hccm(model, type=type)
    sumry <- summary(model)
    table <- coef(sumry)
    table[,2] <- sqrt(diag(V))
    table[,3] <- table[,1]/table[,2]
    table[,4] <- 2*pt(abs(table[,3]), df.residual(model), lower.tail=FALSE)
    sumry$coefficients <- table
    p <- nrow(table)
    hyp <- if (has.intercept(model)) cbind(0, diag(p - 1)) else diag(p)
    sumry$fstatistic[1] <- linear.hypothesis(model, hyp,
white.adjust=type)[2,"F"]
    print(sumry)
    cat("Note: Heteroscedasticity-consistant standard errors using
adjustment",
        type, "\n")
    }

Regards,
 John
1 day later
#
On Mon, 25 Dec 2006, Achim Zeileis wrote:

            
Not exactly. The asymptotic properites are good, but in samples of 
moderate size the properties (including both biasedness and variance) can 
be surprisingly bad. And if you are trying to calculate a p-value, getting 
a too-small variance estimate gives you spuriously small p-values. FWIW, 
I've seen a case in which the nominal size of a test based on the sandwich 
estimator was several orders of magnitude smaller than a test with correct 
nominal size.

There is a modest literature on this. Some refs:

Background Papers:

Drum M, McCullagh P. Comment. Statistical Science 1993; 8:300-301.

Freedman DA. On the So-Called "Huber Sandwich Estimator" and "Robust"
Standard Errors. The American Statistician, Volume 60, Number 4, 
November 2006, pp. 299-302(4)

-----

Some Proposed Corrections:

Fay MP and Graubard BI. Small-Sample Adjustments for
Walt-Type Tests Using Sandwich Estimators. Biometrics 2001; 57:
1198-1206.

Guo X, Pan W, Connett JE, Hannan PJ and French SA.  Small-sample
performance of the robust score test and its modifications in
generalized estimating equations Statistics in Medicine 2005;
24:3479-3495

Mancl LA and DeRouen TA, A Covariance Estimator for GEE with Improved
Small-Sample Properties. Biometrics 2001; 57:126-134.

Morel JG, Bokossa MC, and Neerchal NK.  Small Sample Correction for
the Variance of GEE Estimators Biometrical Journal 2003; 45(4):
395-409.

Pan W and Wall MM. Small-sample adjustments in using the
sandwich variance estimator in generalized estimating
equations. Statistics in Medicine  2002; 21:1429-1441.


HTH,

Chuck
Charles C. Berry                        (858) 534-2098
                                          Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	         UC San Diego
http://biostat.ucsd.edu/~cberry/         La Jolla, San Diego 92093-0717