Skip to content

residual standard error in rlm (MASS package)

2 messages · Kari Ruohonen, Brian Ripley

#
Hi,
I would appreciate of someone could explain how the residual standard
error is computed for rlm models (MASS package). Usually, one would
expect to get the residual standard error by
where y is the response, fm a linear model with an intercept and slope
for x and n the number of observations. This does not seem to work for
rlm models and I am wondering what obvious am I missing here? Here is an
example:

x<-1:100
y <-
c(2.37156056743079, 1.66644749462933, 6.33155723966817,
12.7709430358167, 
11.6666124950273, 19.7839679181322, 15.4923741347280, 18.702397068068, 
18.7599963836891, 16.5916430986993, 16.0653054434192, 25.4517287910774, 
19.9306544701024, 25.3581170063305, 35.6823980984208, 25.8293557856092, 
34.7021243077337, 31.5336533511445, 36.3599764020412, 44.6000402205419, 
41.9899219097128, 45.4564141342995, 43.6061038794823, 48.7566542867736, 
47.5504015095432, 54.8120780105412, 55.2620894365424, 53.223516997263, 
59.5477081631011, 61.2390445046623, 62.3106323086734, 68.1104058608567, 
62.399184797047, 73.9413640517595, 70.6710955288097, 74.5456476513766, 
64.968260562374, 73.2318014155102, 73.7335636549196, 76.9362454490887, 
80.2579421621043, 80.945827481932, 87.7805234941603, 90.0909966936097, 
86.0620664696943, 90.3640690887434, 98.0965832886435, 96.789139334781, 
102.114606626867, 98.3302535449148, 103.107825932103, 109.942412367491, 
106.868253017023, 109.808738425258, 110.136050155862, 108.846488332796, 
118.442973085485, 117.276921857816, 118.640871017018, 119.263784892266, 
123.100214564588, 123.860590728955, 128.712228721465, 131.297848895423, 
123.283516322512, 134.012585073241, 132.665302554315, 138.673423711638, 
143.687124396642, 139.159598404340, 142.012045172451, 146.480644634549, 
145.429104228138, 144.503524323636, 152.348091257061, 149.237135977337, 
159.803973361884, 153.195835890301, 158.921034703569, 163.479578254736, 
159.591944778941, 163.185119145309, 165.890510577093, 164.573471319534, 
173.549321320816, 169.520130741843, 170.439532597426, 174.477604263110, 
178.059609946662, 177.828073866105, 185.005760822296, 184.280998437732, 
196.085419590290, 187.125508176825, 190.524627542992, 196.849299652848, 
197.830377226055, 197.973198490102, 198.59328678419, 199.450725602621
)
# y originally generated with y<-2*x+rnorm(100,0,2)

fm<-lm(y~x)
rm<-rlm(y~x)
fm.r<-sqrt(sum((y-fitted(fm))^2)/(n-2))
rm.r<-sqrt(sum((y-fitted(rm))^2)/(n-2))
print(matrix(c(fm.r,summary(fm)$sigma,rm.r,summary(rm)$sigma),
             ncol=2,byrow=T))

Output of this is:
         [,1]     [,2]
[1,] 1.900033 1.900033
[2,] 1.905847 1.595128

I.e. for the lm model the residual standard error from the summary.lm
method matches exactly sqrt(sum((y-fitted(fm))^2)/(n-2)) but that for
the summary.rlm model is somewhat smaller than
sqrt(sum((y-fitted(rm))^2)/(n-2)). I am curious what causes this
difference?

My sessionInfo()
R version 2.7.1 (2008-06-23) 
x86_64-pc-linux-gnu 

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods
base     

other attached packages:
[1] MASS_7.2-44

regards, Kari Ruohonen
#
The 'r' in rlm is for 'robust', so it does not compute a residual sum of 
squares (which is not robust), but rather a robust estimate of the scale.
That *is* what the help page ?summary.rlm says:

    sigma: The scale estimate.

   stddev: A scale estimate used for the standard errors.

As to exactly what those scale esimates are, see the references or the 
code, but e.g. stddev is based on downweighting larger residuals.
On Mon, 8 Dec 2008, Kari Ruohonen wrote: