
puzzling classical Mahalanobis distances from covMcd() {robustbase}

The values should probably be labeled "initial" instead of "raw" which is
how they are labeled in the source. The Details section of the manual
indicates that the first step is to identify a subset of h observations,
between one half and all of the original data, whose covariance matrix has
the lowest possible determinant. The next paragraph says:

"The raw MCD estimate of location is then the average of these h points,
whereas the raw MCD estimate of scatter is their covariance matrix,
multiplied by a consistency factor and a finite sample correction factor (to
make it consistent at the normal model and unbiased at small samples)."
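
The description quoted above can be checked directly. The following is a minimal sketch on simulated data (not the data from this thread), assuming the `best`, `raw.center` and `raw.cov` components that the covMcd() help page documents for its return value. The names `h_idx`, `sub_center` and `sub_cov` are only illustrative.

```r
# Sketch only: simulated data, guarded so it is skipped without robustbase.
if (requireNamespace("robustbase", quietly = TRUE)) {
  set.seed(42)
  x <- matrix(rnorm(50 * 3), ncol = 3)   # 50 observations, 3 variables
  fit <- robustbase::covMcd(x)

  h_idx <- fit$best                      # indices of the h retained points
  sub_center <- colMeans(x[h_idx, ])     # plain average of those points
  sub_cov <- cov(x[h_idx, ])             # plain covariance of those points

  # Compare the subset mean with the reported raw location estimate.
  print(all.equal(unname(sub_center), unname(fit$raw.center)))

  # Element-wise ratio of the raw scatter to the plain subset covariance.
  # If the manual's description holds, the entries should be (nearly) equal,
  # reflecting the consistency and finite-sample correction factors.
  print(fit$raw.cov / sub_cov)
}
```

The ratio is printed rather than asserted because the exact scaling depends on how covMcd() applies its correction factors.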

Following your example:
[1] 0 1 1 1 1 1 0 1 0 1
     <== Note that the raw weights eliminate obs 1, 7, and 9
[1]  0.5472968 -0.1634567 -0.1780795
     <== Compare original means
[1]  0.08172336 -0.03067387 -0.23956925
     <== and "raw" means (obs 1, 7, and 9 eliminated)
[1]  0.08172336 -0.03067387 -0.23956925
     <== This matches D2rb$raw.center

So the "raw" values are computed from a subset of h = 7 observations: 2, 3,
4, 5, 6, 8, and 10. Because raw.center and raw.cov are based on a subset of
the original data, the Mahalanobis distances computed from them will not
match the classical distances computed from all 10 observations either.
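
To illustrate the point with base R only, here is a rough sketch on simulated data (again, not the data from this thread). The index vector `keep` simply mimics the seven retained observations above and is not computed by covMcd(). The correction factors applied to raw.cov are ignored here, so this shows only the effect of using a subset.

```r
# Sketch only: simulated 10 x 3 data and a hand-picked subset.
set.seed(1)
x <- matrix(rnorm(10 * 3), ncol = 3)
keep <- c(2, 3, 4, 5, 6, 8, 10)          # illustrative subset of 7 rows

# Classical squared distances: center and scatter from all 10 rows.
d_classical <- mahalanobis(x, colMeans(x), cov(x))

# Subset-based squared distances: center and scatter from the 7 rows only,
# but evaluated for every row of x.
d_subset <- mahalanobis(x, colMeans(x[keep, ]), cov(x[keep, ]))

print(round(cbind(d_classical, d_subset), 3))
```

Rows left out of the subset tend to get larger distances in the second column, since they had no influence on the center or the scatter they are measured against.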

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352