puzzling classical Mahalanobis distances from covMcd() {robustbase}
The values would probably be better labeled "initial" than "raw", which is how they are labeled in the source. The Details section of the manual indicates that the first step is to identify a subset of the original data (a proportion between .5 and 1 of the observations) whose covariance matrix has the lowest possible determinant. The next paragraph says: "The raw MCD estimate of location is then the average of these h points, whereas the raw MCD estimate of scatter is their covariance matrix, multiplied by a consistency factor and a finite sample correction factor (to make it consistent at the normal model and unbiased at small samples)." Following your example:
library(robustbase)
set.seed(42)
x <- matrix(rnorm(10*3), ncol = 3)
xmeans <- colMeans(x)
Sx <- cov(x)
D2rb <- covMcd(x)
D2rb$raw.weights
[1] 0 1 1 1 1 1 0 1 0 1   <== Note that the raw weights eliminate obs 1, 7, and 9
xmeans; D2rb$raw.center
[1]  0.5472968 -0.1634567 -0.1780795      <== Compare original means
[1]  0.08172336 -0.03067387 -0.23956925   and "raw" means
colMeans(x[as.logical(D2rb$raw.weights),])   <== means with 1, 7, and 9 eliminated
[1]  0.08172336 -0.03067387 -0.23956925   <== This matches D2rb$raw.center

So the "raw" values are computed from a subset of h = 7 observations: 2, 3, 4, 5, 6, 8, and 10. Because raw.center and raw.cov are based on that subset rather than on all of the data, the Mahalanobis distances will not be the same either.

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
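P.S. A rough way to see the effect without covMcd() at all is sketched below. It uses base R only, and the subset indices are simply copied from the raw weights shown above for set.seed(42); they are an assumption of this sketch, not something it computes. The names d2_all, keep, center_sub, cov_sub and d2_sub are made up for illustration. The subset covariance here is not multiplied by the consistency and finite sample correction factors, so d2_sub should not be expected to reproduce D2rb$raw.mah exactly; the point is only that distances measured from a subset-based center and scatter differ from the classical ones.

```r
# Base R only; no robustbase needed for this sketch.
set.seed(42)
x <- matrix(rnorm(10 * 3), ncol = 3)

# Classical squared Mahalanobis distances: center and scatter from all 10 rows.
d2_all <- mahalanobis(x, colMeans(x), cov(x))

# Observations kept in the example above (1, 7 and 9 are dropped).
# Assumption: indices copied from the raw weights, not recomputed here.
keep <- c(2, 3, 4, 5, 6, 8, 10)
center_sub <- colMeans(x[keep, ])
cov_sub <- cov(x[keep, ])  # uncorrected: no consistency or small-sample factor

# Squared distances of all 10 rows from the subset-based estimates.
d2_sub <- mahalanobis(x, center_sub, cov_sub)

# Side-by-side comparison of the two sets of distances.
round(cbind(classical = d2_all, subset_based = d2_sub), 3)
```

One quick sanity check on the classical distances: with n = 10 rows and p = 3 columns they always sum to (n - 1) * p = 27, which is also what the D2 values in the original message add up to.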
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Fraser D. Neiman
Sent: Friday, July 27, 2012 7:16 AM
To: r-help at r-project.org
Subject: [R] puzzling classical Mahalanobis distances from covMcd() {robustbase}
Greetings,
I am puzzled about why the _classical_ Mahalanobis distances that I get using the {stats} mahalanobis() function do not match the distances I get from the {robustbase} covMcd() function. Here is an example:
x <- matrix(rnorm(10*3), ncol = 3)
#here is the {stats} result:
Sx <- cov(x)
D2 <- mahalanobis(x, colMeans(x), Sx)
D2
[1] 1.5135795 1.3761046 1.0367444 1.8111585 4.3038621 5.3195918 3.2798665 5.7559301
[9] 2.2172150 0.3859475
#here is the {robustbase} result
library(robustbase)
D2rb <- covMcd(x)
D2rb$raw.mah
[1] 0.7737193 1.1177445 0.7290794 0.6275703 3.5517622 6.0334350 1.0582663 5.7169250
[9] 0.9420184 0.4210470
According to the help file for covMcd {robustbase}:
raw.mah   mahalanobis distances of the observations based on the raw estimate of the location and scatter.
So I think the second set of numbers should match the first. But they do not.
What am I missing here?
Thanks, Fraser