Dear all,

Thanks to a computational problem, we have stumbled into a more fundamental discussion: What should leverage be in a model with both categorical and continuous explanatory variables?

A. If we consider the original definition, which connects observed and fitted values, the categorical variables must remain in the design matrix on which leverage is calculated.

B. If leverage is meant as a summary measure of influence on the coefficients, one may argue for separating the influence on the coefficients of the continuous Xs from the influence on the estimated effects of the factors. For unbalanced designs, however, this influence depends on a partialized design matrix, not on the "sub-design" obtained by considering only the part of the design matrix corresponding to the continuous variables.

My conclusion: The h_ii (or Mahalanobis distances) calculated for the sub-design matrix do not have any clear meaning and should be avoided.

My recommendation:

a. If a high breakdown "covariance estimate" for the full design matrix fails because of too many singular elemental subsets (a few can simply be ignored, as proposed before), then an M-estimator should be used.

b. If the M-estimator converges to a singular matrix, then the non-robust sample covariance should be used.

Of course, the output should state clearly which version was used.

With some effort, one might try to define a robust version of the partialized design matrix (see B) and the corresponding estimator of the "covariance matrix".

Let me add something that may be obvious to all of us: The problem with the factors also appears for the high breakdown regression estimator itself. The solution by Maronna and Yohai (?) is to split the problem into continuous and categorical variables. Therefore, there may be a treatment of the leverage problem that corresponds to this estimation procedure.

I plan to think some more about this and communicate any results from it, unless somebody tells me that this has been done long ago ...
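P.S. To make point B concrete, here is a minimal sketch in Python (not R, and not what plot.lmrob does). It assumes the simplest possible case: one two-level factor and one continuous x with a common slope. The data values and function names are made up for illustration. It relies on the standard decomposition of the hat matrix of the full design into the hat matrix of the factor plus the hat matrix of the partialized x (x minus its group mean).

```python
# Toy unbalanced design: one two-level factor plus one continuous covariate x.
# All values below are invented for illustration only.
groups = ["a", "a", "a", "a", "a", "a", "b", "b"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 10.0, 12.0]


def leverage_subdesign(x):
    """h_ii from the sub-design (intercept + x), ignoring the factor."""
    n = len(x)
    mean = sum(x) / n
    ss = sum((v - mean) ** 2 for v in x)
    return [1.0 / n + (v - mean) ** 2 / ss for v in x]


def partialize(x, groups):
    """Residuals of x after regressing on the factor: x minus its group mean."""
    sums, counts = {}, {}
    for v, g in zip(x, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    x_tilde = [v - sums[g] / counts[g] for v, g in zip(x, groups)]
    return x_tilde, counts


def leverage_partialized(x, groups):
    """Contribution of the partialized x alone: x_tilde_i^2 / sum(x_tilde^2)."""
    x_tilde, _ = partialize(x, groups)
    ss = sum(v * v for v in x_tilde)
    return [v * v / ss for v in x_tilde]


def leverage_full(x, groups):
    """h_ii of the full design (group indicators + x): 1/n_g plus the partialized part."""
    _, counts = partialize(x, groups)
    part = leverage_partialized(x, groups)
    return [1.0 / counts[g] + p for p, g in zip(part, groups)]


if __name__ == "__main__":
    rows = zip(groups, x, leverage_subdesign(x),
               leverage_partialized(x, groups), leverage_full(x, groups))
    print("group      x  sub-design  partialized  full")
    for g, v, hs, hp, hf in rows:
        print(f"{g:>5} {v:6.1f}  {hs:10.3f}  {hp:11.3f}  {hf:5.3f}")
```

In this toy example the two observations in the small group look like high-leverage points in the sub-design, because their x values are far from the overall mean. After partializing, they lie close to their own group mean and contribute little to the slope; their leverage in the full design comes mainly from the small group size.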
Cheers
Werner Stahel

----------------- This message was sent by ---------------------------
Werner Stahel                  http://stat.ethz.ch/~stahel
Seminar fuer Statistik         phone : +41 44 632 34 30
ETH-Zentrum, LEO D8            fax   : +41 44 632 12 28
CH-8092 Zurich, Switzerland    meet me: Leonhardstr.27, D8
[RsR] Singular covariance in plot.lmrob
1 message · Werner Stahel