Skip to content

[RsR] [R] M-estimator R function question

14 messages · Valentin Todorov, Martin Maechler, Peter Filzmoser +4 more

#
Rand> In the robust library of S+, it is possible to compute
    Rand> Rocke's constrained M-estimator of scatter and
    Rand> location. Can't find anything within R that does the
    Rand> same thing. Can someone tell me whether any package
    Rand> exists for computing it?

I can't offhand.
There's the cov.rob functions in 'MASS' and the  'rrcov' package
which do similar (better ? :-) things.

Note that for a few weeks now, there's the R-SIG-robust mailing
list [--> CC to there],  on which we plan to talk about and coordinate the
development of more robust methods for R --- as an offspring
from the Treviso workshop "Robustness and R".

Unfortunately, some things (e.g. workshop "minutes") are still a
bit in a waiting state.

One thing we'd be very interested is the OGK estimator of
Maronna and Zamar (JASA, 2002).  Unfortunately, there, too, some
of the authors sold their code exclusively to Insightful (the
S-plus company).

    Rand> Thanks.

You're welcome.

    Rand> 	Rand Wilcox Professor Dept of Psychology
    Rand> University of Southern California Los Angeles, CA
    Rand> 90089-1061 FAX: 213-746-9082

Martin Maechler, ETH Zurich
#
Dear Rand,
unfortunately, right now none of the R packages can compute Rocke's
constrained M-estimator of scatter and location. I have implemented this
estimator as function covMTB in rrcov - see
http://www.dst.unive.it/rsr/Todorov-treviso.pdf , slide 35 - but I have
still some open problems to solve before releasing it to CRAN. I hope to be
able to do this even before Christmas. And if you look at the next slide in
this presentation, you will see that there is an working implementation of
OGK, as mentioned by Martin. Again, I hope to release it very soon.

Best regards,
Valentin



----- Original Message ----- 
From: "Martin Maechler" <maechler at stat.math.ethz.ch>
To: "Rand Wilcox" <rwilcox at usc.edu>
Cc: <r-help at stat.math.ethz.ch>; <R-SIG-Robust at stat.math.ethz.ch>
Sent: Friday, December 02, 2005 10:22 PM
Subject: Re: [RsR] [R] M-estimator R function question
#
One observation about the following.
Insightful has implemented the procedure in some way I ignore (actually, 
I don't like the way it works), but unfortunately I got not a single dollar 
from that!. Anybody is free to implement the method (which is extremely 
simple) in whatever language.
            Ricardo
1 day later
#
Ricardo> One observation about the following.

   [ so I *was* able to provoke some action on the R-SIG-Robust
     list :-) ;-) ]

    MM> One thing we'd be very interested is the OGK estimator of
    MM> Maronna and Zamar (JASA, 2002).  Unfortunately, there,
    MM> too, some of the authors sold their code exclusively to
    MM> Insightful (the S-plus company).

    Ricardo>     Insightful has implemented the procedure in
    Ricardo> some way I ignore (actually, I don't like the way
    Ricardo> it works), but unfortunately I got not a single
    Ricardo> dollar from that!. Anybody is free to implement the
    Ricardo> method (which is extremely simple) in whatever
    Ricardo> language.  

indeed.  I apologize for my remark above where I should have
given either less or then more info. I'll do the latter now: 
I think even before the paper came out, I got in contact with a
student from Ruben (Fatemah) who had programmed things for S-plus; I got
the code (S+ and C) but eventually learned I was not allowed to
use it for R because of a contract with Insightful. 

But indeed, that shouldn't be a problem, since I think one
should program the OGK using R (S) code alone {just using
eigen() and other matrix operations},
and have the univariate scale estimate be a plug-in, i.e.,
a function argument.  There, one could use  Qn, Sn  {I now have
these two in the not yet-released package on "basic robust statistic"},
the tau-estimate (from the 2002 JASA paper) or any other
consistent scale estimate -- ideally by passing it as an R
function to the ``cov.okg(...)'' function {or should that be
 ``covRob(..., method="OGK")'' ?}.

Martin Maechler, ETH Zurich
#
Martin Maechler wrote:

            
Martin, we are working now on a fast implementation of the
Qn, using C++ code in R. Have you done that similarly?

  
    
#

        
Peter> Martin Maechler wrote:
>> But indeed, that shouldn't be a problem, since I think
    >> one should program the OGK using R (S) code alone {just
    >> using eigen() and other matrix operations}, and have the
    >> univariate scale estimate be a plug-in, i.e., a function
    >> argument.  There, one could use Qn, Sn {I now have these
    >> two in the not yet-released package on "basic robust
    >> statistic"},

    Peter> Martin, we are working now on a fast implementation
    Peter> of the Qn, using C++ code in R. Have you done that
    Peter> similarly?

I have based it on C code which had been based on the original
Fortran code from Croux and Rousseeuw available from Antwerpen.
Is there a faster algorithm now?

Given the traffic on this list, 
and the not-yet-publication of the Treviso working group
"minutes", let me open the topic to which I alluded above as
well: Our working group ("regression", later joined by
"econometrics") talked about the possibility and then
desirability of an R package on "(basic) robust statitics" which

 1)  will be backed up by many `robustniks'
 2)  will remain focused, and hence (at least initially)
     primarily based on the methodology in the upcoming book 
     by Maronna, Martin and Yohai (=: MMY); and even from there,
     primarily on
      a) regression [including GLM, non-lin reg, maybe a bit more]
      b) multivariate : "location + scatter"
 3)  have quite a few authors (in the work group, we had some "commitments"),
     with me as coordinator and maintainer.
 4)  other "Robustness in R" packages would *build*
     (i.e. "Depend" in  R package lingo) on that basic package,
     and authors that contribute code from their own package
     would of course remain authors and would also keep the
     functions in their own packages (for a while at least; later their
     package will load or even attach the "basic robust"
     package, and hence have the functionality available indirectly).

-- more details are in the minutes of our workgroup(s) that I've
   sent to Matias {as Treviso co-organizer} to be put on the
   (Treviso RSR) webpage.

I think it would be great to get feedback from the R-SIG-robust
audience on the above; particularly if you think that the idea
is problematic or needs amendments/improvements before being
workable.

Last week, I had added the following to (my latex version of) the minutes:

------------------------

 Later notes -- by Martin Maechler, only
 ---------------------------------------

  o  Proposed names for the `basic robust statistics' package:
     1. robstats
     2. robbase
     3. baseRob

   I'm choosing the first one for now.

  o I've added (Rousseeuw & Croux)'s Qn and Sn estimators of
    scale, based on an R package I had started in 2002, which is
    based on the Fortran code from Antwerpen. 

  \item I plan to add ``\emph{psi - Function}'' objects, using an S4 class,
    with instances for Huber, Hampel, Biweight, etc, at least for the
    M-estimators (of location / regression).
    Such an object should contain  $\rho$, $\psi$, $w(x) := \psi(x)/x$,
    $\psi'$ (= $d/\mathrm{dx} \psi$) functions, default values for the tuning
    constants, and functionals such as $E_X[\psi(X)^2]$,  $E_X[\psi'(X)]$,
    for $X\sim N(0,1)$ (these functionals are still \emph{functions} of the
    tuning parameters).

    Very similarly, I'm interested also in the $\chi$ functions used for
    B-/V- optimal M-estimators of \emph{scale}, both the monotone and the
    redescending ones.

    In general, I think we should add ``\textbf{basic M-estimation}''
    things to the package as well, similar to, but
    more general than \code{MASS::huber}; note that I'd like to
    contribute \code{sfsmisc::huberM} (my `improved' \code{huber})
    to the \pkg{robstats} as well.

------------------------

and actually I've done the S4 class for "psi-function" objects
last Friday.

Now I'm interested to hear more..
Martin
#
Here is an implementation of the OGK estimator completely in R.  I  
haven't touched it for a while and I forget how thoroughly I tested  
it so use it with a bit of caution.

   http://www.stats.ox.ac.uk/~konis/pairwise.q

Ricardo, can you be more specific about what you don't like in the  
implementation of OGK in S-Plus?

Kjell
On Dec 3, 2005, at 9:55 PM, Ricardo Maronna wrote:

            
#
Kjell> Here is an implementation of the OGK estimator
    Kjell> completely in R.  I haven't touched it for a while
    Kjell> and I forget how thoroughly I tested it so use it
    Kjell> with a bit of caution.

    Kjell>    http://www.stats.ox.ac.uk/~konis/pairwise.q

excellent!  That's exactly what I meant (in my post two hours
ago):

  The general OGK algorithm, but with the  flexibility of the main
  function having arguments for the "building block" functions, so
  one can use different univariate sigma() or ``weight()'' functions!

When we'll start to discuss things along the  covRobust()
proposal(s) from the "multivariate" working group, we'll also
have to think about function name(s) and the returned object
("output"),  but I think our student will be able use your
function "as is".

Thanks a lot, Kjell!

Martin.
#
Martin Maechler wrote:
Yes, thanks a lot already.

Concerning the ``multivariate'' working group, I have put a preliminary 
Tex and Pdf-file of what we discussed in Treviso online.
It can be retreived from: http://www.econ.kuleuven.be/public/ndbae49/R/

If anyone has comments or would like to change/update/add things to the 
Tex/Pdf-file you can always contact me or send me an updated Tex-file.

Best regards,
Kristel

  
    
#
Hi Kristel,
a couple of additions to your 'minutes' (Section 2.3 Other):

1. The MATLAB implementation of D. Pe?na & F. Prieto can be found at:
http://halweb.uc3m.es/fjp/kur.zip

2. An R implementation of the estimator of D. Olive described in the
CSDA paper can be found in
http://www.math.siu.edu/olive/rpack.txt - function covmba()

3. I have R interface to some of the FORTRAN programs of Hawkins, but
may be only the MWMCD (linear discriminant analysis)  is of interest.

best regards,
Valentin
On 12/5/05, Kristel Joossens <kristel.joossens at econ.kuleuven.be> wrote:
#
Martin Maechler wrote:
> [snip]
I think estimating functions are "natural" candidates to be treated as 
S4 objects, as Martin's blueprint shows above.

One question: I have written the "workhorses" of my code in C and the 
"psi" functions are defined inside the C code (tuning constants are 
taken from R). If we move towards having the "psi" functions constructed 
in R (which I think is a good idea for many reasons), is there an easy 
way to have the C code use them? .Call() comes to mind, but of course, I 
would want to re-write the least possible number of lines of my original 
C code.

Matias

PS: A website collecting the working groups minutes and other related 
material to this project is already being set up and will be announced 
in this list when it becomes "live" (which should happen soon).
#
Hi Kjell,
    The "pure" OGK estimate works regardless of the collinearity of the 
data.
    The resulting estimate needs some improvement step. We thought of the 
simple "one-step hard rejection". It was indicated in the paper that no 
explicit matrix inversion is required for this step, so that it can be made 
to work even for collinear data.
    I have tried the S-Plus version with data that were definitely not 
collinear, and I got an error message of the sort "matrix not invertible". I 
couldn't figure out why.

    By the way, on later reflexion I have thought that a better improvement 
would be to use the "pure" OGK as the starting point for the iterations of 
an S-estimate, which would be much cheaper than subsampling.
                        Regards,
                                Ricardo
14 days later
#
Dear Kjell ,

beware of a minor error in the pairwise code that you posted: the
weights in gk.sigmamu() will never become 0, even if abs(x) > c1.

The correction should be:

gk.sigmamu <- function(x, c1 = 4.5, c2 = 3.0, mu.too = FALSE, ...)
{
  n <- length(x)
  medx <- median(x)
  sigma0 <- median(abs(x - medx))

  # VT::19.12.2005 - should give 0 weights if abs(x - medx) / sigma0 > c1
  #
  #  w <- (x - medx) / sigma0
  #  w <- (1.0 - (w / c1)^2)^2
  #  w[w < 0.0] <- 0.0

  w <- abs(x - medx) / sigma0
  w <- ifelse(w<=c1,(1.0 - (w / c1)^2)^2,0)

  # VT::19.12.2005 - END

  mu <- sum(x * w) / sum(w)
...... ... ... ..


best regard
valentin
On 12/5/05, Kjell Konis <konis at stats.ox.ac.uk> wrote:
#
Valentin> Dear Kjell ,
    Valentin> beware of a minor error in the pairwise code that you posted: the
    Valentin> weights in gk.sigmamu() will never become 0, even if abs(x) > c1.

Indeed!

    Valentin> The correction should be:
Valentin> best regard
    Valentin> valentin

Note however that Kjell was trying to be smart, namely *not*
using ifelse() in such a case -- for efficiency reasons. But he
forgot to do it properly: Here,  pmax()  is much more efficient
than ifelse().  Note that gk.sigmamu() is really the function that is called
many times, so it makes sense to write it as efficiently as possible
--- still keeping it well readable.

I had also applied a few minor cosmetic improvements to Kjell's
original code --- but overlooked the obvious buglet ! ---  
which now leads to 

gk.sigmamu <- function(x, c1 = 4.5, c2 = 3.0, mu.too = FALSE, ...)
{
    n <- length(x)
    medx <- median(x)
    x. <- abs(x - medx)
    c.sigma0 <- c1 * median(x.)
    w <- pmax(0, 1 - (x. / c.sigma0)^2)^2
    mu <- sum(x * w) / sum(w)

    x <- (x - mu) / sigma0
    rho <- x^2
    rho[rho > c2^2] <- c2^2
    ## sigma2 <- sigma0^2 / n * sum(rho)

    ## return
    c(if(mu.too) mu,
      ## sqrt(sigma2) == sqrt( sigma0^2 / n * sum(rho) ) :
      sigma0 * sqrt(sum(rho)/n))
}


Martin Maechler, ETH Zurich
Valentin> On 12/5/05, Kjell Konis <konis at stats.ox.ac.uk> wrote:
>> Here is an implementation of the OGK estimator completely in R.  I
    >> haven't touched it for a while and I forget how thoroughly I tested
    >> it so use it with a bit of caution.
    >> 
    >> http://www.stats.ox.ac.uk/~konis/pairwise.q
    >> 
    >> Ricardo, can you be more specific about what you don't like in the
    >> implementation of OGK in S-Plus?
    >> 
    >> Kjell
    >> 
    >> 
    >>
>> On Dec 3, 2005, at 9:55 PM, Ricardo Maronna wrote:
>> 
    >> > One observation about the following.
    >> >
    >> >> One thing we'd be very interested is the OGK estimator of
    >> >> Maronna and Zamar (JASA, 2002).  Unfortunately, there, too, some
    >> >> of the authors sold their code exclusively to Insightful (the
    >> >> S-plus company).
    >> >
    >> >     Insightful has implemented the procedure in some way I ignore
    >> > (actually,
    >> > I don't like the way it works), but unfortunately I got not a
    >> > single dollar
    >> > from that!. Anybody is free to implement the method (which is
    >> > extremely
    >> > simple) in whatever language.
    >> >             Ricardo
    >> >
    >> > _______________________________________________
    >> > R-SIG-Robust at r-project.org mailing list
    >> > https://stat.ethz.ch/mailman/listinfo/r-sig-robust
    >> 
    >> _______________________________________________
    >> R-SIG-Robust at r-project.org mailing list
    >> https://stat.ethz.ch/mailman/listinfo/r-sig-robust
    >> 

    Valentin> _______________________________________________
    Valentin> R-SIG-Robust at r-project.org mailing list
    Valentin> https://stat.ethz.ch/mailman/listinfo/r-sig-robust