"Rand" == Rand Wilcox <rwilcox at usc.edu>
on Fri, 02 Dec 2005 10:42:48 -0800 (PST) writes:
Rand> In the robust library of S+, it is possible to compute
Rand> Rocke's constrained M-estimator of scatter and
Rand> location. Can't find anything within R that does the
Rand> same thing. Can someone tell me whether any package
Rand> exists for computing it?
I can't offhand.
There's the cov.rob functions in 'MASS' and the 'rrcov' package
which do similar (better ? :-) things.
Note that for a few weeks now, there's the R-SIG-robust mailing
list [--> CC to there], on which we plan to talk about and coordinate the
development of more robust methods for R --- as an offspring
from the Treviso workshop "Robustness and R".
Unfortunately, some things (e.g. workshop "minutes") are still a
bit in a waiting state.
One thing we'd be very interested is the OGK estimator of
Maronna and Zamar (JASA, 2002). Unfortunately, there, too, some
of the authors sold their code exclusively to Insightful (the
S-plus company).
Rand> Thanks.
You're welcome.
Rand> Rand Wilcox Professor Dept of Psychology
Rand> University of Southern California Los Angeles, CA
Rand> 90089-1061 FAX: 213-746-9082
Martin Maechler, ETH Zurich
Dear Rand,
unfortunately, right now none of the R packages can compute Rocke's
constrained M-estimator of scatter and location. I have implemented this
estimator as function covMTB in rrcov - see
http://www.dst.unive.it/rsr/Todorov-treviso.pdf , slide 35 - but I have
still some open problems to solve before releasing it to CRAN. I hope to be
able to do this even before Christmas. And if you look at the next slide in
this presentation, you will see that there is an working implementation of
OGK, as mentioned by Martin. Again, I hope to release it very soon.
Best regards,
Valentin
----- Original Message -----
From: "Martin Maechler" <maechler at stat.math.ethz.ch>
To: "Rand Wilcox" <rwilcox at usc.edu>
Cc: <r-help at stat.math.ethz.ch>; <R-SIG-Robust at stat.math.ethz.ch>
Sent: Friday, December 02, 2005 10:22 PM
Subject: Re: [RsR] [R] M-estimator R function question
"Rand" == Rand Wilcox <rwilcox at usc.edu>
on Fri, 02 Dec 2005 10:42:48 -0800 (PST) writes:
Rand> In the robust library of S+, it is possible to compute
Rand> Rocke's constrained M-estimator of scatter and
Rand> location. Can't find anything within R that does the
Rand> same thing. Can someone tell me whether any package
Rand> exists for computing it?
I can't offhand.
There's the cov.rob functions in 'MASS' and the 'rrcov' package
which do similar (better ? :-) things.
Note that for a few weeks now, there's the R-SIG-robust mailing
list [--> CC to there], on which we plan to talk about and coordinate the
development of more robust methods for R --- as an offspring
from the Treviso workshop "Robustness and R".
Unfortunately, some things (e.g. workshop "minutes") are still a
bit in a waiting state.
One thing we'd be very interested is the OGK estimator of
Maronna and Zamar (JASA, 2002). Unfortunately, there, too, some
of the authors sold their code exclusively to Insightful (the
S-plus company).
Rand> Thanks.
You're welcome.
Rand> Rand Wilcox Professor Dept of Psychology
Rand> University of Southern California Los Angeles, CA
Rand> 90089-1061 FAX: 213-746-9082
Martin Maechler, ETH Zurich
One thing we'd be very interested is the OGK estimator of
Maronna and Zamar (JASA, 2002). Unfortunately, there, too, some
of the authors sold their code exclusively to Insightful (the
S-plus company).
Insightful has implemented the procedure in some way I ignore (actually,
I don't like the way it works), but unfortunately I got not a single dollar
from that!. Anybody is free to implement the method (which is extremely
simple) in whatever language.
Ricardo
"Ricardo" == Ricardo Maronna <rmaronna at mail.retina.ar>
on Sat, 3 Dec 2005 18:55:00 -0300 writes:
Ricardo> One observation about the following.
[ so I *was* able to provoke some action on the R-SIG-Robust
list :-) ;-) ]
MM> One thing we'd be very interested is the OGK estimator of
MM> Maronna and Zamar (JASA, 2002). Unfortunately, there,
MM> too, some of the authors sold their code exclusively to
MM> Insightful (the S-plus company).
Ricardo> Insightful has implemented the procedure in
Ricardo> some way I ignore (actually, I don't like the way
Ricardo> it works), but unfortunately I got not a single
Ricardo> dollar from that!. Anybody is free to implement the
Ricardo> method (which is extremely simple) in whatever
Ricardo> language.
indeed. I apologize for my remark above where I should have
given either less or then more info. I'll do the latter now:
I think even before the paper came out, I got in contact with a
student from Ruben (Fatemah) who had programmed things for S-plus; I got
the code (S+ and C) but eventually learned I was not allowed to
use it for R because of a contract with Insightful.
But indeed, that shouldn't be a problem, since I think one
should program the OGK using R (S) code alone {just using
eigen() and other matrix operations},
and have the univariate scale estimate be a plug-in, i.e.,
a function argument. There, one could use Qn, Sn {I now have
these two in the not yet-released package on "basic robust statistic"},
the tau-estimate (from the 2002 JASA paper) or any other
consistent scale estimate -- ideally by passing it as an R
function to the ``cov.okg(...)'' function {or should that be
``covRob(..., method="OGK")'' ?}.
Martin Maechler, ETH Zurich
But indeed, that shouldn't be a problem, since I think one
should program the OGK using R (S) code alone {just using
eigen() and other matrix operations},
and have the univariate scale estimate be a plug-in, i.e.,
a function argument. There, one could use Qn, Sn {I now have
these two in the not yet-released package on "basic robust statistic"},
Martin, we are working now on a fast implementation of the
Qn, using C++ code in R. Have you done that similarly?
the tau-estimate (from the 2002 JASA paper) or any other
consistent scale estimate -- ideally by passing it as an R
function to the ``cov.okg(...)'' function {or should that be
``covRob(..., method="OGK")'' ?}.
Martin Maechler, ETH Zurich
-------------------------------------------------------
From: Prof. Dr. Peter Filzmoser
Dept. of Statistics & Probability Theory
Vienna University of Technology
Wiedner Hauptstrasse 8-10
A-1040 Vienna, Austria
Tel. +43 1 58801/10733
Fax. +43 1 58801/10799
E-mail: P.Filzmoser at tuwien.ac.at
Internet:
http://www.statistik.tuwien.ac.at/public/filz/
"Peter" == Peter Filzmoser <P.Filzmoser at tuwien.ac.at>
on Mon, 05 Dec 2005 09:02:54 +0100 writes:
Peter> Martin Maechler wrote:
>> But indeed, that shouldn't be a problem, since I think
>> one should program the OGK using R (S) code alone {just
>> using eigen() and other matrix operations}, and have the
>> univariate scale estimate be a plug-in, i.e., a function
>> argument. There, one could use Qn, Sn {I now have these
>> two in the not yet-released package on "basic robust
>> statistic"},
Peter> Martin, we are working now on a fast implementation
Peter> of the Qn, using C++ code in R. Have you done that
Peter> similarly?
I have based it on C code which had been based on the original
Fortran code from Croux and Rousseeuw available from Antwerpen.
Is there a faster algorithm now?
Given the traffic on this list,
and the not-yet-publication of the Treviso working group
"minutes", let me open the topic to which I alluded above as
well: Our working group ("regression", later joined by
"econometrics") talked about the possibility and then
desirability of an R package on "(basic) robust statitics" which
1) will be backed up by many `robustniks'
2) will remain focused, and hence (at least initially)
primarily based on the methodology in the upcoming book
by Maronna, Martin and Yohai (=: MMY); and even from there,
primarily on
a) regression [including GLM, non-lin reg, maybe a bit more]
b) multivariate : "location + scatter"
3) have quite a few authors (in the work group, we had some "commitments"),
with me as coordinator and maintainer.
4) other "Robustness in R" packages would *build*
(i.e. "Depend" in R package lingo) on that basic package,
and authors that contribute code from their own package
would of course remain authors and would also keep the
functions in their own packages (for a while at least; later their
package will load or even attach the "basic robust"
package, and hence have the functionality available indirectly).
-- more details are in the minutes of our workgroup(s) that I've
sent to Matias {as Treviso co-organizer} to be put on the
(Treviso RSR) webpage.
I think it would be great to get feedback from the R-SIG-robust
audience on the above; particularly if you think that the idea
is problematic or needs amendments/improvements before being
workable.
Last week, I had added the following to (my latex version of) the minutes:
------------------------
Later notes -- by Martin Maechler, only
---------------------------------------
o Proposed names for the `basic robust statistics' package:
1. robstats
2. robbase
3. baseRob
I'm choosing the first one for now.
o I've added (Rousseeuw & Croux)'s Qn and Sn estimators of
scale, based on an R package I had started in 2002, which is
based on the Fortran code from Antwerpen.
\item I plan to add ``\emph{psi - Function}'' objects, using an S4 class,
with instances for Huber, Hampel, Biweight, etc, at least for the
M-estimators (of location / regression).
Such an object should contain $\rho$, $\psi$, $w(x) := \psi(x)/x$,
$\psi'$ (= $d/\mathrm{dx} \psi$) functions, default values for the tuning
constants, and functionals such as $E_X[\psi(X)^2]$, $E_X[\psi'(X)]$,
for $X\sim N(0,1)$ (these functionals are still \emph{functions} of the
tuning parameters).
Very similarly, I'm interested also in the $\chi$ functions used for
B-/V- optimal M-estimators of \emph{scale}, both the monotone and the
redescending ones.
In general, I think we should add ``\textbf{basic M-estimation}''
things to the package as well, similar to, but
more general than \code{MASS::huber}; note that I'd like to
contribute \code{sfsmisc::huberM} (my `improved' \code{huber})
to the \pkg{robstats} as well.
------------------------
and actually I've done the S4 class for "psi-function" objects
last Friday.
Now I'm interested to hear more..
Martin
Here is an implementation of the OGK estimator completely in R. I
haven't touched it for a while and I forget how thoroughly I tested
it so use it with a bit of caution.
http://www.stats.ox.ac.uk/~konis/pairwise.q
Ricardo, can you be more specific about what you don't like in the
implementation of OGK in S-Plus?
Kjell
On Dec 3, 2005, at 9:55 PM, Ricardo Maronna wrote:
One observation about the following.
One thing we'd be very interested is the OGK estimator of
Maronna and Zamar (JASA, 2002). Unfortunately, there, too, some
of the authors sold their code exclusively to Insightful (the
S-plus company).
Insightful has implemented the procedure in some way I ignore
(actually,
I don't like the way it works), but unfortunately I got not a
single dollar
from that!. Anybody is free to implement the method (which is
extremely
simple) in whatever language.
Ricardo
"Kjell" == Kjell Konis <konis at stats.ox.ac.uk>
on Mon, 5 Dec 2005 10:29:11 +0000 writes:
Kjell> Here is an implementation of the OGK estimator
Kjell> completely in R. I haven't touched it for a while
Kjell> and I forget how thoroughly I tested it so use it
Kjell> with a bit of caution.
Kjell> http://www.stats.ox.ac.uk/~konis/pairwise.q
excellent! That's exactly what I meant (in my post two hours
ago):
The general OGK algorithm, but with the flexibility of the main
function having arguments for the "building block" functions, so
one can use different univariate sigma() or ``weight()'' functions!
When we'll start to discuss things along the covRobust()
proposal(s) from the "multivariate" working group, we'll also
have to think about function name(s) and the returned object
("output"), but I think our student will be able use your
function "as is".
Thanks a lot, Kjell!
Martin.
"Kjell" == Kjell Konis <konis at stats.ox.ac.uk>
on Mon, 5 Dec 2005 10:29:11 +0000 writes:
Kjell> Here is an implementation of the OGK estimator
Kjell> completely in R. I haven't touched it for a while
Kjell> and I forget how thoroughly I tested it so use it
Kjell> with a bit of caution.
Kjell> http://www.stats.ox.ac.uk/~konis/pairwise.q
excellent! That's exactly what I meant (in my post two hours
ago):
The general OGK algorithm, but with the flexibility of the main
function having arguments for the "building block" functions, so
one can use different univariate sigma() or ``weight()'' functions!
When we'll start to discuss things along the covRobust()
proposal(s) from the "multivariate" working group, we'll also
have to think about function name(s) and the returned object
("output"), but I think our student will be able use your
function "as is".
Thanks a lot, Kjell
Yes, thanks a lot already.
Concerning the ``multivariate'' working group, I have put a preliminary
Tex and Pdf-file of what we discussed in Treviso online.
It can be retreived from: http://www.econ.kuleuven.be/public/ndbae49/R/
If anyone has comments or would like to change/update/add things to the
Tex/Pdf-file you can always contact me or send me an updated Tex-file.
Best regards,
Kristel
Hi Kristel,
a couple of additions to your 'minutes' (Section 2.3 Other):
1. The MATLAB implementation of D. Pe?na & F. Prieto can be found at:
http://halweb.uc3m.es/fjp/kur.zip
2. An R implementation of the estimator of D. Olive described in the
CSDA paper can be found in
http://www.math.siu.edu/olive/rpack.txt - function covmba()
3. I have R interface to some of the FORTRAN programs of Hawkins, but
may be only the MWMCD (linear discriminant analysis) is of interest.
best regards,
Valentin
On 12/5/05, Kristel Joossens <kristel.joossens at econ.kuleuven.be> wrote:
Martin Maechler wrote:
"Kjell" == Kjell Konis <konis at stats.ox.ac.uk>
on Mon, 5 Dec 2005 10:29:11 +0000 writes:
Kjell> Here is an implementation of the OGK estimator
Kjell> completely in R. I haven't touched it for a while
Kjell> and I forget how thoroughly I tested it so use it
Kjell> with a bit of caution.
Kjell> http://www.stats.ox.ac.uk/~konis/pairwise.q
excellent! That's exactly what I meant (in my post two hours
ago):
The general OGK algorithm, but with the flexibility of the main
function having arguments for the "building block" functions, so
one can use different univariate sigma() or ``weight()'' functions!
When we'll start to discuss things along the covRobust()
proposal(s) from the "multivariate" working group, we'll also
have to think about function name(s) and the returned object
("output"), but I think our student will be able use your
function "as is".
Thanks a lot, Kjell
Yes, thanks a lot already.
Concerning the ``multivariate'' working group, I have put a preliminary
Tex and Pdf-file of what we discussed in Treviso online.
It can be retreived from: http://www.econ.kuleuven.be/public/ndbae49/R/
If anyone has comments or would like to change/update/add things to the
Tex/Pdf-file you can always contact me or send me an updated Tex-file.
Best regards,
Kristel
\item I plan to add ``\emph{psi - Function}'' objects, using an S4 class,
with instances for Huber, Hampel, Biweight, etc, at least for the
M-estimators (of location / regression).
Such an object should contain $\rho$, $\psi$, $w(x) := \psi(x)/x$,
$\psi'$ (= $d/\mathrm{dx} \psi$) functions, default values for the tuning
constants, and functionals such as $E_X[\psi(X)^2]$, $E_X[\psi'(X)]$,
for $X\sim N(0,1)$ (these functionals are still \emph{functions} of the
tuning parameters).
> [snip]
and actually I've done the S4 class for "psi-function" objects
last Friday.
Now I'm interested to hear more..
I think estimating functions are "natural" candidates to be treated as
S4 objects, as Martin's blueprint shows above.
One question: I have written the "workhorses" of my code in C and the
"psi" functions are defined inside the C code (tuning constants are
taken from R). If we move towards having the "psi" functions constructed
in R (which I think is a good idea for many reasons), is there an easy
way to have the C code use them? .Call() comes to mind, but of course, I
would want to re-write the least possible number of lines of my original
C code.
Matias
PS: A website collecting the working groups minutes and other related
material to this project is already being set up and will be announced
in this list when it becomes "live" (which should happen soon).
Ricardo, can you be more specific about what you don't like in the
implementation of OGK in S-Plus?
Hi Kjell,
The "pure" OGK estimate works regardless of the collinearity of the
data.
The resulting estimate needs some improvement step. We thought of the
simple "one-step hard rejection". It was indicated in the paper that no
explicit matrix inversion is required for this step, so that it can be made
to work even for collinear data.
I have tried the S-Plus version with data that were definitely not
collinear, and I got an error message of the sort "matrix not invertible". I
couldn't figure out why.
By the way, on later reflexion I have thought that a better improvement
would be to use the "pure" OGK as the starting point for the iterations of
an S-estimate, which would be much cheaper than subsampling.
Regards,
Ricardo
Dear Kjell ,
beware of a minor error in the pairwise code that you posted: the
weights in gk.sigmamu() will never become 0, even if abs(x) > c1.
The correction should be:
gk.sigmamu <- function(x, c1 = 4.5, c2 = 3.0, mu.too = FALSE, ...)
{
n <- length(x)
medx <- median(x)
sigma0 <- median(abs(x - medx))
# VT::19.12.2005 - should give 0 weights if abs(x - medx) / sigma0 > c1
#
# w <- (x - medx) / sigma0
# w <- (1.0 - (w / c1)^2)^2
# w[w < 0.0] <- 0.0
w <- abs(x - medx) / sigma0
w <- ifelse(w<=c1,(1.0 - (w / c1)^2)^2,0)
# VT::19.12.2005 - END
mu <- sum(x * w) / sum(w)
...... ... ... ..
best regard
valentin
On 12/5/05, Kjell Konis <konis at stats.ox.ac.uk> wrote:
Here is an implementation of the OGK estimator completely in R. I
haven't touched it for a while and I forget how thoroughly I tested
it so use it with a bit of caution.
http://www.stats.ox.ac.uk/~konis/pairwise.q
Ricardo, can you be more specific about what you don't like in the
implementation of OGK in S-Plus?
Kjell
On Dec 3, 2005, at 9:55 PM, Ricardo Maronna wrote:
One observation about the following.
One thing we'd be very interested is the OGK estimator of
Maronna and Zamar (JASA, 2002). Unfortunately, there, too, some
of the authors sold their code exclusively to Insightful (the
S-plus company).
Insightful has implemented the procedure in some way I ignore
(actually,
I don't like the way it works), but unfortunately I got not a
single dollar
from that!. Anybody is free to implement the method (which is
extremely
simple) in whatever language.
Ricardo
"Valentin" == Valentin Todorov <valentin.to at gmail.com>
on Tue, 20 Dec 2005 13:20:33 +0100 writes:
Valentin> Dear Kjell ,
Valentin> beware of a minor error in the pairwise code that you posted: the
Valentin> weights in gk.sigmamu() will never become 0, even if abs(x) > c1.
Indeed!
Valentin> The correction should be:
gk.sigmamu <- function(x, c1 = 4.5, c2 = 3.0, mu.too = FALSE, ...)
{
n <- length(x)
medx <- median(x)
sigma0 <- median(abs(x - medx))
# VT::19.12.2005 - should give 0 weights if abs(x - medx) / sigma0 > c1
#
# w <- (x - medx) / sigma0
# w <- (1.0 - (w / c1)^2)^2
# w[w < 0.0] <- 0.0
w <- abs(x - medx) / sigma0
w <- ifelse(w<=c1,(1.0 - (w / c1)^2)^2,0)
# VT::19.12.2005 - END
mu <- sum(x * w) / sum(w)
...... ... ... ..
Valentin> best regard
Valentin> valentin
Note however that Kjell was trying to be smart, namely *not*
using ifelse() in such a case -- for efficiency reasons. But he
forgot to do it properly: Here, pmax() is much more efficient
than ifelse(). Note that gk.sigmamu() is really the function that is called
many times, so it makes sense to write it as efficiently as possible
--- still keeping it well readable.
I had also applied a few minor cosmetic improvements to Kjell's
original code --- but overlooked the obvious buglet ! ---
which now leads to
gk.sigmamu <- function(x, c1 = 4.5, c2 = 3.0, mu.too = FALSE, ...)
{
n <- length(x)
medx <- median(x)
x. <- abs(x - medx)
c.sigma0 <- c1 * median(x.)
w <- pmax(0, 1 - (x. / c.sigma0)^2)^2
mu <- sum(x * w) / sum(w)
x <- (x - mu) / sigma0
rho <- x^2
rho[rho > c2^2] <- c2^2
## sigma2 <- sigma0^2 / n * sum(rho)
## return
c(if(mu.too) mu,
## sqrt(sigma2) == sqrt( sigma0^2 / n * sum(rho) ) :
sigma0 * sqrt(sum(rho)/n))
}
Martin Maechler, ETH Zurich
Valentin> On 12/5/05, Kjell Konis <konis at stats.ox.ac.uk> wrote:
>> Here is an implementation of the OGK estimator completely in R. I
>> haven't touched it for a while and I forget how thoroughly I tested
>> it so use it with a bit of caution.
>>
>> http://www.stats.ox.ac.uk/~konis/pairwise.q
>>
>> Ricardo, can you be more specific about what you don't like in the
>> implementation of OGK in S-Plus?
>>
>> Kjell
>>
>>
>>
>> On Dec 3, 2005, at 9:55 PM, Ricardo Maronna wrote:
>>
>> > One observation about the following.
>> >
>> >> One thing we'd be very interested is the OGK estimator of
>> >> Maronna and Zamar (JASA, 2002). Unfortunately, there, too, some
>> >> of the authors sold their code exclusively to Insightful (the
>> >> S-plus company).
>> >
>> > Insightful has implemented the procedure in some way I ignore
>> > (actually,
>> > I don't like the way it works), but unfortunately I got not a
>> > single dollar
>> > from that!. Anybody is free to implement the method (which is
>> > extremely
>> > simple) in whatever language.
>> > Ricardo
>> >
>> > _______________________________________________
>> > R-SIG-Robust at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-sig-robust
>>
>> _______________________________________________
>> R-SIG-Robust at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-robust
>>
Valentin> _______________________________________________
Valentin> R-SIG-Robust at r-project.org mailing list
Valentin> https://stat.ethz.ch/mailman/listinfo/r-sig-robust