Please have a look at the RSR home page
http://www.statistik.tuwien.ac.at/rsr/
where you can now find the minutes of the workshop in
Treviso from *all* working groups!
Thanks a lot to the contributors.
Regards,
Peter
-------------------------------------------------------
From: Prof. Dr. Peter Filzmoser
Dept. of Statistics & Probability Theory
Vienna University of Technology
Wiedner Hauptstrasse 8-10
A-1040 Vienna, Austria
Tel. +43 1 58801/10733
Fax. +43 1 58801/10799
E-mail: P.Filzmoser at tuwien.ac.at
Internet:
http://www.statistik.tuwien.ac.at/public/filz/
Dear List,
A few days ago I uploaded the package "roblm" to CRAN. It implements
MM-regression estimators and currently has some diagnostic plots as
well. The documentation needs quite a bit of work, but the main
information is there.
Note that the name of the package (and the corresponding class) is
"roblm", but this does not mean that I necessarily prefer this name over
others. I've been working on this for some time now, and for the reasons
that I mentioned in Treviso ("rlm" and "lmRob" are already taken) I
settled for roblm.
I took the liberty to re-organize the regression workgroup minutes
relating to linear regression in a format closer to a "to-do" list. I
would very much appreciate your feedback. Names in parentheses
indicate "volunteers" for a particular task. Please feel free to correct
all my mistakes (and to add your name to the list of volunteers!)
Thanks.
Matias
-- An initial (partial) list of things that remain to be done for robust
linear regression
- expose the "initial estimator" as a separate function.
This should include the fast-S (already built-in), the
alternate M-S estimator for the case of many factor
variables (Victor can provide code), and the heuristic
initial estimator of Yohai-Pena (Victor has code?)
- explore how to use score ("psi") functions declared in
R (as S4 objects) in the C code
- decide which options should stay in the "control" function
- add to summary.roblm(), print.roblm() and
print.summary.roblm() information on which estimation
method was used (MM, etc)
- write a model selection function (Victor has code for a robust
backward stepwise method (RFPE?); Elvezio may know of / have
code for the robust Cp; Eva can provide this part?)
- write an "anova" function using robust F-, Wald tests (Matias)
- add the robust weights to the returned object (Matias)
- improve / complete roblm documentation (Matias)
- incorporate more data sets (with documentation) (Matias)
"Matias" == Matias Salibian-Barrera <matias at stat.ubc.ca>
on Wed, 21 Dec 2005 14:07:55 -0800 writes:
Matias> Dear List,
Matias> A few days ago I uploaded the package "roblm" to CRAN. It implements
Matias> MM-regression estimators and currently has some diagnostic plots as
Matias> well. The documentation needs quite a bit of work, but the main
Matias> information is there.
Thank you, Matias!
Matias> Note that the name of the package (and the corresponding class) is
Matias> "roblm", but this does not mean that I necessarily prefer this name over
Matias> others. I've been working on this for some time now, and for the reasons
Matias> that I mentioned in Treviso ("rlm" and "lmRob" are already taken) I
Matias> settled for roblm.
(I'm replying to this in a separate e-mail on
"function naming ...")
Matias> I took the liberty to re-organize the regression
Matias> workgroup minutes relating to linear regression in a
Matias> format closer to a "to-do" list. I would very much
Matias> appreciate your feedback. Names in parentheses
Matias> indicate "volunteers" for a particular task. Please
Matias> feel free to correct all my mistakes (and to add
Matias> your name to the list of volunteers!)
Matias> -- An initial (partial) list of things that remain
Matias> to be done for robust linear regression
Matias> - expose the "initial estimator" as a separate function.
Matias> This should include the fast-S (already built-in), the
Matias> alternate M-S estimator for the case of many factor
Matias> variables (Victor can provide code), and the heuristic
Matias> initial estimator of Yohai-Pena (Victor has code?)
Matias> - explore how to use score ("psi") functions declared in
Matias> R (as S4 objects) in the C code
Good point (you raised it on this list earlier but didn't
get an answer)!
I'm not yet sure what the optimal approach would be here.
There is some overhead in calling interpreted R code from the C
code. In the meantime I had an idea which might be more
flexible: the R class could have a slot which is either empty or a
pointer ("externalptr") to a C function, if one is available.
Hence for some score/psi/rho/... functions, R would call fast
built-in C code, while one would still have the full flexibility
of "playing with" new *-functions.
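The following minimal Python sketch shows the proposed design. It covers only the dispatch pattern; it is not R, and it is not the actual roblm or robustbase code. A hypothetical `ScoreFunction` object always carries a flexible, user-written psi function. It may also carry an optional "native" implementation, which stands in for the C routine behind the "externalptr" slot. When the native slot is filled, the whole residual vector goes to it. Otherwise the user-supplied function is called element by element. The Huber psi with k = 1.345 is a standard choice. The second psi function exists only for illustration.

```python
from typing import Callable, List, Optional


def huber_psi(r: float, k: float = 1.345) -> float:
    """Huber score function: identity on [-k, k], clipped to +-k outside."""
    return max(-k, min(k, r))


class ScoreFunction:
    """Hypothetical stand-in for an R (S4) class describing a psi function.

    `psi` is the flexible, user-defined version (like interpreted R code);
    `native` plays the role of the optional "externalptr" slot and is None
    when no compiled implementation exists.
    """

    def __init__(self, psi: Callable[[float], float],
                 native: Optional[Callable[[List[float]], List[float]]] = None):
        self.psi = psi
        self.native = native

    def apply(self, residuals: List[float]) -> List[float]:
        if self.native is not None:
            # Fast path: hand the whole vector to the "built-in" routine.
            return self.native(residuals)
        # Fallback: call the user-supplied function element by element.
        return [self.psi(r) for r in residuals]


# A user experimenting with a new psi function needs no native code ...
experimental = ScoreFunction(psi=lambda r: r / (1.0 + abs(r)))
# ... while a standard one can ship with a (here only simulated) fast routine.
huber = ScoreFunction(psi=huber_psi,
                      native=lambda rs: [huber_psi(r) for r in rs])

print(huber.apply([-3.0, 0.5, 2.0]))    # [-1.345, 0.5, 1.345]
print(experimental.apply([1.0, -3.0]))  # [0.5, -0.75]
```

In this design the flexible version is always present, so users can experiment freely. The native slot is purely an optimization and never a requirement.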
Matias> - decide which options should stay in the "control" function
Matias> - add to summary.roblm(), print.roblm() and
Matias> print.summary.roblm() information on which estimation
Matias> method was used (MM, etc)
Matias> - write a model selection function (Victor has code for a robust
Matias> backward stepwise method (RFPE?); Elvezio may know of / have
Matias> code for the robust Cp; Eva can provide this part?)
Matias> - write an "anova" function using robust F-, Wald tests (Matias)
Yes, in my view; I had argued in Treviso that 'anova' should be
used as the function for comparing nested models, also in situations
where it's really no longer an '[an]alysis [o]f [va]riance'.
Matias> - add the robust weights to the returned object (Matias)
I agree. This applies to almost all robust procedures.
Here we should also try to find a common name
('wts', 'weights', ...) and, I think, also a common standardization,
requiring either
  max_i w_i = 1   (natural for w(r_i) = psi(r_i) / r_i), or
  sum_i w_i = 1   (natural for formulae using the weights).
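The following Python sketch illustrates the two standardizations. It computes Tukey bisquare weights w(r) = psi(r) / r, using the commonly quoted tuning constant c = 4.685. The residuals are made-up values that are assumed to be already scaled. The weights are then rescaled either to max_i w_i = 1 or to sum_i w_i = 1. All function names are hypothetical and are not part of any existing package.

```python
from typing import List


def bisquare_weight(r: float, c: float = 4.685) -> float:
    """Tukey bisquare weight w(r) = psi(r) / r for a scaled residual r.

    Equals (1 - (r/c)^2)^2 for |r| <= c and 0 otherwise; at r = 0 the
    closed form gives 1, which is the limit of psi(r) / r.
    """
    if abs(r) > c:
        return 0.0
    return (1.0 - (r / c) ** 2) ** 2


def standardize_max(w: List[float]) -> List[float]:
    """Rescale the weights so that max_i w_i = 1."""
    m = max(w)
    return [wi / m for wi in w]


def standardize_sum(w: List[float]) -> List[float]:
    """Rescale the weights so that sum_i w_i = 1."""
    s = sum(w)
    return [wi / s for wi in w]


# Hypothetical scaled residuals r_i / sigma; the last one is a gross outlier.
scaled_res = [0.5, 1.0, -2.0, 10.0]
w = [bisquare_weight(r) for r in scaled_res]
w_max = standardize_max(w)
w_sum = standardize_sum(w)

print([round(x, 3) for x in w])  # raw weights; the outlier gets weight 0
print(max(w_max), round(sum(w_sum), 12))
```

Both rescalings preserve the ratios w_i / w_j and differ only in the normalizing constant. standardize_sum() assumes that at least one weight is nonzero.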
Matias> - improve / complete roblm documentation (Matias)
Matias> - incorporate more data sets (with documentation) (Matias)
I also asked for this a few days ago, and Valentin has
promised to provide quite a few from the Rousseeuw-Leroy book.
Hopefully, those from the MMY book (Maronna-Martin-Yohai) will
also be provided relatively soon.
Martin
"Matias" == Matias Salibian-Barrera <matias at stat.ubc.ca>
on Wed, 21 Dec 2005 14:07:55 -0800 writes:
.........
Matias> A few days ago I uploaded the package "roblm" to CRAN.
Matias> ........
Matias> Note that the name of the package (and the
Matias> corresponding class) is "roblm", but this does not
Matias> mean that I necessarily prefer this name over
Matias> others. I've been working on this for some time now,
Matias> and for the reasons that I mentioned in Treviso
Matias> ("rlm" and "lmRob" are already taken) I settled for
Matias> roblm.
Yes. In Treviso, we had talked about trying to go for "rlm", and
I had offered to try to resolve the name clash with MASS::rlm().
I now think that this may not be a good idea: rlm() is quite
prominent in the MASS book, has years of tradition, and is used
in many places. Also, its default method is "M", not "MM" {for a
good reason: backward compatibility with older versions of rlm()}.
Hence our new (robust|rob|rf|r)lm(Rob) function would never be
'call-compatible' with MASS::rlm(), and for that reason I think we
should strive for a different naming scheme.
Andreas Ruckstuhl had already raised the point at Treviso and
on this list (Dec 7, "Re: [RsR] OGK covariance estimator"), where
he proposed
r* [as 'rlm' in MASS]
rob* [as 'roblm' above]
rf* ["[R]obust [F]itting of ..", used in Andreas' package]
The last one may be a bit more logical than all the others:
in one sense there is just one linear model with different
fitting methods, since the error distribution has not been
part of the model specification __for most statistical software__.
OTOH, "rob" is easy to pronounce/spell
[OTOH, there are all the people called 'Rob' ...]
As you can guess from Andreas' list, I'd take either
'rob*' or 'rf*', and I don't have a strong preference.
Other opinions?
Martin
Dear RSR-List,
I wish you all the best in the new year 2006.
Maybe all of you are enjoying the holidays, but I want to ask a (stupid?)
question about graphics: what type of graphics should we prefer for
implementing the plot functions of a package, traditional graphics or
the lattice package?
a) only traditional
b) only lattice
c) mixed (some of the plots are implemented using traditional graphics and
others using the lattice package)
d) both: all plots in the package are implemented as traditional as well as
lattice plots, and the user has an option to choose one of them.
Best regards,
Valentin
> Dear RSR-List,
> I wish you all the best in the new year 2006.
> Maybe all of you are enjoying the holidays, but I want to ask a (stupid?)
> question about graphics: what type of graphics should we prefer for
> implementing the plot functions of a package, traditional graphics or
> the lattice package?
> a) only traditional
> b) only lattice
> c) mixed (some of the plots are implemented using traditional graphics and
> others using the lattice package)
> d) both: all plots in the package are implemented as traditional as well as
> lattice plots, and the user has an option to choose one of them.
Let me say two things about this:
1. Base vs. lattice graphics is not the only choice. You can also write
graphics using grid directly rather than via the lattice package. This
is particularly useful if you want to create plots that do not fit
directly into the lattice framework. We have done this for the vcd
package, which contains some vignettes describing the ideas that went
into its creation.
2. My experience from writing vcd is that, as long as you don't do very
sophisticated graphics that reach deep into the base graphics or
grid routines, it is not much work to write a core computation
engine and then render the plot in either base or grid. For example,
I wrote spineplot() first in base graphics and then easily
transferred it to a grid implementation in spine() in vcd.
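The following minimal sketch shows the "compute once, render twice" idea. It is written in Python rather than R, all function names are hypothetical, and this is not how vcd implements spine(). `spine_layout()` turns a two-way table of counts into rectangles in the unit square. Bar widths are proportional to the marginal x frequencies, and bar segments are proportional to the conditional y proportions. Two independent renderers then consume the same rectangles. One produces a text listing and the other produces SVG; they stand in for base and grid graphics respectively. The `gap` parameter is an illustrative assumption.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float, str]  # (x, y, width, height, label)


def spine_layout(table: Dict[str, Dict[str, int]], gap: float = 0.02) -> List[Rect]:
    """Core computation: spine-plot rectangles in the unit square.

    Bar widths are proportional to the marginal frequencies of the x
    categories; each bar is split vertically by the conditional
    proportions of the y categories within that x category.
    """
    total = sum(sum(col.values()) for col in table.values())
    usable = 1.0 - gap * (len(table) - 1)  # horizontal space left after gaps
    rects: List[Rect] = []
    x = 0.0
    for xcat, col in table.items():
        n_x = sum(col.values())
        width = usable * n_x / total
        y = 0.0
        for ycat, n in col.items():
            height = n / n_x
            rects.append((x, y, width, height, f"{xcat}/{ycat}"))
            y += height
        x += width + gap
    return rects


def render_text(rects: List[Rect]) -> str:
    """First backend: a plain-text listing (stand-in for base graphics)."""
    return "\n".join(f"{lab}: x={x:.3f} y={y:.3f} w={w:.3f} h={h:.3f}"
                     for x, y, w, h, lab in rects)


def render_svg(rects: List[Rect], size: int = 200) -> str:
    """Second backend: SVG rectangles (stand-in for grid); SVG's y axis points down."""
    body = "".join(
        f'<rect x="{x * size:.1f}" y="{(1 - y - h) * size:.1f}" '
        f'width="{w * size:.1f}" height="{h * size:.1f}"/>'
        for x, y, w, h, _ in rects)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{size}" height="{size}">{body}</svg>')


counts = {"a": {"yes": 30, "no": 10}, "b": {"yes": 20, "no": 40}}
rects = spine_layout(counts)
print(render_text(rects))
svg = render_svg(rects)
```

To add another backend, you only need a new function that consumes the list of rectangles. The layout code stays untouched.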
Just my EUR 0.02 and my experience...
Z
"Valentin" == Valentin Todorov <valentin.todorov at chello.at>
on Fri, 30 Dec 2005 11:48:25 +0100 writes:
Valentin> Dear RSR-List,
Valentin> I wish you all the best in the new 2006.
and so do I.
Valentin> Maybe all of you are enjoying the holidays, but I want to ask a (stupid?)
Not at all stupid; rather very interesting, and very good to ask early.
Valentin> question about graphics: what type of graphics should we prefer for
Valentin> implementing the plot functions of a package, traditional graphics or
Valentin> the lattice package?
Valentin> a) only traditional
I'm slightly modifying your subsequent points (as Achim "kind of
mentioned") to include "grid".
Note that lattice is written using grid graphics, so the two
belong together.
b) only grid & lattice
c) mixed (some of the plots are implemented using traditional graphics and
others using the grid and/or lattice package)
d) both: all plots in the package are implemented as traditional as well as
grid or lattice plots, and the user has an option to choose one of them.
I'd be pragmatic and go for "c)", at least for now,
with the following reasoning:
grid and lattice (which is based on grid) are clearly superior
to traditional graphics in their design and possibilities {especially 'grid'}.
However, for some S / R programmers, the effort of working with
lattice or grid may be considerably higher than with traditional
graphics. Since 'robustbase' is first and foremost about
computation (and inference), with graphics "only" an important
tool, I would hesitate to *require* that people provide plot()
methods via grid/lattice.
If novel plots are invented, they really ``should'' be done via
grid graphics {or maybe via lattice panel functions using grid
graphics}; but I'm not sure we will see such novel plots...
OTOH, for really simple plots {lines, points, ... in only one
'panel'}, the use of lattice/grid does not add much value (I
think) and may actually make it considerably harder
for "traditionalists" to add to such a graphic.
It could be interesting to hear many more opinions...
Martin
> If novel plots are invented, they really ``should'' be done via
> grid graphics {or maybe via lattice panel functions using grid
> graphics}; but I'm not sure we will see such novel plots...
> OTOH, for really simple plots {lines, points, ... in only one
> 'panel'}, the use of lattice/grid does not add much value (I
> think) and may actually make it considerably harder
> for "traditionalists" to add to such a graphic.
Yes, good point. To expand a little: for single-panel plots that are
essentially scatter plots, the base framework is usually
sufficient and easier to use. If a grid implementation is needed in
addition, it's usually easy to add. For multi-panel plots, grid/lattice
will probably provide better tools.
The advantage of having the single-panel plots available in grid, and not
only in base graphics, is that they can easily be re-used in multi-panel
plots. So I usually start by writing the single-panel prototype
in base graphics, then transform it to grid and possibly re-use it in
other functions.
Z