[RsR] Minutes Treviso

8 messages · Peter Filzmoser, Matias Salibian-Barrera, Valentin Todorov +2 more

#
Please, have a look at the RSR home page

http://www.statistik.tuwien.ac.at/rsr/

where you can now find the minutes of the workshop in
Treviso from *all* working groups!

Thanks a lot to the contributors.

Regards,
Peter
#
Dear List,

A few days ago I uploaded the package "roblm" to CRAN. It implements 
MM-regression estimators and currently has some diagnostic plots as 
well. The documentation needs quite a bit of work, but the main 
information is there.

Note that the name of the package (and the corresponding class) is 
"roblm", but this does not mean that I necessarily prefer this name over 
others. I've been working on this for some time now, and for the reasons 
that I mentioned in Treviso ("rlm" and "lmRob" are already taken) I 
settled for roblm.

I took the liberty to re-organize the regression workgroup minutes 
relating to linear regression in a format closer to a "to-do" list. I 
would very much appreciate your feedback. Names in parentheses 
indicate "volunteers" for a particular task. Please feel free to correct 
all my mistakes (and to add your name to the list of volunteers!)

Thanks.

Matias

-- An initial (partial) list of things that remain to be done for robust 
linear regression

- expose the "initial estimator" as a separate function.
	This should include the fast-S (already built-in), the
	alternate M-S estimator for the case of many factor
	variables (Victor can provide code), and the heuristic
	initial estimator of Yohai-Pena (Victor has code?)
- explore how to use score ("psi") functions declared in
	R (as S4 objects) in the C code
- decide which options should stay in the "control" function
- add to summary.roblm(), print.roblm() and
	print.summary.roblm() information on which estimation
	method was used (MM, etc)
- write a model selection function (Victor has code for a robust
	backward stepwise method (RFPE?); Elvezio may know of / have
	code for the robust Cp; Eva can provide this part?)
- write an "anova" function using robust F- and Wald tests (Matias)
- add the robust weights to the returned object (Matias)
- improve / complete roblm documentation (Matias)
- incorporate more data sets (with documentation) (Matias)
#
Matias> Dear List,

    Matias> A few days ago I uploaded the package "roblm" to CRAN. It implements 
    Matias> MM-regression estimators and currently has some diagnostic plots as 
    Matias> well. The documentation needs quite a bit of work, but the main 
    Matias> information is there.


Thank you, Matias!

    Matias> Note that the name of the package (and the corresponding class) is 
    Matias> "roblm", but this does not mean that I necessarily prefer this name over 
    Matias> others. I've been working on this for some time now, and for the reasons 
    Matias> that I mentioned in Treviso ("rlm" and "lmRob" are already taken) I 
    Matias> settled for roblm.

(I'm replying to this in a different e-mail on  
 "function naming ... " )

    Matias> I took the liberty to re-organize the regression
    Matias> workgroup minutes relating to linear regression in a
    Matias> format closer to a "to-do" list. I would very much
    Matias> appreciate your feedback. Names in parentheses
    Matias> indicate "volunteers" for a particular task. Please
    Matias> feel free to correct all my mistakes (and to add
    Matias> your name to the list of volunteers!)


    Matias> -- An initial (partial) list of things that remain
    Matias>    to be done for robust linear regression

    Matias> - explore how to use score ("psi") functions declared in
    Matias>   R (as S4 objects) in the C code

Good point (which you've raised on this list earlier, but didn't
get an answer)!
I'm not yet sure what the optimal approach would be here.
There is some overhead in calling interpreted R code from the C
code.  In the meantime I had an idea which might be more
flexible: The R class could have a slot which is (empty or) a
pointer ("externalptr") to a C function if that is available.

Hence, for some score/psi/rho/... functions, R would call fast
built-in C code, while one would still have the full flexibility
of "playing with" new *-functions.
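
A minimal sketch of the "optional fast slot" idea, written in Python rather than R/C purely for illustration. The class `PsiFunction`, its `builtin` slot, and the function `huber_psi_builtin` are hypothetical names; the `builtin` slot only stands in for the "externalptr" to a compiled routine and is not an existing API of any package.

```python
from dataclasses import dataclass
from typing import Callable, Optional


def huber_psi(r: float, k: float = 1.345) -> float:
    """Huber's psi: the identity on [-k, k], clipped to +/-k outside."""
    return max(-k, min(k, r))


def huber_psi_builtin(r: float) -> float:
    """Hypothetical stand-in for a compiled routine (fixed k = 1.345)."""
    return max(-1.345, min(1.345, r))


@dataclass
class PsiFunction:
    name: str
    psi: Callable[[float], float]                       # always present (the "interpreted" version)
    builtin: Optional[Callable[[float], float]] = None  # optional fast version (the "pointer" slot)

    @property
    def uses_builtin(self) -> bool:
        return self.builtin is not None

    def __call__(self, r: float) -> float:
        # Prefer the fast version when the slot is filled, else fall back.
        impl = self.builtin if self.builtin is not None else self.psi
        return impl(r)


# A well-known psi function gets a fast version; a new, experimental one does not.
huber = PsiFunction("huber", psi=huber_psi, builtin=huber_psi_builtin)
clipped_cubic = PsiFunction("clipped-cubic", psi=lambda r: max(-1.0, min(1.0, r ** 3)))
```

The calling code only ever sees `huber(r)` or `clipped_cubic(r)`; whether the fast path exists is a property of the object, so new psi functions can be tried without touching the compiled side.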

    Matias> - write an "anova" function using robust F-, Wald tests (Matias)

Yes, in my view; I had argued in Treviso that 'anova' should be
used as the function for comparing nested models also in situations
where it's really not an '[an]alysis [o]f [va]riance' anymore.

    Matias> - add the robust weights to the returned object (Matias)

I agree.  This applies to almost all robust procedures.
Here we should also try to find a common name
('wts', 'weights', ...), and also 'standardize', I think, by requiring either
  max_i w_i = 1  (natural for  w(r_i) = psi(r_i) / r_i),  or
  sum_i w_i = 1  (natural for formulae using the weights).
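
The two standardizations can be compared in a small sketch (Python, for illustration only). Huber's psi with tuning constant 1.345 is an assumed example, the residuals are made-up numbers, and all function names are hypothetical. The value w(0) = 1 assumes psi is scaled so that psi'(0) = 1.

```python
def huber_psi(r, k=1.345):
    """Huber's psi: the identity on [-k, k], clipped to +/-k outside."""
    return max(-k, min(k, r))


def robustness_weights(residuals, psi=huber_psi):
    """w(r) = psi(r) / r, with w(0) = 1 (assumes psi'(0) = 1)."""
    return [1.0 if r == 0 else psi(r) / r for r in residuals]


def scale_max_one(w):
    """Rescale so that max_i w_i = 1."""
    m = max(w)
    return [wi / m for wi in w]


def scale_sum_one(w):
    """Rescale so that sum_i w_i = 1."""
    s = sum(w)
    return [wi / s for wi in w]


def weighted_mean(x, w):
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


residuals = [0.0, 0.5, -1.0, 2.69, -5.38]   # made-up standardized residuals
w = robustness_weights(residuals)           # roughly [1, 1, 1, 0.5, 0.25]
w_max = scale_max_one(w)
w_sum = scale_sum_one(w)
```

Any statistic that divides by the sum of the weights, such as `weighted_mean`, gives the same result under both conventions, so the choice mainly affects how the stored weights read: under max = 1 they can be read directly as "1 means fully used", while under sum = 1 they plug into formulae without a normalizing constant.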

    Matias> - incorporate more data sets (with documentation) (Matias)

I've also asked for this a few days ago, and Valentin has
promised to provide quite a few from the Rousseeuw-Leroy book.
Hopefully, those from the MMY book (Maronna-Martin-Yohai) will
also be provided relatively soon.

Martin
#
.........

    Matias> A few days ago I uploaded the package "roblm" to CRAN.
    Matias>  ........

    Matias> Note that the name of the package (and the
    Matias> corresponding class) is "roblm", but this does not
    Matias> mean that I necessarily prefer this name over
    Matias> others. I've been working on this for some time now,
    Matias> and for the reasons that I mentioned in Treviso
    Matias> ("rlm" and "lmRob" are already taken) I settled for
    Matias> roblm.

yes. In Treviso, we had talked about trying to go for "rlm" and
I had offered to try to resolve the name clash with MASS::rlm().
I now think that this may not be a good idea:  rlm()
in MASS is really relatively prominent in the MASS book with
years of tradition, and usage in many places.  Also its default
method is "M", not "MM" { for a good reason: back compatibility
with older versions of rlm()}.
Hence our new  (robust|rob|rf|r)lm(Rob) function would never be
'call-compatible' to MASS::rlm() and for that reason I think we
should strive for a different naming scheme.

Andreas Ruckstuhl had raised the point already at Treviso, and
on this list (Dec 7, "Re: [RsR] OGK covariance estimator") where
he'd proposed
     r*   [as 'rlm' in MASS]
     rob* [as 'roblm' above]
     rf*  ["[R]obust [F]itting of ..", used in Andreas' package]

The last one may be a bit more logical than the others:
in one sense there is just one linear model with different
fitting methodologies, because the error distribution hasn't
been part of the model specification __for most statistical software__.

OTOH, "rob" is easy to pronounce/spell
[OTOH, there's all the people called 'Rob' ...]

As you can guess, from Andreas' list I'd take either
'rob*' or 'rf*', and I don't have a strong preference.
Other opinions?

Martin
7 days later
#
Dear RSR-List,

I wish you all the best in the new 2006.

Maybe all of you are enjoying the holidays, but I want to ask (a stupid?)
question about graphics - what type of graphics shall we prefer for
implementing the plot functions of a package, the traditional graphics or
the lattice package:

a) only traditional
b) only lattice
c) mixed (some of the plots are implemented using traditional graphics and
others - the lattice package)
d) both - all plots in the package are implemented as traditional as well as
lattice plots and the user has an option to choose one of them.

Best regards,
Valentin
#
On Fri, 30 Dec 2005, Valentin Todorov wrote:

            
Let me say two things about this:
  1. base vs. lattice graphics is not the only choice. You can also write
     graphics using grid directly and not via the lattice package. This
     is particularly useful if you want to create plots that do not fit
     directly in the lattice framework. We have done this for the vcd
     package which contains some vignettes describing the ideas that went
     into the creation of the package.
  2. My experience from writing vcd is that - as long as you don't do very
     sophisticated graphics that reach very deep into the base graphics or
     grid routines - it is not very much work to write a core computation
     engine and then render the plot either in base or grid. For example,
     I've written spineplot() first in base graphics and then easily
     transferred it to a grid implementation in spine() in vcd.
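
The "core computation engine plus separate renderers" pattern from point 2 might look roughly like the following sketch. It is written in Python for illustration and is not the vcd implementation: the function names, the simplified spine-plot layout, and the counts are all assumptions, and a text renderer and an SVG renderer stand in for base and grid graphics.

```python
def spine_layout(counts):
    """Core computation: no drawing, just numbers.

    counts maps a category label to a pair (n_a, n_b) with n_a + n_b > 0.
    Each bar gets a width (the category's share of all observations)
    and a split (the share of n_a within the category).
    """
    total = sum(a + b for a, b in counts.values())
    return [
        {"label": label, "width": (a + b) / total, "split": a / (a + b)}
        for label, (a, b) in counts.items()
    ]


def render_text(layout, columns=40):
    """Renderer 1: one text line per bar."""
    lines = []
    for bar in layout:
        n = round(bar["width"] * columns)
        lines.append(f'{bar["label"]:<6}|{"#" * n} {bar["split"]:.2f}')
    return "\n".join(lines)


def render_svg(layout, size=200):
    """Renderer 2: the same layout drawn as SVG rectangles."""
    x = 0.0
    rects = []
    for bar in layout:
        w = bar["width"] * size
        h = bar["split"] * size
        rects.append(
            f'<rect x="{x:.1f}" y="{size - h:.1f}" width="{w:.1f}" height="{h:.1f}"/>'
        )
        x += w
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">'
        + "".join(rects)
        + "</svg>"
    )


counts = {"low": (8, 2), "mid": (5, 5), "high": (1, 9)}   # made-up data
layout = spine_layout(counts)
as_text = render_text(layout)
as_svg = render_svg(layout)
```

Because both renderers consume the same `layout`, adding a second graphics back end means writing only another small rendering function.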
Just my EUR 0.02 and my experience...
Z
#
Valentin> Dear RSR-List,
    Valentin> I wish you all the best in the new 2006.

and so do I.

    Valentin> Maybe all of you are enjoying the holidays, but I want to ask (a stupid?)

not at all stupid; rather very interesting, and very good to ask early..

    Valentin> question about graphics - what type of graphics shall we prefer for
    Valentin> implementing the plot functions of a package, the traditional graphics or
    Valentin> the lattice package:

    Valentin> a) only traditional

I'm slightly modifying your subsequent points to include "grid",
as Achim "kind of mentioned".
Note that lattice is written using grid graphics, so the two
belong together.

            b) only grid & lattice
            c) mixed (some of the plots are implemented using traditional graphics and
               others using the grid and/or lattice package)
            d) both - all plots in the package are implemented as traditional as well as
               grid or lattice plots and the user has an option to choose one of them.

I'd be pragmatic and go for "c)" , at least for now,
with the following reasoning / arguments:

grid and lattice (which is based on grid) are clearly superior
to traditional graphics in their design and possibilities {for 'grid'}.
However, for some S / R programmers, the effort to work with
lattice or grid may be considerably higher than traditional
graphics.  Now since 'robustbase' is really first about
computation (and inference) with graphics only an important
tool, I would hesitate to *require* that people provide plot()
methods via grid/lattice.
If novel plots are invented, they really ``should'' be done via
grid graphics {or maybe via lattice panel functions using grid
graphics};  but I'm not sure we will see such novel plots...
OTOH, for really simple plots {lines, points, ... in only one
'panel'}, the use of lattice/grid does not add much value (I
think) and may actually make it considerably harder
for "traditionalists" to add to such a graphic.

It could be interesting to hear many more opinions...

Martin
#
On Fri, 30 Dec 2005, Martin Maechler wrote:

            
Yes, good point. To expand a little: for single-panel plots that are
essentially of a scatter plot type, the base framework is usually
sufficient and easier to use. If, in addition, a grid implementation is
needed, it's usually easy to add. For multiple panel plots, grid/lattice
will probably provide better tools.
The advantage of having the single-panel plots available in grid, and not
only in base graphics, is that they can be easily re-used in multi-panel
plots. So personally, I usually start to write the single panel prototype
in base graphics, then transform it to grid and possibly re-use it in
other functions.
Z