[RsR] estimators based on random samples... - should be random
Hello, I agree with adopting R's convention, i.e. to use the R random number generators and the set.seed() function to set the seed. Do you have any suggestions on how to implement/handle these functions from C and Fortran? Are there examples or documentation I could look at? (At least for me, this is one of the reasons I still use generators from outside R.) As Matias points out at the end of this email, methods may have more than one solution. In the wle package I implemented some plots to contrast different solutions; an example is in wle.lm (example(wle.lm)). Claudio
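For reference, the C-level interface is documented in the "Writing R Extensions" manual (section "Random number generation"): any use of unif_rand() or norm_rand() in compiled code should be wrapped in GetRNGstate()/PutRNGstate(), so the code reads .Random.seed on entry and writes the advanced state back on exit. A minimal sketch follows; the function name rng_demo is made up for illustration, and from Fortran the usual approach is to call a small C wrapper like this one rather than the RNG directly:

```c
#include <R.h>

/* Fill x with *n uniforms drawn from R's current RNG stream.
   Hypothetical helper for illustration; call from R via .C(). */
void rng_demo(double *x, int *n)
{
    GetRNGstate();               /* read .Random.seed into the C-level state */
    for (int i = 0; i < *n; i++)
        x[i] = unif_rand();      /* respects set.seed() on the R side */
    PutRNGstate();               /* write the advanced state back */
}
```

Compiled with R CMD SHLIB and called via .C("rng_demo", x = double(10), n = as.integer(10)), the results are then reproducible after set.seed(), just like any R-level random draw.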
On Mon, 1 May 2006, Matias Salibian-Barrera wrote:
Hello, Thanks Martin for (once again!) taking the lead in sparking a discussion. My comments are inserted below.
In R, we have always adhered to the convention that such estimators should use R's random number generators (=: RNGs), and hence their result will be a function of the initial random seed -- .Random.seed in S and R, typically set via set.seed().
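The convention can be illustrated with a minimal R sketch (a generic illustration, not code from robustbase): any randomized step, such as drawing candidate subsamples, becomes reproducible exactly when the seed is set.

```r
## Sketch: a resampling step that follows the convention
set.seed(42)
i1 <- sample(100, 10)   # e.g. indices of a random subsample
set.seed(42)
i2 <- sample(100, 10)
identical(i1, i2)       # TRUE: same seed, same subsample
```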
A good convention, IMHO.
The current algorithm implementations in 'robustbase', however, do not adhere to this convention: they use either their own (cheap) RNG [covMcd(), ltsReg()] or the RNG provided by the operating system's C library rand() function [lmrob()] --- and in all these cases they always use the same random seed by default.
I believe this (each algorithm using its own or the operating system's RNG) is merely due to the "atomized" nature of the development of the separate pieces of code that are now in robustbase, and does not reflect an "a priori" design criterion.
Of course, this has the advantage that all your students get the same estimates for the same data (well, at least on the same computer hardware and software combination), but I think we should switch to using R's RNGs and have all these results properly depend on the current random seed, i.e. typically only give the same results after the set.seed(<n>) call.
Probably the most noticeable effect of this change would be that in some cases consecutive calls fitting the same model to the same data may yield different results, and high levels of anxiety in the "uninitiated" user will surely follow... I guess that if the convergence criteria of these algorithms are sufficiently tight, this will typically happen only in those cases where the existence of two (or more) solutions is actually informative (and probably relevant for the analysis). Maybe somebody has had other experiences? I second Martin's suggestion, but add that we should accompany this change with good examples (one for each model?) in the documentation, illustrating how different solutions can yield more insight into the analysis. Matias
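One way such a documentation example could look is sketched below. This is a hedged sketch: it assumes covMcd() and the hbk data set from robustbase, and it presupposes the proposed change, i.e. that the estimator draws its subsamples from R's RNG so that different seeds can reach different local solutions.

```r
## Sketch: refit the same estimator under different seeds and compare
library(robustbase)
set.seed(1); fit1 <- covMcd(hbk[, 1:3])
set.seed(2); fit2 <- covMcd(hbk[, 1:3])
## If the attained criterion values differ noticeably, the two runs
## found different local solutions -- itself informative for the analysis
c(fit1$crit, fit2$crit)
```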
______________________________________________________________
Matias Salibian-Barrera - Department of Statistics
University of British Columbia - matias at stat.ubc.ca
Phone: (604) 822-3410 - Fax: (604) 822-6960
_______________________________________________
R-SIG-Robust at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-robust
--------------------------------------------------------------
Claudio Agostinelli
Dipartimento di Statistica
Universita' Ca' Foscari di Venezia
San Giobbe, Cannaregio 873, 30121 Venezia
Tel: 041 2347446, Fax: 041 2347444
email: claudio at unive.it, www: www.dst.unive.it/~claudio
--------------------------------------------------------------
Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html