[RsR] Maximum number of variables with lmRob? Error: singular matrix encountered
Dear Fabricio

Great that it works with lmrob :-). Are the differences really that "substantial", even when taking the standard errors into account? Of course, you would expect some differences, because the two methods do not use exactly the same algorithms and also use different default configuration parameters:

- For datasets with both categorical and continuous variables, lmRob uses an M-S-estimate as the initial estimate (replacing the S-estimate). It is basically an L1-estimate for the categorical part and an S-estimate for the continuous part. lmrob does not need to split the estimation into categorical and continuous parts; it just computes an S-estimate. That's one source of differences.
- The default psi- (or rho-) functions are different. If I remember correctly, lmRob uses the so-called "optimal" psi-function while lmrob uses the bisquare. While "optimal" is clearly worse, both are prone to producing somewhat unstable estimates for small datasets (if you draw sensitivity curves of the initial estimates, you get non-smooth curves with many jumps, etc.). That's why we advocate psi-functions that redescend more slowly, like "lqq" (only available for lmrob). That's another source of differences.
- lmrob uses randomized algorithms and thus produces slightly different results for different seeds.

BTW: Have you considered using the option setting="KS2011" in lmrob? This is an alternative set of defaults that uses the "lqq" psi-function and SMDM-estimates instead of MM-estimates. It will give you better tests, especially if you have many predictors and a not-so-large dataset.

Best regards,
Manuel

On Wed, Aug 29, 2012 at 9:37 PM, Fabricio Vasselai <fabriciovasselai at gmail.com> wrote:
Dear all,

Thank you very much for your replies so far. First, about my dataset: I am not allowed to publish it here yet, but I am arranging to simulate something similar so that I can show you an example dataset that reproduces the problem.

Manuel: you just gave me a good idea, and indeed the problem does not happen when I use lmrob! Awesome. The only problem is that when I run smaller models, with 10 variables for instance, where lmRob does work for me, the results from lmRob and from lmrob are substantially different. Sorry for the silly question, but I would be very interested in understanding why these two versions of the package could give such different results. It could help answer many theoretical questions.

S. Ellison: quite right, that would make sense. Unfortunately, the problem still happens with only one interaction.

Thanks a lot already for the insights.
FABRICIO
Manuel Koller <koller at stat.math.ethz.ch>
Seminar für Statistik, HG G 18, Rämistrasse 101
ETH Zürich, 8092 Zürich, SWITZERLAND
phone: +41 44 632-4673, fax: ...-1228
http://stat.ethz.ch/people/kollerma/
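To make the psi-function point in Manuel's reply a bit more concrete, here is a minimal Python sketch (not R, and not the robustbase or robust implementation) of Tukey's bisquare psi-function, which the reply names as the lmrob default. It is paired with a toy M-estimate of location computed by iteratively reweighted least squares, with the scale held fixed at the normalized MAD. The tuning constant 4.685 is an assumption (a commonly quoted value for roughly 95% Gaussian efficiency), and all function names are hypothetical. The only aim is to show what "redescending" means: residuals larger than c in absolute value get psi = 0 and weight 0, so a gross outlier is ignored entirely.

```python
import numpy as np

# Assumed tuning constant: a commonly quoted value giving roughly 95%
# Gaussian efficiency for the bisquare. Not read from any R package.
C_BISQUARE = 4.685


def psi_bisquare(x, c=C_BISQUARE):
    """Tukey bisquare psi: x * (1 - (x/c)^2)^2 for |x| <= c, and 0 outside."""
    x = np.asarray(x, dtype=float)
    u = x / c
    return np.where(np.abs(u) <= 1.0, x * (1.0 - u**2) ** 2, 0.0)


def weight_bisquare(x, c=C_BISQUARE):
    """IRLS weight psi(x)/x = (1 - (x/c)^2)^2 for |x| <= c, and 0 outside."""
    u = np.asarray(x, dtype=float) / c
    return np.where(np.abs(u) <= 1.0, (1.0 - u**2) ** 2, 0.0)


def m_location(y, c=C_BISQUARE, tol=1e-10, max_iter=200):
    """Hypothetical toy: bisquare M-estimate of location via IRLS.

    Starts at the median and keeps the scale fixed at the normalized MAD,
    so it is far simpler than the S/MM machinery used by lmrob or lmRob.
    """
    y = np.asarray(y, dtype=float)
    mu = np.median(y)
    scale = 1.4826 * np.median(np.abs(y - mu))  # normalized MAD
    for _ in range(max_iter):
        w = weight_bisquare((y - mu) / scale, c)  # outliers get weight 0
        mu_new = np.sum(w * y) / np.sum(w)
        if abs(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return mu
```

For example, `m_location([1, 2, 3, 4, 5, 100])` ends up close to 3 because the point at 100 receives weight 0, whereas the plain mean is pulled far away. The slower-redescending "lqq" psi-function that the reply recommends lets the weights fall to zero more gradually; it is not sketched here.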