informal conventions/checklist for new predictive modeling packages
I agree with almost all of these, except the last point. Since I have
participated in some wheel-reinvention lately, I agree with the bulk of
your comment. I don't think the fix is as easy as you suggest, though:
RSiteSearch won't help me find a function I need when I don't know the
magic words. Some R functions have such unexpected names that only a
fastidious source-code reader would ever find them ("pretty", for
example). But I agree with your concern.

As far as the last point is concerned, though, I think you are
mistaken. Explanation below.
On Wed, Jan 4, 2012 at 8:19 AM, Max Kuhn <mxkuhn at gmail.com> wrote:
> (14) [OCD] For binary classification models, model the probability of
> the first level of a factor as the event of interest (again, for
> consistency). Note that glm() does not do this, but most others use
> the first level.
When the DV is thought of as 0 and 1, where 1 is an "event", "success",
or "win" and 0 is a "non-event", "failure", or "loss", then if there is
to be a single predicted probability, I want it to be the probability
of the higher outcome. glm() is doing the thing I want, and I don't
know of others that go the other way, except PROC LOGISTIC in SAS, and
that one has a long history of causing confusion and despair. (A small
illustration follows below.)

I'd like to suggest adding one thing to your list, though. I have
wished (on this list and elsewhere) that there were a more regular
approach for constructing the "newdata" objects that are passed to
predict(). Many packages have re-invented this (datadist in rms, the
effects package), and almost nobody here agreed with my wish for a more
standard approach. But if there were a standard approach, it would be
much easier to hold up R as an alternative to Stata when users pop up
with "marginal effects tables" from Stata that are very difficult to
reproduce in R; a sketch of the re-invented pattern is below.
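A minimal illustration of the convention at issue, using made-up data
(the levels "loss"/"win" are just for this example): with a two-level
factor response, glm() treats the first level as the non-event and
models the probability of the second.

## Made-up data: a two-level factor outcome, levels c("loss", "win").
set.seed(1)
d <- data.frame(
  x = rnorm(100),
  y = factor(sample(c("loss", "win"), 100, replace = TRUE),
             levels = c("loss", "win"))
)

## glm(family = binomial) models P(y == second level), here "win".
fit <- glm(y ~ x, data = d, family = binomial)
head(predict(fit, type = "response"))  # fitted P(y == "win")
levels(d$y)[2]                         # "win", the modeled event

This matches the documented behavior of the binomial family: the first
factor level denotes failure and all others success.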
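And here is a hedged sketch of the "newdata" dance that datadist (in
rms) and the effects package each automate in their own way; the
variable names and the choice of "typical" values are hypothetical, not
any package's actual API. The idea: vary the predictor of interest over
its range, hold the others fixed, and hand the grid to predict().

set.seed(2)
d2 <- data.frame(
  x1 = rnorm(100),
  x2 = factor(sample(c("a", "b"), 100, replace = TRUE)),
  y  = factor(sample(c("no", "yes"), 100, replace = TRUE))
)
fit2 <- glm(y ~ x1 + x2, data = d2, family = binomial)

## Hand-rolled "newdata": x1 sweeps its observed range, x2 is held
## at an arbitrary reference level. Every package picks these
## "typical" values differently, which is exactly the problem.
nd <- expand.grid(
  x1 = seq(min(d2$x1), max(d2$x1), length.out = 25),
  x2 = factor("a", levels = levels(d2$x2))
)
nd$p_yes <- predict(fit2, newdata = nd, type = "response")
head(nd)

If something like this were standardized, a Stata-style marginal
effects table would be a few lines of shared code instead of one more
per-package re-invention.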
Regards,
pj

Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas