
party for prediction [REPOST]

First up, thanks hugely for your response. I've been beating my head
against this!
On 14 October 2012 16:51, Achim Zeileis <Achim.Zeileis at uibk.ac.at> wrote:
I'm sorry I can't go into the details of the data, I would if I could.
z are categorical variables represented as integers, mostly ordered,
but not all. I've tried fitting them as integers, as well as ordered,
but I don't think it made a huge difference.
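For reference, the two encodings I compared were along these lines (z1 is a placeholder name, not one of our real variables):

d$z1 <- as.integer(d$z1)   # fit the state indicator as an integer
d$z1 <- ordered(d$z1)      # or as an ordered factor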
I can appreciate that, but maybe having an alternative linearModel
which will predict when the fit is degenerate would be worth
including? I'm happy to contribute what I have, although it's pretty
obvious stuff (and probably done suboptimally since I'm not much of an
R coder at this point). For me at least, even with huge datasets, the
speed of party is quite good; it's getting a better result that's the
problem.
Would pretending the coefficients were fit at 0 fool mob into doing
something moderately meaningful here?
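Concretely, I mean something like this sketch; safe_predict is a made-up helper, not anything in party, and it glosses over factor levels in newdata:

## Treat NA coefficients from a rank-deficient lm fit as zero when
## predicting by hand.
safe_predict <- function(fit, newdata) {
  beta <- coef(fit)
  beta[is.na(beta)] <- 0  # pretend the aliased coefficients were fit at 0
  X <- model.matrix(delete.response(terms(fit)), data = newdata)
  drop(X %*% beta[colnames(X)])
}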

If not, I would try to hack the code, but I'm honestly at something of
a loss as to how to modify it and feed the results back into my
interpreter. I have the byte-compiled package installed; I downloaded
the source, but I haven't yet worked out how to modify the source and
install the
result. I will check out the docs on writing extensions you suggest.
This is a great help to know. I improved my results quite considerably
with aggressive scaling of everything (scaling the response and all
the predictors to lie between 0 and 1). That deepened my tree by a
factor of two or so (say depth 3 to 7) and improved the quality of fit
substantially. Is there any way I can engage a more numerically robust
Cholesky in mob?
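In case it helps anyone else, the scaling I'm doing is roughly this; d and the variable names are placeholders for our real data:

library(party)
## Min-max scale the response and regressors to [0, 1] before fitting.
vars <- c("y", "x1", "x2")
rng <- lapply(d[vars], range)
d_scaled <- d
for (v in vars) {
  d_scaled[[v]] <- (d[[v]] - rng[[v]][1]) / diff(rng[[v]])
}
fit <- mob(y ~ x1 + x2 | z1 + z2, data = d_scaled, model = linearModel)
## Map predictions back with yhat * diff(rng$y) + rng$y[1].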
I'm okay with overfitting, honestly. At the moment it is underfitting
by quite a large amount I think (the quality of the predictions on the
training set is not very high). The problem really is there is so much
going on in the data, but the "noise" level is probably very low. I
wouldn't be surprised if my data was accurate to 5 or 6 s.f.
I have plenty of compute time/power and RAM, though R seems to be
running single threaded. But even on a few million observations, it's
still pretty fast and doesn't use more than 30 or so gig of memory. If
it takes a day and requires 150 gig of RAM, that is absolutely fine;
even more than that would be viable, though less than ideal.
I've tried this to various degrees, but the data is really very
complicated with not a lot of error. I'm trying to encourage party to
fit more closely, which I thought more data might encourage. At the
moment I'm a long way from a clean fit. I have subsampled at various
levels down to 1%, and although that increases the depth of the tree
and improves the fit, the result is still not very good, and
subsampling can encourage the tree to overlook obvious aspects of the
training set.
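Roughly what I've been trying is this; the alpha and minsplit values are only illustrative:

library(party)
## Fit mob on a 1% subsample, loosening the stopping rules so the
## tree can grow deeper.
idx <- sample(nrow(d), ceiling(0.01 * nrow(d)))
fit <- mob(y ~ x1 + x2 | z1 + z2, data = d[idx, ],
           model = linearModel,
           control = mob_control(alpha = 0.1, minsplit = 50))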
I've considered using rpart() to partition into cells of constant
gradient, then fitting linear models myself to the cells. This is my
next thought. I'm pretty sure partitioning over linear regression is
the way forward for the data we have. I tried mars and glm but there
are good reasons to think they're less reasonable, even though the fit
wasn't particularly poor. I'm not particularly wedded to party's
approach except that it looked like it immediately returned what we
needed, and with some degree of "optimality" into the bargain.
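The rpart() idea would look something like the sketch below; note that rpart splits on cell means rather than gradients, so this only approximates what I'm after, and routing new observations to the right leaf would still need doing by hand:

library(rpart)
## Partition on the state indicators, then fit a separate linear
## model within each leaf.
tree <- rpart(y ~ z1 + z2, data = d, control = rpart.control(cp = 0.001))
d$leaf <- factor(tree$where)  # leaf membership for each training row
fits <- lapply(split(d, d$leaf),
               function(cell) lm(y ~ x1 + x2, data = cell))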
I guess I really will have to bite the bullet and try to figure out
how to install modified libraries. Thanks.
Unfortunately each partitioning variable is essentially a state
indicator, taking values say 0,...,R where R is different for each
component. I'm not a stats expert either; I've spent some time with
the party manuals and papers, but I wouldn't be confident of
implementing something like it in the time available to me (though if
I have to I will, but that wouldn't be a good situation to be in).
Will do.

Thank you very much for your responses, I really appreciate it.

Best wishes,

Ed