party for prediction [REPOST]

Ed:
I'm not sure what you mean by "integral vector". If you want to apply the
approach to hundreds of thousands of observations, I guess that these are
categorical (maybe even binary?) but maybe not...
If I recall correctly, we kept linearModel as simple as we did to save as 
much time as possible. This can be particularly important when one of the 
partitioning variables has many possible splits and the linearModel has to 
be fitted thousands of times.

Also, mob() assesses the stability of all coefficients of the model in all 
nodes during partitioning. If any of the coefficients is not identified,
it would have to be excluded from all subsequent parameter stability
tests in that node (and its child nodes). This is currently not provided 
for in mob().
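
To make "not identified" concrete, here is a minimal base-R sketch of how such a coefficient shows up within a single node. It uses plain lm() rather than mob()/linearModel, and the data and variable names (node, x1, x2) are made up for illustration only:

```r
## Hypothetical node data: x2 is constant within this node, so its
## coefficient cannot be separated from the intercept.
set.seed(1)
node <- data.frame(x1 = rnorm(50), x2 = rep(1, 50))
node$y <- 1 + 2 * node$x1 + rnorm(50)

fit <- lm(y ~ x1 + x2, data = node)

## lm() reports aliased (non-identified) coefficients as NA
unidentified <- names(coef(fit))[is.na(coef(fit))]
unidentified

## equivalent check via the rank of the model matrix
rank_deficient <- fit$rank < ncol(model.matrix(fit))
rank_deficient
```

A check like this could be run on candidate node data beforehand to see which regressors are likely to cause trouble.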
This comes from the parameter stability tests and might be a result of an 
unidentified (or close to unidentified) model fit.
With hundreds of thousands of observations, you would need some additional 
pruning strategy anyway. Significance test-based splitting will probably 
overfit because tiny differences in the coefficients will be picked up at 
such large sample sizes.
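
The effect of sample size on such tests can be seen in a small simulation. This is only a sketch with simulated data: the slope difference of 0.05, the variable names, and the use of an ordinary F test (instead of the parameter stability tests in mob()) are all assumptions for illustration:

```r
## Simulated data: the slope of x differs by only 0.05 between the
## two groups defined by the binary variable z.
set.seed(42)
sim_pvalue <- function(n) {
  z <- rbinom(n, 1, 0.5)
  x <- rnorm(n)
  y <- 1 + (1 + 0.05 * z) * x + rnorm(n)
  ## F test: one common line vs. separate lines for z = 0 and z = 1
  anova(lm(y ~ x), lm(y ~ x * z))[2, "Pr(>F)"]
}

p_small <- sim_pvalue(500)
p_large <- sim_pvalue(200000)
c(n_500 = p_small, n_200000 = p_large)
```

With the larger n the practically irrelevant difference is detected with a very small p-value, which is why splitting on significance alone tends to produce overly large trees here.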

Furthermore, computationally the extensive search over all possible splits 
might be too burdensome with this many observations.

Hence, using some subsampling strategy might not be the worst thing.
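
A simple version of such a subsampling strategy might look as follows. The data are simulated, the names (big, sub, y, x, z1, z2) are placeholders, and the subsample size of 5000 and minsplit = 500 are arbitrary choices, not recommendations:

```r
## Hypothetical large data set with two binary partitioning variables
set.seed(123)
n <- 200000
big <- data.frame(x  = rnorm(n),
                  z1 = factor(rbinom(n, 1, 0.5)),
                  z2 = factor(rbinom(n, 1, 0.3)))
big$y <- 1 + ifelse(big$z1 == "1", 2, 0.5) * big$x + rnorm(n)

## simple random subsample without replacement
sub <- big[sample(nrow(big), 5000), ]

## fit the tree on the subsample only (skipped if party is not installed)
if (requireNamespace("party", quietly = TRUE)) {
  library("party")
  tree <- mob(y ~ x | z1 + z2, data = sub, model = linearModel,
              control = mob_control(minsplit = 500))
  print(tree)
}
```

Repeating this over several subsamples would also give an impression of how stable the selected splits are.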
We have had non-identified model fits in binary GLMs (with quasi-complete 
separation) where we then set estfun() to all zero so that partitioning 
stops. But I don't think that such a strategy helps here.
Not sure, I don't know any off the top of my head.
If your partitioning variables are particularly simple (e.g., all binary) 
you could exploit that and it may be easier to write a custom function for 
your particular data. Then likelihood-ratio tests (rather than LM-type 
tests) would also be easier to apply in case of unidentified parameters.

But if there are partitioning variables with different measurement scales, 
then this will not be that simple...
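
For the all-binary case, a custom likelihood-ratio test for one candidate split could be sketched roughly like this. The function lr_split_test() is hypothetical (it is not part of party), it assumes a Gaussian linear model fitted with lm(), and it counts only identified coefficients via the rank of each fit:

```r
## Likelihood-ratio test for a split in a single binary variable z:
## compare one pooled lm() against separate lm() fits for each level of z.
lr_split_test <- function(formula, data, z) {
  g <- data[[z]]
  pooled <- lm(formula, data = data)
  parts <- lapply(split(data, g), function(d) lm(formula, data = d))

  ## log-likelihood of the split model is the sum over the subsets
  ll_split <- sum(sapply(parts, function(m) as.numeric(logLik(m))))
  stat <- 2 * (ll_split - as.numeric(logLik(pooled)))

  ## degrees of freedom count identified parameters only: the rank gives
  ## the identified coefficients, plus 1 for the error variance per fit
  df <- sum(sapply(parts, function(m) m$rank + 1)) - (pooled$rank + 1)
  c(statistic = stat, df = df,
    p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

## usage on simulated data where the slope differs clearly between groups
set.seed(7)
d <- data.frame(x = rnorm(400), z = rbinom(400, 1, 0.5))
d$y <- 1 + ifelse(d$z == 1, 2, 0.5) * d$x + rnorm(400)
res <- lr_split_test(y ~ x, data = d, z = "z")
res
```

Applying this to each binary partitioning variable in turn and splitting on the one with the largest statistic would give a simple greedy search, but some correction for multiple testing would still be needed.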
Have a look at the "Writing R Extensions" manual and the R for Windows 
FAQ.

Best,
Z