Discretize continuous variables...

milicic.marko wrote:
Thanks for your note.  Categorizing age will adversely affect the 
scorecard.  First, since you are introducing discontinuities into the 
prediction model, people can game the system to exploit the 
discontinuity.  Second, the information lost from age will have to be
made up by adding another variable to the model, one you might not have
needed had age been adjusted for in full.  Third, if you chop age into
enough intervals to preserve its predictive value (hard to do,
especially in the outer age ranges, where sample sizes are too small to
support fine cuts but the age effect is sharp), you will find that the
mean squared error of the predicted values is higher than if you treated
age as a
continuous variable and just forced its effect to be smooth (e.g., using 
a regression spline).

Frank
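A small simulation can make the mean-squared-error point concrete. This is an illustrative sketch under stated assumptions, not part of the original reply: it assumes a hypothetical U-shaped age effect, 0.0015*(age - 50)^2, with unit-variance normal noise and ages uniform on 20 to 80. It then compares five quantile age groups against a 4-knot restricted cubic spline (knots at the 5th, 35th, 65th and 95th percentiles, similar in spirit to rcs() in the R rms package). The helper names rcs_basis, fit_spline, fit_bins and simulate are made up for this example, which uses Python with NumPy only.

```python
import numpy as np


def rcs_basis(x, knots):
    """Restricted (natural) cubic spline basis: x itself plus len(knots) - 2
    nonlinear terms. The fitted curve is piecewise cubic between knots and
    constrained to be linear beyond the outer knots. Nonlinear terms are
    divided by (t_k - t_1)**2 for numerical stability."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)

    def cube(u):
        return np.maximum(u, 0.0) ** 3

    d = t[-1] - t[-2]
    scale = (t[-1] - t[0]) ** 2
    cols = [x]
    for j in range(len(t) - 2):
        term = (cube(x - t[j])
                - cube(x - t[-2]) * (t[-1] - t[j]) / d
                + cube(x - t[-1]) * (t[-2] - t[j]) / d)
        cols.append(term / scale)
    return np.column_stack(cols)


def fit_spline(x, y, knots):
    """Ordinary least squares on intercept + spline basis; returns fitted values."""
    X = np.column_stack([np.ones_like(x), rcs_basis(x, knots)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta


def fit_bins(x, y, n_bins):
    """Categorize x into quantile groups; each prediction is its group's mean y."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.digitize(x, edges[1:-1])  # interior edges give labels 0..n_bins-1
    means = np.array([y[idx == b].mean() for b in range(n_bins)])
    return means[idx]


def simulate(n=2000, seed=1):
    """Hypothetical data: smooth U-shaped age effect plus N(0, 1) noise."""
    rng = np.random.default_rng(seed)
    age = rng.uniform(20.0, 80.0, n)
    truth = 0.0015 * (age - 50.0) ** 2
    y = truth + rng.normal(0.0, 1.0, n)
    return age, truth, y


age, truth, y = simulate()
knots = np.quantile(age, [0.05, 0.35, 0.65, 0.95])

# Mean squared error of predicted values against the true (known) age effect
mse_spline = np.mean((fit_spline(age, y, knots) - truth) ** 2)
mse_bins = np.mean((fit_bins(age, y, 5) - truth) ** 2)
print(f"MSE, 4-knot spline: {mse_spline:.4f}")
print(f"MSE, 5 age groups:  {mse_bins:.4f}")
```

Under these assumptions the spline's error should come out well below that of the grouped fit, mostly because of within-group bias in the outer groups where the curve is steepest. The exact numbers depend on the seed and on the assumed shape of the age effect.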