Standardising and transformation of explanatory/independent/predictor variables for multiple regression analysis

Dear Sam,

I hear your concern and I sympathise.  The reason for the conflicting advice, in my opinion, is partly historical and partly due to academic 
heredity.  When people first started doing statistical analyses, they didn't have computers and all calculations had to be done by hand.  This, 
coupled with a statistical theory in its infancy, limited the choice of analysis methods.  The result was the pragmatic approach of 
altering-your-data-to-fit-the-method.  There are still, of course, some good reasons to do this, but only sometimes.

Now to answer your questions.  Standardisation of covariates doesn't have inferential benefits.  That is, the model you fit will be the same 
either way: the coefficients are simply rescaled and the fitted values are unchanged.  If you transform your covariates (by a non-linear 
transformation) then the model will change.  The reason for standardising is to avoid computational issues (like numerical underflow and 
overflow), and some believe it makes it easier to place priors on the coefficients in a Bayesian analysis.  The reason for transforming is quite 
different.  It is done when you believe that the relevant scale of the covariate is different from the scale on which it was measured.  When 
fitting smooths (GAM(M)s), the scale shouldn't matter so much anyway, but there will still be some dependence through the location of knots and 
the distance between points in covariate space.
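
To make the distinction concrete, here is a minimal sketch in Python with NumPy, using simulated data (the data, the seed and the helper name `ols_fitted` are my own inventions for illustration, not anything from your analysis).  It fits the same straight-line regression to a raw, a standardised and a log-transformed covariate and compares the fitted values.

```python
import numpy as np

# Simulated data, purely illustrative: the response depends on log(x).
rng = np.random.default_rng(0)
n = 50
x = rng.uniform(1.0, 100.0, size=n)
y = 2.0 + 0.5 * np.log(x) + rng.normal(scale=0.1, size=n)


def ols_fitted(covariate, response):
    """Fit response ~ intercept + covariate by least squares; return fitted values."""
    X = np.column_stack([np.ones_like(covariate), covariate])
    beta, *_ = np.linalg.lstsq(X, response, rcond=None)
    return X @ beta


fit_raw = ols_fitted(x, y)
fit_std = ols_fitted((x - x.mean()) / x.std(), y)  # linear rescaling
fit_log = ols_fitted(np.log(x), y)                 # non-linear transformation

# Standardising only relabels the x-axis, so the fitted values agree.
same_after_standardising = np.allclose(fit_raw, fit_std)
# The log transformation changes the shape of the relationship being fitted.
same_after_log = np.allclose(fit_raw, fit_log)

print(same_after_standardising, same_after_log)
```

The first comparison should agree to numerical precision and the second should not, which is the sense in which standardising leaves the model alone while transforming changes it.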

Observations with outlying covariates are likely to have high leverage (they have an excessive amount of influence on the analysis result).  Some 
would argue that you should transform these covariates to account for them.  I would only transform if I thought the scale was wrong, or there were 
other (larger) issues with the data/analysis.  In preference, I would try to do an analysis that reduced the influence of these covariate values.  The 
extreme case is to remove that observation altogether (on the assumption that the observation actually comes from a different sampling frame from 
the one you are interested in).  A less extreme approach would be to down-weight the observation, or use the bootstrap, or resistant/robust 
methods.  (These are just suggestions that I'm not overly familiar with; I have used them before, but I need to look them up each time.)
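
As a rough illustration of leverage and down-weighting, here is a small Python/NumPy sketch on simulated data (the data, the planted outlier and the helper name `wls` are assumptions made up for the example).  Leverage is taken as the diagonal of the hat matrix, and down-weighting is done by ordinary weighted least squares; this is only one of several ways to reduce an observation's influence, and not a robust method in the formal sense.

```python
import numpy as np

# Simulated covariate with one deliberately outlying value (illustrative only).
rng = np.random.default_rng(1)
x = rng.normal(loc=10.0, scale=1.0, size=30)
x[0] = 25.0
y = 1.0 + 0.3 * x + rng.normal(scale=0.5, size=30)

X = np.column_stack([np.ones_like(x), x])

# Leverage = diagonal of the hat matrix H = X (X'X)^{-1} X'.
# With a reduced QR decomposition X = QR, H = QQ', so diag(H) is the row sums of Q**2.
Q, _ = np.linalg.qr(X)
leverage = np.sum(Q**2, axis=1)
most_influential = int(np.argmax(leverage))


def wls(design, response, weights):
    """Weighted least squares via row scaling by the square root of the weights."""
    sw = np.sqrt(weights)
    beta, *_ = np.linalg.lstsq(design * sw[:, None], response * sw, rcond=None)
    return beta


w = np.ones_like(x)
beta_full = wls(X, y, w)          # every observation counts equally

w_down = w.copy()
w_down[most_influential] = 0.1    # down-weight the high-leverage point
beta_down = wls(X, y, w_down)

w_zero = w.copy()
w_zero[most_influential] = 0.0    # weight zero is the same as removing it
beta_zero = wls(X, y, w_zero)
beta_drop, *_ = np.linalg.lstsq(X[1:], y[1:], rcond=None)

print(most_influential, leverage[most_influential])
print(beta_full, beta_down, beta_zero)
```

Setting the weight to zero reproduces the fit with the observation removed, so deleting the point is just the limiting case of down-weighting it.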

I hope that this helps,

Scott
