Box-Cox Transformation: Drastic differences when varying added constants

Mon, May 17, 2010 9:23 AM

Hi Holger,
I would also highly recommend looking at the boxcox() and logtrans()
functions in the MASS package (see ?boxcox and ?logtrans). There is also
a concise, illuminating discussion of their use, with an example, on
pages 170-172 of

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics
with S. Fourth edition.

Hope that helps,
Bill
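A rough sketch of the MASS route Bill mentions, assuming MASS is installed. The simulated log-normal data, the seed and the lambda grid are illustrative choices, not from the thread. boxcox() profiles the Box-Cox log-likelihood over a grid of lambda values for an intercept-only model. With plotit = FALSE it returns the grid and the log-likelihoods, so the maximizing lambda and an approximate 95% interval can be read off directly:

```r
library(MASS)

set.seed(42)                                 # illustrative seed
y <- rlnorm(200, meanlog = 0, sdlog = 0.6)   # hypothetical positive, right-skewed data

# Profile log-likelihood over lambda for the model y ~ 1;
# plotit = FALSE returns list(x = lambda grid, y = log-likelihood)
bc <- boxcox(y ~ 1, lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: lambdas whose log-likelihood lies within
# qchisq(0.95, 1) / 2 of the maximum (the cut-off boxcox() marks when plotting)
keep <- bc$y > max(bc$y) - qchisq(0.95, 1) / 2
ci <- range(bc$x[keep])

lambda_hat
ci
```

For log-normal data like this, one would expect lambda_hat to land near 0, i.e. close to a log transformation.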

On Sun, May 16, 2010 at 13:01, Peter Ehlers <ehlers at ucalgary.ca> wrote:

On 2010-05-16 6:22, Holger Steinmetz wrote:

Dear experts,

I have been trying to learn about the Box-Cox transformation and ran into
the following problem:

When I added a constant to make all values of the original variable
positive, the lambda estimates (from the box.cox.powers function)
differed dramatically depending on the constant chosen.

Let's say that x is such that 1/x has a Normal distribution,
i.e. lambda = -1.
Then y = (1/x) + b also has a Normal distribution.
But you're expecting 1/(x+b) to also have a Normal distribution.
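Peter's argument can be written out as a short derivation. The substitution u = 1/x is introduced here only for illustration:

```latex
\begin{aligned}
u &= \frac{1}{x} \sim N(\mu, \sigma^2), \\
\frac{1}{x} + b &= u + b \sim N(\mu + b, \sigma^2)
  \quad \text{(affine in } u\text{)}, \\
\frac{1}{x + b} &= \frac{1}{1/u + b} = \frac{u}{1 + b\,u}
  \quad \text{(nonlinear in } u \text{ unless } b = 0\text{)}.
\end{aligned}
```

Shifting after the transformation preserves Normality. Shifting before it, which is what adding a constant to the raw data does, passes u through a nonlinear map, so 1/(x + b) is in general not Normal. The best lambda for x + b therefore need not be -1.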

In addition, the correlation between the transformed variable and the
original was not 1 (as I think it should be if the transformed variable
is to be used meaningfully) but much lower.

Again, your expectation is faulty. The relationship between the
original and transformed variables is not linear (otherwise,
why do the transformation?), but cor() computes the Pearson
correlation coefficient by default. Try method='spearman'.
Better yet, plot the transformed variables vs the original
variable for further enlightenment.

-Peter Ehlers
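A small R illustration of Peter's suggestion. The data-generating lines follow Holger's P.S.; the seed and the fixed exponent 0.27 are illustrative assumptions. A monotone power transform leaves the rank (Spearman) correlation with the original at exactly 1, while the Pearson correlation drops below 1 because the relationship is curved:

```r
set.seed(1)                                  # illustrative seed
X <- c(rnorm(120, 0, .5), rnorm(40, 2.5, 2))
Xnew <- X - min(X) + .1                      # shift so all values are positive
Xtrans <- Xnew^0.27                          # illustrative exponent, near Holger's first estimate

pearson  <- cor(X, Xtrans)                       # measures linear association only
spearman <- cor(X, Xtrans, method = "spearman")  # measures monotone association

round(c(pearson = pearson, spearman = spearman), 3)

# The curvature visible here is why the Pearson coefficient is below 1
plot(X, Xtrans, xlab = "original X", ylab = "transformed X")
```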

With larger added constants (and a right-skewed variable), the lambda
estimate was even negative, and the correlation between the transformed
and original variables was -0.91!?
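One likely source of the negative correlation, offered as an editorial sketch rather than something stated in the thread: with lambda < 0, the raw power x^lambda is a decreasing function of x, so it reverses the order of the data. The Box-Cox family itself is defined as (x^lambda - 1)/lambda for lambda != 0 (and log(x) for lambda = 0), which is increasing in x for every lambda. The sketch reuses the simulated data from the P.S. below with an illustrative seed, Holger's reported estimate -0.2543, and a hypothetical helper bc_transform():

```r
set.seed(1)                                  # illustrative seed
X <- c(rnorm(120, 0, .5), rnorm(40, 2.5, 2))
Xnew2 <- X - min(X) + 1
lambda <- -0.2543                            # Holger's reported estimate, reused for illustration

# Hypothetical helper: the Box-Cox transform in its usual definition
bc_transform <- function(x, lambda) {
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

raw <- Xnew2^lambda                          # raw power as in the P.S.: decreasing in x
bc  <- bc_transform(Xnew2, lambda)           # Box-Cox form: increasing in x

cor(X, raw, method = "spearman")   # order reversed
cor(X, bc,  method = "spearman")   # order preserved
cor(raw, bc)                       # affine relation with negative slope 1/lambda
```

The two versions differ only by an affine map, so they are equally Normal (or not). Only the sign of the relationship with the original data changes.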

I guess there is something fundamental missing in my current understanding
of Box-Cox...

Best,
Holger


P.S. Here is what I did:

# box.cox.powers() is from the car package (later versions of car
# replace it with powerTransform())
library(car)

# Create a right-skewed variable X (mixture of two normals)
x1 = rnorm(120, 0, .5)
x2 = rnorm(40, 2.5, 2)
X = c(x1, x2)

# Add a small constant so that all values are positive
# (X - min(X) equals X + abs(min(X)) whenever min(X) < 0)
Xnew1 = X - min(X) + .1
box.cox.powers(Xnew1)
Xtrans1 = Xnew1^.2682   # lambda estimate printed in my run; differs with new random data

# Add a larger constant
Xnew2 = X - min(X) + 1
box.cox.powers(Xnew2)
Xtrans2 = Xnew2^-.2543  # lambda estimate printed in my run

# Plot histograms and normal QQ plots of all three variables
par(mfrow = c(3, 2))
hist(X)
qqnorm(X)
qqline(X, lty = 2)
hist(Xtrans1)
qqnorm(Xtrans1)
qqline(Xtrans1, lty = 2)
hist(Xtrans2)
qqnorm(Xtrans2)
qqline(Xtrans2, lty = 2)

# Correlations among original and transformed variables
round(cor(cbind(X, Xtrans1, Xtrans2)), 2)
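To see the effect Holger describes directly, lambda can be re-estimated for several added constants. This sketch swaps in MASS::boxcox() for box.cox.powers() (an assumption: both maximize a Box-Cox likelihood, but their interfaces and output differ), and the seed, the grid and the set of constants are illustrative:

```r
library(MASS)

set.seed(1)                                  # illustrative seed
X <- c(rnorm(120, 0, .5), rnorm(40, 2.5, 2))

# Estimate lambda by profiling the Box-Cox log-likelihood for the model y ~ 1
estimate_lambda <- function(y) {
  bc <- boxcox(y ~ 1, lambda = seq(-5, 5, by = 0.01), plotit = FALSE)
  bc$x[which.max(bc$y)]
}

constants <- c(0.1, 0.5, 1, 2, 5)            # illustrative added constants
lambdas <- sapply(constants, function(b) estimate_lambda(X - min(X) + b))

data.frame(constant = constants, lambda = lambdas)
```

Because the shift is applied before the power, each constant defines a different transformation family, so the estimates need not agree. MASS::logtrans() takes the complementary approach: it profiles the likelihood over the added constant alpha itself for the family log(y + alpha) (see ?logtrans).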

--

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
