For a regression model, least-squares is in various senses optimal when
the errors are i.i.d. and normal, but it is a reasonable procedure in
many other situations (though not for even modestly long-tailed error
distributions, which is the point of robust statistics).
Although values from -Inf to +Inf are theoretically possible for a normal,
it has very little mass in the tails and is often used as a model for
non-negative quantities (the justification of Box-Cox estimation, for
example, relies on this).
On Wed, 5 Mar 2008, Martin Elff wrote:
On Wednesday 05 March 2008 (14:53:27), Wolfgang Waser wrote:
Dear all,
I fitted the following model by non-linear least squares
y ~ a * x^b
(a) > nls(y ~ a * x^b, start=list(a=1,b=1))
to obtain the coefficients a & b.
I did the same with the linearized formula, including a linear model
log(y) ~ log(a) + b * log(x)
(b) > nls(log10(y) ~ log10(a) + b*log10(x), start=list(a=1,b=1))
(c) > lm(log10(y) ~ log10(x))
I expected coefficient b to be identical for all three cases. However,
using my dataset, coefficient b was:
(a) 0.912
(b) 0.9794
(c) 0.9794
Coefficient a also varied between option (a) and (b), 107.2 and 94.7,
respectively.
Models (a) and (b) entail different distributions of the dependent
variable y and different ranges of values that y may take.
(a) implies that y has, conditionally on x, a normal distribution and
has a range of feasible values from -Inf to +Inf.
(b) and (c) imply that log(y) has a normal distribution, that is,
y has a log-normal distribution and can take values from zero to +Inf.
Is this supposed to happen?
Given the above considerations, different results with respect to the
intercept are definitely to be expected.
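A small simulation may make this concrete. The sketch below uses made-up data (not the original poster's dataset): y is generated with a multiplicative log-normal error, which is the assumption behind (b) and (c). The true values, the noise level and the starting values are all arbitrary choices for illustration.

```r
# Hypothetical simulated data; a_true, b_true and the noise sd are assumptions.
set.seed(1)
x <- seq(1, 100, length.out = 200)
a_true <- 100
b_true <- 0.95
# Multiplicative log-normal error: log(y) is normal around log(a) + b * log(x)
y <- a_true * x^b_true * exp(rnorm(length(x), sd = 0.3))

# (a) least squares on the original scale (additive error assumed)
fit_a <- nls(y ~ a * x^b, start = list(a = 50, b = 1))
# (b) the same curve fitted by nls on the log10 scale
fit_b <- nls(log10(y) ~ log10(a) + b * log10(x), start = list(a = 50, b = 1))
# (c) ordinary linear regression on the log10 scale
fit_c <- lm(log10(y) ~ log10(x))

# Collect the estimates; for (c) the intercept is log10(a), so back-transform it
b_hat <- c(fit_a = unname(coef(fit_a)["b"]),
           fit_b = unname(coef(fit_b)["b"]),
           fit_c = unname(coef(fit_c)[2]))
a_hat <- c(fit_a = unname(coef(fit_a)["a"]),
           fit_b = unname(coef(fit_b)["a"]),
           fit_c = 10^unname(coef(fit_c)[1]))
print(rbind(a = a_hat, b = b_hat))
```

(b) and (c) minimise the same sum of squares on the log scale, so they should agree up to the convergence tolerance of nls, while (a) weights the observations differently and will in general give different estimates of both a and b.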
Which is the correct coefficient b?
That depends - is y strictly non-negative or not ...
Just my 20 cents...