
Least square minimization (non-linear)

Liaw, Andy

I think the question is fairly clear (to me, at least).  My problem is
`Why?'

If I'm not mistaken, what Choudary is asked to do is fit a gaussian density
to the data by fitting the gaussian pdf, via nonlinear least squares, to the
(x, y) pairs where x are the bin midpoints and y are the histogram heights.
The fitted distribution is, of course, guaranteed to be a real density, as
it's a gaussian pdf with parameters estimated from NLS.
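To make the procedure concrete, here is a minimal sketch in Python (not R, and not anyone's actual code). It assumes a histogram scaled to integrate to 1, a fixed bin origin and width, and a plain Gauss-Newton iteration standing in for an NLS routine such as R's nls(). The names density_histogram() and fit_gaussian_nls(), and the simulated data, are hypothetical.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Gaussian density at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def density_histogram(data, origin, width, nbins):
    """Hypothetical helper: bin midpoints and heights, scaled to integrate to 1."""
    counts = [0] * nbins
    for v in data:
        k = int((v - origin) // width)
        if 0 <= k < nbins:          # values outside the bins are dropped
            counts[k] += 1
    n = len(data)
    mids = [origin + (k + 0.5) * width for k in range(nbins)]
    heights = [c / (n * width) for c in counts]
    return mids, heights

def fit_gaussian_nls(xs, ys, iters=100, tol=1e-10):
    """Hypothetical helper: least-squares fit of normal_pdf(x; mu, sigma)
    to the (xs, ys) pairs by Gauss-Newton (assumed stand-in for nls())."""
    # Starting values: mean and SD of the bin midpoints, weighted by height.
    total = sum(ys)
    mu = sum(x * y for x, y in zip(xs, ys)) / total
    sigma = math.sqrt(sum(y * (x - mu) ** 2 for x, y in zip(xs, ys)) / total)
    for _ in range(iters):
        a11 = a12 = a22 = g1 = g2 = 0.0      # J'J (2x2) and J'r (2-vector)
        for x, y in zip(xs, ys):
            f = normal_pdf(x, mu, sigma)
            z = (x - mu) / sigma
            d_mu = f * z / sigma              # df/dmu
            d_sigma = f * (z * z - 1) / sigma # df/dsigma
            r = y - f                         # residual
            a11 += d_mu * d_mu
            a12 += d_mu * d_sigma
            a22 += d_sigma * d_sigma
            g1 += d_mu * r
            g2 += d_sigma * r
        # Solve the 2x2 normal equations (J'J) step = J'r.
        det = a11 * a22 - a12 * a12
        step_mu = (a22 * g1 - a12 * g2) / det
        step_sigma = (a11 * g2 - a12 * g1) / det
        mu += step_mu
        sigma = abs(sigma + step_sigma)       # keep sigma positive
        if abs(step_mu) < tol and abs(step_sigma) < tol:
            break
    return mu, sigma

random.seed(1)
data = [random.gauss(10.0, 2.0) for _ in range(2000)]   # simulated data
mids, heights = density_histogram(data, origin=0.0, width=1.0, nbins=20)
mu_hat, sigma_hat = fit_gaussian_nls(mids, heights)
print(mu_hat, sigma_hat)
```

The bin origin, width and count, the starting values, and the stopping rule are all choices the resulting estimates depend on.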

What Choudary (and his colleague) may not realize is that that's about as
convoluted a way of estimating the parameters as one can imagine (or
perhaps beyond imagination?).  I do not see any advantage of doing things
this way over just estimating the parameters by the sample mean and variance
(or perhaps the MLE).  At least their statistical properties are well known
(and optimal in a certain sense).
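By contrast, the direct route needs no bins, starting values or iteration. A minimal Python sketch using the standard statistics module on simulated data (the data are an assumption for illustration): for a gaussian sample, the MLE of the mean is the sample mean, and the MLE of the variance divides by n rather than n - 1.

```python
import math
import random
import statistics

random.seed(1)
data = [random.gauss(10.0, 2.0) for _ in range(2000)]   # simulated data
n = len(data)

# Sample mean (also the gaussian MLE of mu).
mu_hat = statistics.mean(data)

# Unbiased sample variance: divides by n - 1.
var_unbiased = statistics.variance(data)

# Gaussian MLE of the variance: divides by n.
var_mle = statistics.pvariance(data)

print(mu_hat, math.sqrt(var_unbiased), math.sqrt(var_mle))
```

For large n the two variance estimates are nearly identical; either is a well-understood estimator, unlike the histogram-based fit.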

If one is going to fit a gaussian distribution, just do it directly.
There's no need to go halfway around the world to do that.  If you are
going to use the histogram, how do you decide how many bins to use, and
where the bin boundaries should be?  Even with a fixed number of bins
and bin width, there's not a unique histogram for a set of data.  Which one
should you use?  How do you justify these choices?
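A tiny illustration of the non-uniqueness point, using a made-up six-point data set (hypothetical): the bin width and the number of bins stay the same, and only the left edge of the first bin moves by half a bin width. The counts, and therefore the (x, y) pairs an NLS fit would see, change.

```python
def bin_counts(data, origin, width, nbins):
    """Count observations in [origin + k*width, origin + (k+1)*width)."""
    counts = [0] * nbins
    for v in data:
        k = int((v - origin) // width)
        if 0 <= k < nbins:
            counts[k] += 1
    return counts

data = [0.1, 0.2, 0.3, 0.6, 1.2, 1.4]   # hypothetical data

# Same width (0.5) and number of bins (4); only the origin differs.
h1 = bin_counts(data, origin=0.0, width=0.5, nbins=4)
h2 = bin_counts(data, origin=-0.25, width=0.5, nbins=4)
print(h1)   # [3, 1, 2, 0]
print(h2)   # [2, 2, 1, 1]
```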

If the goal is _not_ to fit a gaussian distribution to the data, then please
do explain what it is.  If by `plotting experimental values vs. theoretical
values' you are trying to assess the normality of the data, then the Q-Q
plot (qqnorm() as Spencer suggested) is a far better choice.
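For the Q-Q plot route, here is a minimal Python sketch of the numbers behind a normal Q-Q plot. It is not qqnorm()'s actual implementation. The sorted data are paired with standard normal quantiles at plotting positions (i - a)/(n + 1 - 2a), with a = 3/8 for n <= 10 and 1/2 otherwise; we believe this matches R's ppoints(), but treat it as an assumption. The helper names and simulated data are hypothetical, and plotting is left out. For roughly gaussian data the points lie near a straight line whose intercept and slope approximate the mean and standard deviation.

```python
import random
from statistics import NormalDist

def plotting_positions(n):
    """Probabilities (i - a) / (n + 1 - 2a), i = 1..n.
    Assumed to mirror R's ppoints(): a = 3/8 if n <= 10, else 1/2."""
    a = 3 / 8 if n <= 10 else 0.5
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]

def qq_normal(data):
    """Hypothetical helper: (theoretical N(0,1) quantile, ordered value) pairs."""
    ordered = sorted(data)
    std_normal = NormalDist()
    theo = [std_normal.inv_cdf(p) for p in plotting_positions(len(ordered))]
    return list(zip(theo, ordered))

random.seed(3)
data = [random.gauss(10.0, 2.0) for _ in range(200)]   # simulated data
pairs = qq_normal(data)

# Least-squares line through the Q-Q points: slope ~ SD, intercept ~ mean.
xs = [t for t, _ in pairs]
ys = [s for _, s in pairs]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx
print(intercept, slope)
```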

Andy