Skip to content

fitting distributions with R

1 message · Huntsinger, Reid

#
In "optim" you need to set "ndeps" (the "delta x" parameter controlling the
finite-difference approximation) to a sufficiently small value (or supply
the gradient yourself to avoid finite differences, which are messy on a
restricted parameter space.) Since you expect a minimum at about 6.7e-5 the
default ndeps=1e-3 is definitely too large.
$par
[1] 6.76644e-05

$value
[1] 254.4226

$counts
function gradient 
     136       18 

$convergence
[1] 0

$message
NULL

There were 50 or more warnings (use warnings() to see the first 50)
The warnings are "NaNs produced in: log(x)" which can be avoided by making
sure the function doesn't try to take the log of something <= 0, for example
change the last line to 

ifelse(beta > 0, -n*log(beta)+beta*sum(x), Inf)

and then optim is happy.
Reid Huntsinger

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of
Ted.Harding at nessie.mcc.ac.uk
Sent: Tuesday, September 06, 2005 11:46 AM
To: r-help at stat.math.ethz.ch
Cc: Nadja Riedwyl
Subject: Re: [R] fitting distributions with R
On 06-Sep-05 Huntsinger, Reid wrote:
While that is true (and Naja clearly knew this), nevertheless
one expects that using an optimiser should also work. Nadja's
observations need an explanation.

If things don't behave as expected, it is worth while embedding
"debug prints" so as to monitor what is going on internally (as
fas as one can). In this case, if one modifies Nadja's "ll"
function to

ll<-function(beta){
  n<-24
  x<-data2
  temp<-(-n*log(beta)+beta*sum(x))
  print(temp)
  temp
}

and re-runs 'mle', one sees that while there are some numerical
values in the output, there are many NaNs. Also, given the
warning message and the advice to look at "warnings()", one
learns that "NaNs produced in: log(x)" repeatedly. This very
strongly suggests that attempts have been made to take logs
of negative numbers which in trun suggests that the method
of computing the next approximation readily takes the value
of beta outside the valid range of beta > 0.

Now is the time to look at "?mle", which says that the default
method is "BGFS" for which "see optim". Under "?optim" we learn
that "BGFS" is a "quasi-Newton method". Such methods work by
calculating a local tangent to the derivative function and
extrapolating this until it meets the beta-axis, and this can
easily take the estimate outside admissible ranges (try using
Newton-Raphson to solve sqrt(x) = 0).

However, a related method available for 'optim' is "L-BFGS-B"
"which allows _box constraints_, that is each variable can be
given a lower and/or upper bound. The initial value must satisfy
the constraints." This can be set in a parameter for 'mle'.

So now you can try something like

  est<-mle(minuslog=ll, start=list(beta=0.1),
           method="L-BFGS-B", lower=10*(.Machine$double.eps))

and now the trace-prints show a series of numbers, with no NaNs,
so clearly we are getting somewhere (and have identified and
dealt with at least one aspect of the problem). However, observing
the prints, one sees that after an initial trend to convergence
there is a tendency to oscillate between values in the neighbouhood
of beta=360 and values in the neighbourhood of beta=800, finally
failing when two successive values "360.6573" are printed, which
in turn suggests that an attempt is made to comuted a gradient
from identical points. So clearly there is something not right
about how the method works for this particular problem (which,
as a statistical estimation problem, could hardly be simpler!).

Now, "?optim" has, at the end, a Note to the effect that the
default method (admittedly Nelder-Mead, which is not relevant
to the above) may not work well and suggests using 'optimize'
instead. So let's try 'optimize' anyway.

Now, with

  optimize(ll,lower=10*(.Machine$double.eps),upper=1e10)

we get a clean set of debug-prints, and convergence to

  beta = 5.881105e-05

with "minimum" 'll' equal to 254.6480.

Now compare with the known MLE which is

  beta = 1/mean(data2) = 6.766491e-05

giving

  ll(1/mean(data2)) = 254.4226, 

So clearly, now, using 'optimise' instead of 'optim' which
is what 'mle' uses, we are now "in the right parish". However,
there is apparently no parameter to 'mle' which would enable
us to force it to use 'optimize' rather than 'optim'!

This interesting saga, provoked by Nadja's query, now raises
an important general question: Given the simplicity of the
problem, why is the use of 'mle' so unexpectedly problematic?

While in the case of an exponential distribution (which has a
well-known analytical solution) one would not want to use 'mle'
to find the MLE (except as a test of 'mle'. perhaps), one can
easily think of other distributions, in form and behaviour very
similar to the negative exponential but without analytical solution,
for which use of 'mle' or some other optimisation routine would
be required. Such distributions could well give rise to similar
problems -- or worse: in Nadja's example,it was clear that it was
not working; in other cases, it might appear to give a result,
but the result might be very wrong and this would not be obvious.

Hmmm.

Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 06-Sep-05                                       Time: 16:46:12
------------------------------ XFMail ------------------------------

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html