Fitdistr and mle - R-help | R Mailing Lists

Mon, Dec 23, 2013 11:06 AM

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20131223/dffe6861/attachment.pl>

Ben Bolker

Tue, Dec 24, 2013 8:00 AM

Tia Borrelli <tiaborrelli <at> yahoo.it> writes:

> [...] problem with the fitting of the distribution.

Hard to say without a reproducible example.  In the example below
the answers are not identical (different starting values etc.) but
they're closer than in your example.

  (I assume that what you're really doing is more complicated than
the trivial example shown here, since the MLEs of the Normal distribution
parameters are very easy ...)

set.seed(101)
ret <- rnorm(10000,mean=-1.5e-5,sd=1.69e-2)
MASS::fitdistr(ret,densfun="normal")
##        mean            sd     
##   7.419639e-05   1.678380e-02 
##  (1.678380e-04) (1.186794e-04)

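library(stats4)
## negative log-likelihood of a Normal model for ret; the default
## argument values (0 and 1) are used by mle() as starting values
loglink <- function(media=0, devstd=1){
  -sum(dnorm(ret, mean=media, sd=devstd, log=TRUE))
}
mle(loglink)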
##        media       devstd 
## 7.402637e-05 1.680457e-02
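
As a sanity check on "very easy": the Normal MLEs can be written down in
closed form, as the sample mean and the square root of the mean squared
deviation (dividing by n rather than n-1).  A minimal sketch on the same
simulated data; the names mu_hat, sigma_hat, se_mu and se_sigma are made
up here for illustration:

```r
set.seed(101)
ret <- rnorm(10000, mean = -1.5e-5, sd = 1.69e-2)
n <- length(ret)

## closed-form maximum likelihood estimates for a Normal sample
mu_hat    <- mean(ret)
sigma_hat <- sqrt(mean((ret - mu_hat)^2))   # divides by n, unlike sd()

## asymptotic standard errors from the Fisher information
se_mu    <- sigma_hat / sqrt(n)
se_sigma <- sigma_hat / sqrt(2 * n)

print(c(mean = mu_hat, sd = sigma_hat))
print(c(se_mean = se_mu, se_sd = se_sigma))
```

These should line up with the fitdistr() output above, since the MASS
documentation says it uses the closed-form estimates for the Normal
case; the mle() numbers come from a numerical optimizer and so can be
slightly off.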

Tia Borrelli

Tue, Dec 24, 2013 1:27 PM

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20131224/836ef727/attachment.pl>

Ben Bolker

Wed, Dec 25, 2013 8:46 AM

Tia Borrelli <tiaborrelli <at> yahoo.it> writes:

OK, but this still isn't a *reproducible* example (see e.g.
http://tinyurl.com/reproducible-000 )

In your example fitdistr() and mle() are both computing maximum
likelihood estimates for a Normal model, so they should agree closely.
mle() uses the built-in optim() function to minimize the negative
log-likelihood function that you supply (here based on the built-in
dnorm()).  fitdistr() builds the likelihood for you from the name of
the distribution; for most distributions it also calls optim(), but
for the Normal (and a few other distributions) it uses the closed-form
MLEs instead.  (The mle2() function in the bbmle package is an
extension of stats4::mle that offers a middle ground, e.g. you can say
y ~ dnorm(mu,sigma) to specify the fit of a Normal distribution.)  The
differences between the results you get will be based on small
numerical differences, e.g. the starting values of the parameters, or
differences in the control parameters for optimization.
In general you should get very similar, but not necessarily identical,
answers from these two functions; big differences would probably
indicate some kind of wonky data or numerical problem.  Again, we
would need a reproducible example to see precisely what is going on.
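
To give a feel for how large "small numerical differences" typically
are, here is a rough sketch that calls optim() directly on a Normal
negative log-likelihood, from two starting points and with two
convergence tolerances.  The simulated data are the ones from my earlier
message; the helper fit_normal(), the starting values, the reltol value
and the log(sigma) parameterization are all choices made for this
illustration, not what fitdistr() or mle() do internally.

```r
set.seed(101)
ret <- rnorm(10000, mean = -1.5e-5, sd = 1.69e-2)

## negative log-likelihood; par = c(mu, log(sigma)) keeps sigma positive
nll <- function(par) {
  -sum(dnorm(ret, mean = par[1], sd = exp(par[2]), log = TRUE))
}

## hypothetical helper: run optim() (default method, Nelder-Mead) and
## return the estimates on the natural scale
fit_normal <- function(start, ...) {
  opt <- optim(start, nll, ...)
  c(mean = opt$par[[1]], sd = exp(opt$par[[2]]))
}

fits <- rbind(
  start_A      = fit_normal(c(0, log(0.01))),
  start_B      = fit_normal(c(0.001, log(0.05))),
  loose_reltol = fit_normal(c(0, log(0.01)), control = list(reltol = 1e-6))
)
print(fits, digits = 7)
```

I would expect the three rows to agree in the leading digits but not
exactly.  Differences much bigger than that are the kind of thing that
points to a problem with the data or the likelihood.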