Distribution Fitting - R-SIG-Finance

Sun, Apr 23, 2006 3:10 PM #

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: https://stat.ethz.ch/pipermail/r-sig-finance/attachments/20060424/5ee8e8a2/attachment.pl

Dirk Eddelbuettel

Sun, Apr 23, 2006 3:40 PM #

On 24 April 2006 at 00:10, Lorenzo Isella wrote:

|  Dear All,
| I am experiencing some problems in fitting a trimodal distribution (which
| should be thought as a sum of three Gaussian distributions) using the nls
| function for nonlinear fittings.
| As a test, just consider the very simple code:
| 
| rm(list=ls());
| mydata<-rnorm(10000,0,4);
| mydens<-density(mydata,kernel="gaussian");
| y1<-mydens$y;
| x1<-mydens$x;
| myfit<-nls(y1~A*exp(-x1^2/sig));
| 
| 
| which I use to get the empirical density (as I would from real experimental
| data) and test it against a Gaussian ansatz.
| Well, either R always crashes for a segmentation fault and I have to restart
| it manually or I get this output:
| 
| Error in match.call(definition, call, expand.dots) :
| '.Primitiv???i???d???????????????????????????????????????...' is not a function
| 
| Am I missing the obvious or is there some bug in my R build?

Are you aware of fitdistr() in MASS?

mean                  sd            
  -0.0055755191185632    3.9956609772436904 
 ( 0.0399566097724369) ( 0.0282535897233148)

fitdistr() fits a bunch of distribution. I can't recall whether I used this,
or one of the mixed distribution packages from CRAN when I was doing some
work on mixtures for fat tails.

Dirk

Hell, there are no rules here - we're trying to accomplish something. 
                                                  -- Thomas A. Edison

Krishna Kumar

Sun, Apr 23, 2006 8:50 PM #

mixdist is what you want it works very well and gets you nice gaussian 
mixtures.

The issue is with calibration of these mixtures and I would like to be 
shown otherwise..or just wrong!

But if you have three distribution there could be two different 
combination of distribution parameters that can fit the data equally well.
This becomes a problem of how to divvy up say the variance between two 
of the gaussians..and how good an optimizer you have..

Krishna

Dirk Eddelbuettel wrote:

Martin Maechler

Mon, Apr 24, 2006 6:08 AM #

Lorenzo> Dear All,
    Lorenzo> I am experiencing some problems in fitting a trimodal distribution (which
    Lorenzo> should be thought as a sum of three Gaussian distributions) using the nls
    Lorenzo> function for nonlinear fittings.
    Lorenzo> As a test, just consider the very simple code:

    Lorenzo> rm(list=ls());
    Lorenzo> mydata<-rnorm(10000,0,4);
    Lorenzo> mydens<-density(mydata,kernel="gaussian");
    Lorenzo> y1<-mydens$y;
    Lorenzo> x1<-mydens$x;
    Lorenzo> myfit<-nls(y1~A*exp(-x1^2/sig));

(it's slightly inefficient and completely unnecessary to append
 an empty statement to every line -- which you 
 do incidentally if you add superfluous ';' at end of lines )


    Lorenzo> which I use to get the empirical density (as I would from real experimental
    Lorenzo> data) and test it against a Gaussian ansatz.
    Lorenzo> Well, either R always crashes for a segmentation fault and I have to restart
    Lorenzo> it manually or I get this output:

    Lorenzo> Error in match.call(definition, call, expand.dots) :
    Lorenzo> '.Primitiv???i???d???????????????????????????????????????...' is not a function

I cannot confirm any segmentation fault (and if you really get
them, indeed your R *installation* is buggy (or outdated)), but
indeed, you've revealed a buglet in nls() - which is still
present in R 2.3.0 which has been released a few hours ago.
Thank you for reporting it as a reproducible example!

I'm going to try to fix the bug.

    Lorenzo> Am I missing the obvious or is there some bug in my R build?

As others have said (implicitly): you're missing the facts

-  that it's rather bad idea to fit "data" to a density estimate.
-  if your density has a closed from, you should rather use
   maximum likelihood, typically most easily  via MASS::fitdistr

Martin

Lorenzo Isella

Mon, Apr 24, 2006 8:19 AM #

On Mon, 2006-04-24 at 15:08 +0200, Martin Maechler wrote:

Thanks everybody for their advice.
Granted that I had better use fitdistr for my particular problem, how do
I feed a particular density function of my own choice into it?
E.g., consider the case of Gaussian-distributed data and suppose there
is no "normal" argument for the fitdistr().

mysample<-rnorm(10000,2,4)

You want to model the data as drawn from the probability distribution
A*exp(-(x-mu)^2/sig), where x is an experimental data and have A, mu and
sigma as parameters to fit.
I read the help(fitdistr) page, but there is not an example with a
user-provided probability density and I did not get very far with my
attempts.
Best regards and thanks in advance for your help

Lorenzo