An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/r-sig-finance/attachments/20060424/5ee8e8a2/attachment.pl
Distribution Fitting
5 messages · Dirk Eddelbuettel, Krishna Kumar, Martin Maechler +1 more
On 24 April 2006 at 00:10, Lorenzo Isella wrote:
| Dear All, | I am experiencing some problems in fitting a trimodal distribution (which | should be thought as a sum of three Gaussian distributions) using the nls | function for nonlinear fittings. | As a test, just consider the very simple code: | | rm(list=ls()); | mydata<-rnorm(10000,0,4); | mydens<-density(mydata,kernel="gaussian"); | y1<-mydens$y; | x1<-mydens$x; | myfit<-nls(y1~A*exp(-x1^2/sig)); | | | which I use to get the empirical density (as I would from real experimental | data) and test it against a Gaussian ansatz. | Well, either R always crashes for a segmentation fault and I have to restart | it manually or I get this output: | | Error in match.call(definition, call, expand.dots) : | '.Primitiv???i???d???????????????????????????????????????...' is not a function | | Am I missing the obvious or is there some bug in my R build? Are you aware of fitdistr() in MASS?
library(MASS) mydata<-rnorm(10000,0,4) fitdistr(mydata, "normal")
mean sd -0.0055755191185632 3.9956609772436904 ( 0.0399566097724369) ( 0.0282535897233148)
fitdistr() fits a bunch of distribution. I can't recall whether I used this, or one of the mixed distribution packages from CRAN when I was doing some work on mixtures for fat tails. Dirk
Hell, there are no rules here - we're trying to accomplish something.
-- Thomas A. Edison
mixdist is what you want it works very well and gets you nice gaussian mixtures. The issue is with calibration of these mixtures and I would like to be shown otherwise..or just wrong! But if you have three distribution there could be two different combination of distribution parameters that can fit the data equally well. This becomes a problem of how to divvy up say the variance between two of the gaussians..and how good an optimizer you have.. Krishna
Dirk Eddelbuettel wrote:
On 24 April 2006 at 00:10, Lorenzo Isella wrote: | Dear All, | I am experiencing some problems in fitting a trimodal distribution (which | should be thought as a sum of three Gaussian distributions) using the nls | function for nonlinear fittings. | As a test, just consider the very simple code: | | rm(list=ls()); | mydata<-rnorm(10000,0,4); | mydens<-density(mydata,kernel="gaussian"); | y1<-mydens$y; | x1<-mydens$x; | myfit<-nls(y1~A*exp(-x1^2/sig)); | | | which I use to get the empirical density (as I would from real experimental | data) and test it against a Gaussian ansatz. | Well, either R always crashes for a segmentation fault and I have to restart | it manually or I get this output: | | Error in match.call(definition, call, expand.dots) : | '.Primitiv???i???d???????????????????????????????????????...' is not a function | | Am I missing the obvious or is there some bug in my R build? Are you aware of fitdistr() in MASS?
library(MASS) mydata<-rnorm(10000,0,4) fitdistr(mydata, "normal")
mean sd -0.0055755191185632 3.9956609772436904 ( 0.0399566097724369) ( 0.0282535897233148) fitdistr() fits a bunch of distribution. I can't recall whether I used this, or one of the mixed distribution packages from CRAN when I was doing some work on mixtures for fat tails. Dirk
"Lorenzo" == Lorenzo Isella <lorenzo.isella at gmail.com>
on Mon, 24 Apr 2006 00:10:41 +0200 writes:
Lorenzo> Dear All,
Lorenzo> I am experiencing some problems in fitting a trimodal distribution (which
Lorenzo> should be thought as a sum of three Gaussian distributions) using the nls
Lorenzo> function for nonlinear fittings.
Lorenzo> As a test, just consider the very simple code:
Lorenzo> rm(list=ls());
Lorenzo> mydata<-rnorm(10000,0,4);
Lorenzo> mydens<-density(mydata,kernel="gaussian");
Lorenzo> y1<-mydens$y;
Lorenzo> x1<-mydens$x;
Lorenzo> myfit<-nls(y1~A*exp(-x1^2/sig));
(it's slightly inefficient and completely unnecessary to append
an empty statement to every line -- which you
do incidentally if you add superfluous ';' at end of lines )
Lorenzo> which I use to get the empirical density (as I would from real experimental
Lorenzo> data) and test it against a Gaussian ansatz.
Lorenzo> Well, either R always crashes for a segmentation fault and I have to restart
Lorenzo> it manually or I get this output:
Lorenzo> Error in match.call(definition, call, expand.dots) :
Lorenzo> '.Primitiv???i???d???????????????????????????????????????...' is not a function
I cannot confirm any segmentation fault (and if you really get
them, indeed your R *installation* is buggy (or outdated)), but
indeed, you've revealed a buglet in nls() - which is still
present in R 2.3.0 which has been released a few hours ago.
Thank you for reporting it as a reproducible example!
I'm going to try to fix the bug.
Lorenzo> Am I missing the obvious or is there some bug in my R build?
As others have said (implicitly): you're missing the facts
- that it's rather bad idea to fit "data" to a density estimate.
- if your density has a closed from, you should rather use
maximum likelihood, typically most easily via MASS::fitdistr
Martin
On Mon, 2006-04-24 at 15:08 +0200, Martin Maechler wrote:
"Lorenzo" == Lorenzo Isella <lorenzo.isella at gmail.com>
on Mon, 24 Apr 2006 00:10:41 +0200 writes:
Lorenzo> Dear All,
Lorenzo> I am experiencing some problems in fitting a trimodal distribution (which
Lorenzo> should be thought as a sum of three Gaussian distributions) using the nls
Lorenzo> function for nonlinear fittings.
Lorenzo> As a test, just consider the very simple code:
Lorenzo> rm(list=ls());
Lorenzo> mydata<-rnorm(10000,0,4);
Lorenzo> mydens<-density(mydata,kernel="gaussian");
Lorenzo> y1<-mydens$y;
Lorenzo> x1<-mydens$x;
Lorenzo> myfit<-nls(y1~A*exp(-x1^2/sig));
(it's slightly inefficient and completely unnecessary to append
an empty statement to every line -- which you
do incidentally if you add superfluous ';' at end of lines )
Lorenzo> which I use to get the empirical density (as I would from real experimental
Lorenzo> data) and test it against a Gaussian ansatz.
Lorenzo> Well, either R always crashes for a segmentation fault and I have to restart
Lorenzo> it manually or I get this output:
Lorenzo> Error in match.call(definition, call, expand.dots) :
Lorenzo> '.Primitiv?i?d?????????????...' is not a function
I cannot confirm any segmentation fault (and if you really get
them, indeed your R *installation* is buggy (or outdated)), but
indeed, you've revealed a buglet in nls() - which is still
present in R 2.3.0 which has been released a few hours ago.
Thank you for reporting it as a reproducible example!
I'm going to try to fix the bug.
Lorenzo> Am I missing the obvious or is there some bug in my R build?
As others have said (implicitly): you're missing the facts
- that it's rather bad idea to fit "data" to a density estimate.
- if your density has a closed from, you should rather use
maximum likelihood, typically most easily via MASS::fitdistr
Martin
Thanks everybody for their advice. Granted that I had better use fitdistr for my particular problem, how do I feed a particular density function of my own choice into it? E.g., consider the case of Gaussian-distributed data and suppose there is no "normal" argument for the fitdistr(). mysample<-rnorm(10000,2,4) You want to model the data as drawn from the probability distribution A*exp(-(x-mu)^2/sig), where x is an experimental data and have A, mu and sigma as parameters to fit. I read the help(fitdistr) page, but there is not an example with a user-provided probability density and I did not get very far with my attempts. Best regards and thanks in advance for your help Lorenzo