
Statistical question re assessing fit of distribution functions.

4 messages · Timur Shtatland, Ted, David Scott

Ted
#
I am in a situation where I have to fit a distribution, such as Cauchy or
normal, to an empirical dataset.  Well and good, that is easy.

But I wanted to assess just how good the fit is, using ks.test.

I am concerned about the following note in the docs (about the example
provided):  "Note that the distribution theory is not valid here as we have
estimated the parameters of the normal distribution from the same sample"

This implies I should not use ks.test(x,"pnorm",mean =1.187, sd =0.917),
where the numbers shown are estimated from 'x'.  If this is so, how do I get
a correct test?  I know I cannot use different samples, because the
parameters differ so much from one sample to the next: parameters estimated
from the week-one sample would define a distribution function that fits the
week-two data poorly.  And the sample size is small enough that I would not
have confidence in parameters estimated from one portion of a sample and
tested against the remainder of the sample.

Thanks

Ted
#
If one of the goals is the normality test, then there may be better
alternatives to the Kolmogorov-Smirnov test.
See an explanation on:
http://graphpad.com/FAQ/viewfaq.cfm?faq=959

The R implementation:
?shapiro.test
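
As a minimal sketch of how shapiro.test might be called: the data below are simulated stand-ins (the sample from the thread is not available), with the mean and sd borrowed from the ks.test call quoted above purely for illustration.

```r
# Simulated stand-in for the real data; the thread's sample is not shown.
set.seed(1)
x <- rnorm(40, mean = 1.187, sd = 0.917)

# Shapiro-Wilk tests "x is normal with some unspecified mean and sd",
# so no parameters are supplied or estimated beforehand.
sw <- shapiro.test(x)
print(sw)

# A small p-value is evidence against normality;
# a large one is not proof of normality.
sw$p.value
```

Note that this only addresses normality; it does not help with the Cauchy or other candidate distributions.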

A casual search also turned this up:
http://tolstoy.newcastle.edu.au/R/help/04/09/3201.html
http://tolstoy.newcastle.edu.au/R/help/04/08/3121.html
http://www.karlin.mff.cuni.cz/~pawlas/2008/MAI061/dagost.R

Best,

Timur
--
Timur Shtatland, Ph.D.
Senior Bioinformatics Scientist
Agencourt Bioscience Corporation - A Beckman Coulter Company
500 Cummings Center, Suite 2450
Beverly, MA 01915
www.agencourt.com
On Mon, Sep 22, 2008 at 12:26 PM, Ted Byers <r.ted.byers at gmail.com> wrote:
Ted
#
Thanks Timur

Testing for normality is not really my goal (a normal distribution won't be
the best option: the data in this case LOOKS more Poisson, or, if I exclude
the first week of results, negative exponential; and in my other case,
Cauchy is more likely).  I really need a test that can be applied
regardless of the distribution, to see which distribution fits best.  Using
log-likelihood, there doesn't seem to be much to choose between exponential
and Poisson (the log-likelihoods are almost the same regardless of the
sample, even though the parameters are very different from one sample to
the next - I don't understand why yet), and the others I have tried are
MUCH worse, but I'm not done yet.
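
The log-likelihood comparison described above could be sketched roughly as follows, assuming fitdistr() from the MASS package is used for the maximum-likelihood fits. The data are simulated stand-ins, and the set of candidate distributions is only an illustration. Poisson is left out here because it is a discrete distribution for counts, and a log-likelihood built from a probability mass function is generally not directly comparable with one built from a continuous density.

```r
library(MASS)  # recommended package shipped with R; provides fitdistr()

# Simulated, positive, right-skewed stand-in data (not the thread's data).
set.seed(2)
x <- rexp(60, rate = 0.8)

# Maximum-likelihood fits of several continuous candidates to the same sample.
fits <- list(
  normal      = fitdistr(x, "normal"),
  exponential = fitdistr(x, "exponential"),
  cauchy      = fitdistr(x, "cauchy")
)

# Collect log-likelihood and AIC; AIC penalises the number of parameters,
# which matters when candidates have different numbers of them.
cmp <- data.frame(
  logLik = sapply(fits, function(f) as.numeric(logLik(f))),
  AIC    = sapply(fits, AIC)
)
cmp[order(cmp$AIC), ]
```

A lower AIC indicates a better trade-off between fit and complexity, but it ranks the candidates only against each other; it does not say whether the best one fits well in absolute terms.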

Are you aware of functions that allow estimation of all the parameters of a
non-central distribution?  I ask because a problem I'll be working on in a
few weeks will involve the kind of skew produced by a non-central
distribution (among others).  I see some functions allow you to work with
skewed distributions (e.g. "[dpqr]stable: the skewed stable distribution")
but I have not yet found functions that allow one to estimate their
parameters from real data.

Thanks,

Ted
#
On Tue, 23 Sep 2008, Ted Byers wrote:

            
Ted,

You have talked about heavy tailed, skewed distributions. To fit these
you need to look at some packages. There are a number of possibilities in
fBasics, which is part of Rmetrics; sn is a very nice package for the skew
normal and skew t distributions; and there are packages for the hyperbolic
and generalized hyperbolic distributions: HyperbolicDist, ghyp and QRMlib.

You won't find much on goodness of fit tests I think. I have an 
implementation of the Cramer-von Mises test for the hyperbolic in my 
package (HyperbolicDist) but I am not aware of a lot else being available.
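
One general-purpose workaround for the ks.test caveat raised at the start of the thread is a parametric bootstrap: simulate from the fitted distribution, re-estimate the parameters on each simulated sample, and compare the observed KS statistic with the simulated ones. The following is a minimal sketch for the normal case only, on simulated stand-in data; `boot_ks_normal` is a hypothetical helper name, not a function from any package, and this approach was not proposed by anyone in the thread.

```r
# Parametric-bootstrap p-value for a KS test when the normal parameters
# are estimated from the same sample. Hypothetical helper, for illustration.
boot_ks_normal <- function(x, B = 999) {
  n <- length(x)
  ks_stat <- function(y) {
    # Re-estimate the parameters on every sample, exactly as was done for x.
    unname(ks.test(y, "pnorm", mean = mean(y), sd = sd(y))$statistic)
  }
  d_obs  <- ks_stat(x)
  # Simulate B samples from the fitted normal and recompute the statistic.
  d_boot <- replicate(B, ks_stat(rnorm(n, mean = mean(x), sd = sd(x))))
  # Proportion of simulated statistics at least as large as the observed one.
  p <- (1 + sum(d_boot >= d_obs)) / (B + 1)
  list(statistic = d_obs, p.value = p)
}

set.seed(3)
x <- rnorm(30, mean = 1.187, sd = 0.917)  # simulated stand-in data
res <- boot_ks_normal(x, B = 499)
res
```

In principle the same pattern carries over to other families by swapping in the matching estimator, random generator and distribution function (for example rcauchy and pcauchy with a maximum-likelihood fit), though how well it behaves for small samples would need checking case by case.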

David Scott

_________________________________________________________________
David Scott	Department of Statistics, Tamaki Campus
 		The University of Auckland, PB 92019
 		Auckland 1142,    NEW ZEALAND
Phone: +64 9 373 7599 ext 86830		Fax: +64 9 373 7000
Email:	d.scott at auckland.ac.nz

Graduate Officer, Department of Statistics
Director of Consulting, Department of Statistics