determining the distribution of a sample data set etc.

2 messages · Ann Huxtable, Brian Ripley

#
Hello,

I have only recently started using R. I have two data samples on which I want to
carry out some initial exploratory data analysis to:

i). Determine the distribution of the data
ii). Determine whether both datasets are from the same distribution.

I have managed to create unit probability histograms and created qqplots for 
the data. I have attached one of the qqplots. It is clear that the data is 
not from a normal distribution (it forms a convex curve underneath the 
straight line). The nature of the curve suggests the data is from either a
chi-square or an F distribution (if you think otherwise, I would appreciate
your help in correcting my analysis).

The point of this mail however, is how do I use R to:

1). Test if the data is from another distribution (F, chi-square, etc.)
2). Check if the samples are drawn from the same distribution

Many thanks in advance for your help.

Ann
#
On Sat, 13 Nov 2004, Ann Huxtable wrote:

            
No plot made it to the list: see the posting guide for what attachments
are allowed.

I would use qqplots for both purposes.  qqplot will plot one dataset
against another: see its examples.  It will also plot against another
distribution: continuing that example

qqplot(y, qt(ppoints(200), df=5))
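
For anyone without the help-page example to hand, here is a minimal
self-contained sketch of both uses. The data are simulated stand-ins (x and
y are not the poster's samples), and the chi-squared reference with 3
degrees of freedom is only an assumption chosen for illustration.

```r
set.seed(1)                       # simulated stand-ins for the real samples
x <- rchisq(150, df = 3)
y <- rchisq(200, df = 3)

# Sample against sample: points close to a straight line suggest that the
# two samples have the same distributional shape.
qq <- qqplot(x, y, xlab = "Quantiles of x", ylab = "Quantiles of y")
abline(0, 1, lty = 2)             # reference line y = x

# Sample against a candidate distribution (chi-squared, df = 3, is a guess).
theo <- qchisq(ppoints(length(y)), df = 3)
qqplot(theo, y,
       xlab = "Theoretical chi-squared(3) quantiles",
       ylab = "Sample quantiles of y")
abline(0, 1, lty = 2)
```

Swapping qchisq for qf, qgamma and so on gives the corresponding plot for
other candidate families; the parameters still have to be chosen or
estimated first.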

You could also compare two samples via the ecdfs and the 
Kolmogorov-Smirnov test (examples in the MASS ch05.R script).  But formal 
testing is not much help unless you know what sort of differences are 
interesting _a priori_ -- you would need enormous samples to distinguish 
a t_5 from a t_4, for example.
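
A rough sketch of the ecdf and Kolmogorov-Smirnov route, again on simulated
data rather than the poster's (the sample sizes and the t_5 / t_4 choice are
assumptions made only to mirror the remark above). This is not the MASS
script, just the base R calls it relies on.

```r
set.seed(2)                       # simulated data, for illustration only
x <- rt(100, df = 5)
y <- rt(100, df = 4)

# Overlay the two empirical distribution functions.
plot(ecdf(x), verticals = TRUE, do.points = FALSE,
     main = "Empirical CDFs of x and y")
plot(ecdf(y), verticals = TRUE, do.points = FALSE, add = TRUE, col = 2)

# Two-sample Kolmogorov-Smirnov test: are x and y from the same distribution?
res2 <- ks.test(x, y)
print(res2)

# One-sample version against a fully specified distribution. The p-value is
# only valid if the parameters (here df = 5) are fixed in advance and not
# estimated from the same data.
res1 <- ks.test(x, "pt", df = 5)
print(res1)
```

With samples of this size the test has little power against such a small
difference, so a large p-value here is not evidence that the two
distributions are the same.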