request for comments --- package "distr" --- S4 Classes for Distributions

10 messages · Warnes, Gregory R, Peter Dalgaard, Matthias Kohl +3 more

#
Great!
You may find some value in looking at the Java distribution library that I
created a couple of years ago by porting the R distribution functions:

http://statdistlib.sourceforge.net/

A separate java implementation of the basic distribution classes is included
as part of the Hydra package for MCMC, 

http://hydra-mcmc.sf.net
I would recommend giving more descriptive names to the methods.  In
particular, I would recommend 'pdf', 'cdf', 'quantile', and 'random' instead
of simply 'd', 'p', 'q', 'r' so that the expressions are very clear.
It is not at all clear to me what an 'AbscontDistribution' is.  Perhaps you
are referring to a continuous distribution?

You may also want to consider how to deal with multivariate distributions.
Yes this is very interesting, particularly if you extend it to handle
multivariate distributions.  

Good luck!

-Greg


#
"Warnes, Gregory R" <gregory_r_warnes@groton.pfizer.com> writes:
Absolutely continuous. This is slightly more restrictive than just
having a continuous distribution function, effectively meaning that
the density can be defined (with respect to Lebesgue measure on some
interval, usually).

Counterexamples are pathological, the sort of thing you challenge 2nd
year math/stat students to think up, but I can't say that I can
remember what they'd look like.
#
Peter Dalgaard wrote:
I think the most common example is the Cantor distribution.
See for example:

http://www.sciencedaily.com/encyclopedia/Cantor_function
#
On Tue, 03 Feb 2004 09:45:52 +0000, Matthias Kohl
<Matthias.Kohl@uni-bayreuth.de> wrote:

            
That's the most common 1-dimensional singular distribution, but higher
dimensional distributions are much more commonly singular.  For
example, mixed continuous-discrete distributions, and other
distributions whose support is of lower dimension than the sample
space, e.g. X ~ N(0,1), Y=X.

Duncan Murdoch
#
Duncan Murdoch <dmurdoch@pair.com> writes:
I don't think that qualifies as continuous, does it? Not in the sense
that the distribution function is continuous, surely. 

The Cantor distribution is the one that has the "devil's staircase" as
distribution function, right? Continuous, differentiable almost
everywhere but the derivative is always 0. (Take an interval, divide
in three, let F(0) = 0, F(1) = 1, F(x)=.5 on the middle third, and
define the outer thirds recursively.)
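
The recursive construction in the parentheses above can be sketched in a
few lines of Python. This is only an illustration: the name cantor_cdf and
the depth cutoff are assumptions made here, not anything from the thread or
from the distr package.

```python
def cantor_cdf(x: float, depth: int = 40) -> float:
    """Approximate the Cantor distribution function F on [0, 1].

    Follows the recursion: F = 1/2 on the middle third, and the outer
    thirds are rescaled copies of the whole staircase.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    value, weight = 0.0, 0.5
    for _ in range(depth):
        if x < 1.0 / 3.0:
            # Left third: rescale to [0, 1]; nothing is added at this level.
            x = 3.0 * x
        elif x > 2.0 / 3.0:
            # Right third: everything to the left already contributes `weight`.
            value += weight
            x = 3.0 * x - 2.0
        else:
            # Middle third: F is constant here.
            return value + weight
        weight *= 0.5
    # Not resolved within `depth` levels: return the lower bound, which is
    # within 2**-depth of the true value in exact arithmetic.
    return value


if __name__ == "__main__":
    for point in (0.15, 0.25, 0.5, 0.8):
        print(point, cantor_cdf(point))
```

On every removed middle third the function is flat, so its derivative there
is 0, yet it still climbs continuously from 0 to 1.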
#
On Tue, 3 Feb 2004, Duncan Murdoch wrote:

            
The most common 1d singular distribution is probably a lifetime with an
atom at zero.

I think the question was about a continuous but not absolutely continuous
distribution, and indeed the Cantor distribution is the standard example
in theory courses.
#
On 03 Feb 2004 13:21:24 +0100, Peter Dalgaard
<p.dalgaard@biostat.ku.dk> wrote :
Yes, for my second example the 2-d distribution function is
continuous, because there are no atoms:

F(x,y) = P(X <= x, Y <= y) = Phi(min(x,y))
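
A quick simulation makes the formula plausible. This is a rough sketch, with
the helper names phi, joint_cdf and empirical_joint_cdf, the sample size and
the seed all chosen here purely for illustration:

```python
import math
import random


def phi(z: float) -> float:
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def joint_cdf(x: float, y: float) -> float:
    """F(x, y) = P(X <= x, Y <= y) for X ~ N(0,1), Y = X."""
    return phi(min(x, y))


def empirical_joint_cdf(x: float, y: float, n: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        xs = rng.gauss(0.0, 1.0)
        ys = xs  # Y = X: all the mass sits on the diagonal
        if xs <= x and ys <= y:
            hits += 1
    return hits / n


if __name__ == "__main__":
    print(joint_cdf(0.3, 1.2), empirical_joint_cdf(0.3, 1.2))
```

The support is the line y = x, which has Lebesgue measure zero in the plane,
so there is no 2-d density even though F itself is continuous.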

I was wrong about the mixed case; sorry.

Duncan
#
On Tue, 3 Feb 2004 12:31:10 +0000 (GMT Standard Time), Prof Brian D
Ripley <ripley@stats.ox.ac.uk> wrote :
We differ in notation.  I wouldn't call that one singular; I'd call it
mixed continuous and discrete, because the distribution function is a
sum of an absolutely continuous function and a step function.  But in
the measure theory sense, it's singular w.r.t. Lebesgue measure.
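
To make the decomposition concrete, here is a small sketch of a lifetime
distribution with an atom at zero. The atom size p0 = 0.2 and the
exponential continuous part are assumptions picked only for illustration:

```python
import math


def step_part(t: float, p0: float = 0.2) -> float:
    """Step function: a single jump of height p0 at t = 0 (the atom)."""
    return p0 if t >= 0.0 else 0.0


def ac_part(t: float, p0: float = 0.2, rate: float = 1.0) -> float:
    """Absolutely continuous part: (1 - p0) times an exponential cdf."""
    if t <= 0.0:
        return 0.0
    return (1.0 - p0) * (1.0 - math.exp(-rate * t))


def lifetime_cdf(t: float, p0: float = 0.2, rate: float = 1.0) -> float:
    """Distribution function of the mixed lifetime: step part + a.c. part."""
    return step_part(t, p0) + ac_part(t, p0, rate)


if __name__ == "__main__":
    for t in (-0.001, 0.0, 1.0, 5.0):
        print(t, lifetime_cdf(t))
```

The jump at t = 0 is what keeps this distribution function from being
continuous, unlike the Cantor case.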

Duncan
#
Hi Gregory,
Thank you.
We have looked up these references with interest.

On the one hand, Java seems even more appropriate to our approach, with its
concept of private/public interfaces, and at first glance the way we attach
functions as slots to objects seems more Java/C++-like than conformal to the
S4 concept.

On the other hand, we wanted to stay *within* R to facilitate use by the
common R user and to have available the full power of S in syntax and of R
in computing.

We have invested some time in the decision of how to implement the methods
that we call "constitutive", i.e. r, d, p, q: as slots, as we did, or as
methods in the usual S4 sense. The main reason for our decision was:

Our classes are in a sense "closed" under *(almost) arbitrary*
transformations, the result of a transformation being again *one* new
distribution that only has to be created *once* for each transformation.

In particular, once the slots r, d, p, q are filled by an assignment of the
kind Z = X+Y [or any other implemented transformation of distributions],
in our solution you can call d(Z)(x). Here d(Z) is the accessor function for
slot d of Z; it returns a function, which is then evaluated at x [likewise
for r, p, q] arbitrarily often for different arguments x without redefining
d.
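
The idea of filling function-valued slots once per transformation can be
sketched outside R as well. The following Python sketch is not the distr
implementation: the class Distribution, the factory norm, the accessor d and
the fixed integration grid are all assumptions made for illustration, and
only the r and d slots are modelled.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Distribution:
    r: Callable[[int], List[float]]  # slot r: draw m random numbers
    d: Callable[[float], float]      # slot d: density function

    def __add__(self, other: "Distribution") -> "Distribution":
        lo, hi, h = -8.0, 8.0, 0.05  # hypothetical integration grid
        n = int(round((hi - lo) / h)) + 1
        f = [self.d(lo + i * h) for i in range(n)]
        g = [other.d(lo + j * h) for j in range(n)]
        # Discrete convolution, done exactly once when Z = X + Y is created.
        table = [0.0] * (2 * n - 1)
        for i, fi in enumerate(f):
            for j, gj in enumerate(g):
                table[i + j] += fi * gj * h
        z0 = 2.0 * lo  # table[k] approximates the density at z0 + k * h

        def density(z: float) -> float:
            # Linear interpolation in the precomputed table; no re-convolution.
            pos = (z - z0) / h
            k = math.floor(pos)
            if k < 0 or k >= len(table) - 1:
                return 0.0
            w = pos - k
            return (1.0 - w) * table[k] + w * table[k + 1]

        def sampler(m: int) -> List[float]:
            return [a + b for a, b in zip(self.r(m), other.r(m))]

        return Distribution(r=sampler, d=density)


def norm(mean: float = 0.0, sd: float = 1.0, seed=None) -> Distribution:
    """Hypothetical factory for a normal distribution object."""
    rng = random.Random(seed)

    def density(x: float) -> float:
        z = (x - mean) / sd
        return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

    def sampler(m: int) -> List[float]:
        return [rng.gauss(mean, sd) for _ in range(m)]

    return Distribution(r=sampler, d=density)


def d(dist: Distribution) -> Callable[[float], float]:
    """Accessor mirroring the d(Z)(x) calling style."""
    return dist.d


if __name__ == "__main__":
    Z = norm(0.0, 1.0, seed=1) + norm(0.0, 1.0, seed=2)
    print(d(Z)(0.0), 1.0 / math.sqrt(4.0 * math.pi))
```

The convolution work happens inside the `+`; afterwards d(Z) is just a
stored function that can be evaluated as often as needed.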

Within the S4 concept, however, as far as we understand it, you would
probably have to create a new class for each sort of transformation, e.g.
"ConvolutedDistribution", and corresponding methods r, d, p, q for this
class, which would then be called by the method dispatcher.
But
+ either an object Z of class "ConvolutedDistribution" would not know how to
  produce r, d, p, q a priori, and for any call d(Z,x) the convolution would
  have to be redone, which would not be efficient,
+ or any object Z which results from the assignment Z = X+Y would have to
  produce a new class [and corresponding derived methods....]!

This is what we meant when calling r, d, p, q "constitutive" for a
distribution in our manual.

We are open on this issue; if the audience prefers longer names, that is OK
with us....
We chose the short ones in order to allude to the common naming in R.

This issue has already been dealt with in another thread of postings....

We might, but actually we doubt that we are the right ones to do so...
[At least we would like to call in some *real* experts in this domain.]
Anyway, our class concept is open and we have already thought of such an
extension.
Would you like to give us advice in this direction?
You are definitely welcome to :-)

Thank you for your interest,
Peter, Matthias, Thomas, Florian
#
Duncan Murdoch <dmurdoch@pair.com> writes:
Right, sorry. I had it mixed up with a mental image of the conditional
distributions of Y given X or vice versa, which of course jump
discontinuously from 0 to 1.