-----Original Message-----
From: Peter Ruckdeschel [mailto:Peter.Ruckdeschel@uni-bayreuth.de]
Sent: Friday, January 30, 2004 1:12 PM
To: r-devel@stat.math.ethz.ch
Cc: Florian Camphausen; Josef Leydold; Thomas Stabla
Subject: [Rd] request for comments --- package "distr" --- S4 Classes for Distributions
> Hello,
> after some discussions with Martin Maechler and Josef Leydold (WU Wien),
> we have felt the need for some package that should allow for an
> object-orientated approach to distributions.
Great!
> Our small group at Bayreuth now has developed a package "distr" which
> tries to fill this gap, implementing distributions by means of
> S4 classes.
You may find some value in looking at the Java distribution library that I
created a couple of years ago by porting the R distribution functions:
http://statdistlib.sourceforge.net/
A separate java implementation of the basic distribution classes is included
as part of the Hydra package for MCMC,
http://hydra-mcmc.sf.net
> A mother class "Distribution" is introduced with slots for a parameter
> and - most important - for the four constitutive methods "r", "d",
> "p", and "q" (alluding to the corresponding naming already used for
> these functions in S).
> All distributions of the "base" package for which such "r", "d", "p",
> and "q" functions exist are implemented (essentially by wrappers of the
> original code)
I would recommend giving more descriptive names to the methods. In
particular, I would recommend 'pdf', 'cdf', 'quantile', and 'random' instead
of simply 'd', 'p', 'q', 'r', so that the expressions are very clear.
> as subclasses of either of the two subclasses
> "AbscontDistribution" or "DiscreteDistribution".
It is not at all clear to me what an 'AbscontDistribution' is. Perhaps you
are referring to a continuous distribution?
You may also want to consider how to deal with multivariate distributions.
> This approach seems very appealing to us from a conceptual viewpoint:
Yes this is very interesting, particularly if you extend it to handle
multivariate distributions.
Good luck!
-Greg
> > as subclasses of either of the two subclasses
> > "AbscontDistribution" or "DiscreteDistribution".
> It is not at all clear to me what an 'AbscontDistribution' is. Perhaps you
> are referring to a continuous distribution?
Absolutely continuous. This is slightly more restrictive than just
having a continuous distribution function, effectively meaning that
the density can be defined (with respect to Lebesgue measure on some
interval, usually).
Counterexamples are pathological, the sort of thing you challenge 2nd
year math/stat students to think up, but I can't say that I can
remember what they'd look like.
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk) FAX: (+45) 35327907
> > > > as subclasses of either of the two subclasses
> > > > "AbscontDistribution" or "DiscreteDistribution".
> > > It is not at all clear to me what an 'AbscontDistribution' is. Perhaps
> > > you are referring to a continuous distribution?
> > Absolutely continuous. This is slightly more restrictive than just
> > having a continuous distribution function, effectively meaning that
> > the density can be defined (with respect to Lebesgue measure on some
> > interval, usually).
> > Counterexamples are pathological, the sort of thing you challenge 2nd
> > year math/stat students to think up, but I can't say that I can
> > remember what they'd look like.
> I think the most common example is the Cantor distribution.
That's the most common 1-dimensional singular distribution, but higher
dimensional distributions are much more commonly singular. For
example, mixed continuous-discrete distributions, and other
distributions whose support is of lower dimension than the sample
space, e.g. X ~ N(0,1), Y=X.
Duncan Murdoch
> On Tue, 03 Feb 2004 09:45:52 +0000, Matthias Kohl
> <Matthias.Kohl@uni-bayreuth.de> wrote:
> > I think the most common example is the Cantor distribution.
> That's the most common 1-dimensional singular distribution, but higher
> dimensional distributions are much more commonly singular. For
> example, mixed continuous-discrete distributions, and other
> distributions whose support is of lower dimension than the sample
> space, e.g. X ~ N(0,1), Y=X.
I don't think that qualifies as continuous, does it? Not in the sense
that the distribution function is continuous, surely.
The Cantor distribution is the one that has the "devil's staircase" as
distribution function, right? Continuous, and differentiable almost
everywhere with derivative 0. (Take the interval [0,1], divide it
in three, let F(0) = 0, F(1) = 1, F(x) = .5 on the middle third, and
define F on the outer thirds recursively.)
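The recursive construction just described can be sketched in a few lines. This is only an illustration in Python (not R, and not part of any package discussed here); the function name `cantor_cdf` and the `depth` cutoff are assumptions made for the example. On the left third it uses F(x) = F(3x)/2, on the right third F(x) = 1/2 + F(3x - 2)/2, and on the middle third the constant value.

```python
def cantor_cdf(x: float, depth: int = 40) -> float:
    """Approximate the Cantor distribution function F on [0, 1].

    Hypothetical helper for illustration; `depth` bounds the number of
    recursion levels, so the error is at most 2**-depth.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    value = 0.0   # mass accumulated from right-third steps so far
    scale = 0.5   # height of the flat piece at the current level
    for _ in range(depth):
        if x < 1.0 / 3.0:
            # Left third: F(x) = F(3x) / 2, so rescale and go one level down.
            x = 3.0 * x
        elif x > 2.0 / 3.0:
            # Right third: F(x) = 1/2 + F(3x - 2) / 2 at the current scale.
            value += scale
            x = 3.0 * x - 2.0
        else:
            # Middle third (endpoints included): F is constant here.
            return value + scale
        scale *= 0.5
    return value
```

For instance, `cantor_cdf(0.5)` lands in the middle third at the first level, and `cantor_cdf(0.8)` lands in the middle third of the right third, where the staircase is flat at 3/4.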
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk) FAX: (+45) 35327907
> On Tue, 03 Feb 2004 09:45:52 +0000, Matthias Kohl
> <Matthias.Kohl@uni-bayreuth.de> wrote:
> > I think the most common example is the Cantor distribution.
> That's the most common 1-dimensional singular distribution, but higher
> dimensional distributions are much more commonly singular. For
> example, mixed continuous-discrete distributions, and other
> distributions whose support is of lower dimension than the sample
> space, e.g. X ~ N(0,1), Y=X.
The most common 1d singular distribution is probably a lifetime with an
atom at zero.
I think the question was about a continuous but not absolutely continuous
distribution, and indeed the Cantor distribution is the standard example
in theory courses.
Brian D. Ripley, ripley@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595
On 03 Feb 2004 13:21:24 +0100, Peter Dalgaard
<p.dalgaard@biostat.ku.dk> wrote:
> Duncan Murdoch <dmurdoch@pair.com> writes:
> > That's the most common 1-dimensional singular distribution, but higher
> > dimensional distributions are much more commonly singular. For
> > example, mixed continuous-discrete distributions, and other
> > distributions whose support is of lower dimension than the sample
> > space, e.g. X ~ N(0,1), Y=X.
> I don't think that qualifies as continuous, does it? Not in the sense
> that the distribution function is continuous, surely.
Yes, for my second example the 2-d distribution function is
continuous, because there are no atoms:
F(x,y) = P(X <= x, Y <= y) = Phi(min(x,y))
I was wrong about the mixed case; sorry.
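The formula F(x,y) = Phi(min(x,y)) can be checked numerically. The following is a rough sketch in Python (not R); the function names, sample size and seed are assumptions chosen for the example. It compares the closed form with an empirical estimate from draws of X, using the fact that Y = X.

```python
import math
import random


def std_normal_cdf(z: float) -> float:
    """Phi(z), the standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def joint_cdf(x: float, y: float) -> float:
    """F(x, y) = P(X <= x, Y <= y) for X ~ N(0,1) and Y = X."""
    return std_normal_cdf(min(x, y))


def empirical_joint_cdf(x: float, y: float, n: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo estimate of F(x, y); n and seed are arbitrary choices."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        sample = rng.gauss(0.0, 1.0)  # Y equals X, so one draw gives both coordinates
        if sample <= x and sample <= y:
            hits += 1
    return hits / n
```

The two functions should agree up to Monte Carlo error at any point (x, y). All the mass sits on the diagonal y = x, which has Lebesgue measure zero in the plane.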
Duncan
On Tue, 3 Feb 2004 12:31:10 +0000 (GMT Standard Time), Prof Brian D
Ripley <ripley@stats.ox.ac.uk> wrote:
> On Tue, 3 Feb 2004, Duncan Murdoch wrote:
> > On Tue, 03 Feb 2004 09:45:52 +0000, Matthias Kohl
> > <Matthias.Kohl@uni-bayreuth.de> wrote:
> > > I think the most common example is the Cantor distribution.
> > That's the most common 1-dimensional singular distribution, but higher
> > dimensional distributions are much more commonly singular. For
> > example, mixed continuous-discrete distributions, and other
> > distributions whose support is of lower dimension than the sample
> > space, e.g. X ~ N(0,1), Y=X.
> The most common 1d singular distribution is probably a lifetime with an
> atom at zero.
We differ in notation. I wouldn't call that one singular; I'd call it
mixed continuous and discrete, because the distribution function is a
sum of an absolutely continuous function and a step function. But in
the measure theory sense, it's singular w.r.t. Lebesgue measure.
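To make the "sum of an absolutely continuous function and a step function" concrete, here is a small sketch in Python (not R). The exponential lifetime, the atom weight `p_zero` and the `rate` are assumptions picked purely for illustration; the thread does not name a specific model.

```python
import math


def lifetime_cdf(t: float, p_zero: float = 0.2, rate: float = 1.5) -> float:
    """Distribution function of a lifetime with an atom at zero.

    Assumed model (hypothetical): the lifetime is exactly 0 with
    probability p_zero, otherwise exponential with the given rate.
    """
    if t < 0.0:
        return 0.0
    step_part = p_zero  # step function: jumps by p_zero at t = 0
    continuous_part = (1.0 - p_zero) * (1.0 - math.exp(-rate * t))  # absolutely continuous
    return step_part + continuous_part
```

The function jumps from 0 to `p_zero` at t = 0 and is smooth afterwards, so it is neither continuous nor a pure step function.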
Duncan
> > after some discussions with Martin Maechler and Josef Leydold (WU Wien),
> > we have felt the need for some package that should allow for an
> > object-orientated approach to distributions.
> Great!
Thank you.
> > Our small group at Bayreuth now has developed a package "distr" which
> > tries to fill this gap, implementing distributions by means of
> > S4 classes.
> You may find some value in looking at the Java distribution library that I
> created a couple of years ago by porting the R distribution functions:
> http://statdistlib.sourceforge.net/
> A separate java implementation of the basic distribution classes is included
> as part of the Hydra package for MCMC,
> http://hydra-mcmc.sf.net
We have looked up these references with interest. On the one hand, Java
seems even more appropriate to our approach, with its concept of
private/public interfaces, and at first glance the way we attach
functions as slots to objects seems more Java/C++-like than conformal to
the S4 concept. On the other hand, we wanted to stay *within* R, to
facilitate use by the common R user and to have available the full power
of S as to syntax and of R as to computing.

We have invested some time in deciding how to implement the methods that
we call "constitutive", i.e. r, d, p, q: as slots, as we did, or as
methods in the usual S4 sense. The main reason for our decision was this:
our classes are, in a sense, "closed" under *(almost) arbitrary*
transformations, the result of a transformation again being *one* new
distribution that only has to be created *once* for each transformation.

In particular, once the slots r, d, p, q are filled by an assignment of
the kind Z = X + Y [or any other implemented transformation of
distributions], in our solution you can call d(Z)(x). Here d(Z) is the
accessor function for slot d of Z; it returns a function, which is then
evaluated at x [and likewise for r, p, q]. This can be done arbitrarily
often for different arguments x without redefining d.

Within the S4 concept, however, as far as we understand it, you would
probably have to create a new class for each sort of transformation, e.g.
"ConvolutedDistribution", and corresponding methods r, d, p, q for this
class, which would then be called by the method dispatcher. But

+ either an object Z of class "ConvolutedDistribution" would not know
  a priori how to produce r, d, p, q, and for any call d(Z,x) the
  convolution would have to be redone, which would not be efficient,
+ or any object Z which results from the assignment Z = X + Y would have
  to produce a new class [and corresponding derived methods...]!

This is what we meant when calling r, d, p, q "constitutive" for a
distribution in our manual.
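The idea of storing r, d, p, q as function-valued slots, so that a transformation builds one new object once, might be sketched as follows. This is Python, not R, and it is not the distr implementation: the class layout, the `affine` method and `standard_normal` are hypothetical names invented for the example, and an affine map a*X + b stands in for the convolution X + Y because its r, d, p, q have simple closed forms.

```python
import random
from dataclasses import dataclass
from statistics import NormalDist
from typing import Callable, List


@dataclass(frozen=True)
class Distribution:
    """Hypothetical stand-in for a class with function-valued slots."""
    r: Callable[[int], List[float]]  # r(n): n random draws
    d: Callable[[float], float]      # density
    p: Callable[[float], float]      # distribution function
    q: Callable[[float], float]      # quantile function

    def affine(self, a: float, b: float) -> "Distribution":
        """Distribution of a*X + b for a > 0; the four slots are built once."""
        if a <= 0.0:
            raise ValueError("this sketch only handles a > 0")
        return Distribution(
            r=lambda n: [a * v + b for v in self.r(n)],
            d=lambda x: self.d((x - b) / a) / a,
            p=lambda x: self.p((x - b) / a),
            q=lambda u: a * self.q(u) + b,
        )


def standard_normal(seed: int = 0) -> Distribution:
    """N(0,1) with slots filled from the Python standard library."""
    rng = random.Random(seed)
    base = NormalDist()  # mean 0, standard deviation 1
    return Distribution(
        r=lambda n: [rng.gauss(0.0, 1.0) for _ in range(n)],
        d=base.pdf,
        p=base.cdf,
        q=base.inv_cdf,
    )


# Usage: the transformed object is created once ...
X = standard_normal(seed=42)
Z = X.affine(2.0, 1.0)  # Z = 2*X + 1, i.e. N(1, 2^2)
# ... and its slots can then be evaluated as often as needed.
densities = [Z.d(x) for x in (-1.0, 0.0, 1.0, 2.0)]
```

Here `Z.d(x)` plays the role of d(Z)(x) in the message above: the slot holds a function, and no new class or method definition is needed for the transformed distribution.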
> I would recommend giving more descriptive names to the methods. In
> particular, I would recommend 'pdf', 'cdf', 'quantile', and 'random' instead
> of simply 'd', 'p', 'q', 'r', so that the expressions are very clear.
We are open on this issue; if the audience prefers longer names, that is
OK with us. We chose the short ones in order to allude to the common
naming in R.
> > as subclasses of either of the two subclasses
> > "AbscontDistribution" or "DiscreteDistribution".
> It is not at all clear to me what an 'AbscontDistribution' is. Perhaps you
> are referring to a continuous distribution?
This issue has already been dealt with in another thread of postings.
> You may also want to consider how to deal with multivariate distributions.
We might, but we doubt that we are the right ones to do so.
[At least we would like to call in some *real* experts in this domain.]
Anyway, our class concept is open, and we have already thought of such an
extension.
> > This approach seems very appealing to us from a conceptual viewpoint:
> Yes this is very interesting, particularly if you extend it to handle
> multivariate distributions.
Would you like to give us advice in this direction?
You are definitely welcome to :-)
Thank you for your interest,
Peter, Matthias, Thomas, Florian
> Yes, for my second example the 2-d distribution function is
> continuous, because there are no atoms:
> F(x,y) = P(X <= x, Y <= y) = Phi(min(x,y))
Right, sorry. I had it mixed up with a mental image of the conditional
distributions of Y given X or vice versa, which of course jump
discontinuously from 0 to 1.
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk) FAX: (+45) 35327907