Hello all
A pragmatic argument for allowing size==0 is the situation where the
size is in itself a random variable (that's how I stumbled over the
inconsistency, by the way).
For example, in textbooks on probability it is stated that:
If X is Poisson(lambda), and the conditional
distribution of Y given X is Binomial(X,p), then
Y is Poisson(lambda*p).
(cf eg Pitman's "Probability", p. 400)
Clearly this statement requires Binomial(0,p) to be a well-defined
distribution.
Such statements would be quite convoluted if we did not define
Binomial(0,p) as a legal (but degenerate) distribution. The same applies
to code where the size parameter may attain the value 0.
Just my 2 cents.
Cheers,
Uffe
-----Original Message-----
From: pd at pubhealth.ku.dk on behalf of Peter Dalgaard
Sent: Sun 05-02-2006 01:33
To: P Ehlers
Cc: ted.harding at nessie.mcc.ac.uk; Peter Dalgaard; R-bugs at biostat.ku.dk; r-devel at stat.math.ethz.ch; Uffe Høgsbro Thygesen
Subject: Re: [Rd] pbinom with size argument 0 (PR#8560)
P Ehlers <ehlers at math.ucalgary.ca> writes:
I prefer a (consistent) NaN. What happens to our notion of a
Binomial RV as a sequence of Bernoulli RVs if we permit n=0?
I have never seen (nor contemplated, I confess) the definition
of a Bernoulli RV as anything other than some dichotomous-outcome
one-trial random experiment.
What's the problem ??
An n=0 binomial is the sum of an empty set of Bernoulli RV's, and the
sum over an empty set is identically 0.
Not n trials, where n might equal zero,
but _one_ trial. I can't see what would be gained by permitting a
zero-trial experiment. If we assign probability 1 to each outcome,
we have a problem with the sum of the probabilities.
Consistency is what you gain. E.g.
binom(., n=n1+n2, p) == binom(., n=n1, p) * binom(., n=n2, p)
where * denotes convolution. This will also hold for n1=0 or n2=0 if
the binomial in that case is defined as a one-point distribution at
zero. Same thing as any(logical(0)) etc., really.
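The convolution identity can also be verified numerically. A sketch in Python rather than R (helper names dbinom and convolve are mine), in which the n=0 binomial acts as the unit of convolution, exactly as a point mass at zero should:

```python
import math

def dbinom(k, n, p):
    # Binomial pmf; n == 0 gives a point mass at 0, the unit of convolution.
    if k < 0 or k > n:
        return 0.0
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def convolve(n1, n2, p, k):
    # (Binomial(n1, p) * Binomial(n2, p))(k), with * denoting convolution.
    return sum(dbinom(j, n1, p) * dbinom(k - j, n2, p) for j in range(k + 1))

p = 0.3
# The identity holds for ordinary cases and for n1 = 0 or n2 = 0 alike.
for n1, n2 in [(2, 5), (0, 7), (4, 0)]:
    for k in range(n1 + n2 + 1):
        assert abs(convolve(n1, n2, p, k) - dbinom(k, n1 + n2, p)) < 1e-12
```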
--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
Uffe's pragmatic argument is of course convincing, at least in
the circumstances he refers to. However, Peter Ehlers' posting
has re-stimulated the underlying ambiguity I feel about this
issue (initially, that the probability of a "non-event" should
be undefined).
Thus I can envisage different circumstances in which one or the
other view could be appropriate.
Uffe observes a Poisson-distributed number of Bernoulli trials
and records the number of "successes", with zero if the Poisson
distribution says "zero trials". In that case no Bernoulli trial
has been carried out, so the issue of what the distribution over
its empty set of outcomes should be is irrelevant. However, he
can encapsulate this process mathematically by assigning P=1
to the outcome r=0 when n=0, and this may well lead to a more
straightforward R program, for instance (which, reading between
the lines, may well be what really happened in his case).
On the other hand, suppose I (and maybe Peter Ehlers too) am
simulating a study in which a random number of subjects (according
to some distribution) becomes available, in each "sweep" of the
study, for a questionnaire, and the outcome of interest is the
number in the "sweep" answering "Yes" to a question. Part of this
simulation is to create a database of responses along with concomitant
variables. It is possible (and under some circumstances perhaps more
likely) that the number of available subjects in a "sweep" is zero --
these people cannot be contacted, say.
Maybe I'm studying a "missing data" situation.
In that case it would be natural to enter "r=NA" in the
database for those sweeps which produce no responses. This
would denote "missing data". It would also be natural (initially,
before embarking on, say, an imputation exercise) to attribute
"P=NA" to the probability of "Yes" for such a group, since
we do not have any direct information (though may be able to
exploit associations between other variables to obtain indirect
information, under certain assumptions).
So maybe one could need implementations of pbinom and dbinom
which work differently in different circumstances. But what
remains important is that, whichever way they work in given
circumstances, they should be consistent with each other.
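As an illustration of that consistency requirement, here is a minimal Python sketch (the names dbinom/pbinom mirror R's, but the zero_size switch is purely hypothetical) in which the distribution function is built from the pmf, so the two functions agree under either convention:

```python
import math

def dbinom(k, n, p, zero_size="point"):
    # zero_size="point": Binomial(0, p) is a point mass at 0.
    # zero_size="nan":   Binomial(0, p) is treated as undefined.
    if n == 0 and zero_size == "nan":
        return math.nan
    if k < 0 or k > n:
        return 0.0
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def pbinom(q, n, p, zero_size="point"):
    # The cdf is derived from the pmf, so the pair is consistent by construction.
    if n == 0 and zero_size == "nan":
        return math.nan
    return sum(dbinom(k, n, p, zero_size) for k in range(0, min(q, n) + 1))

assert dbinom(0, 0, 0.5) == 1.0 and pbinom(0, 0, 0.5) == 1.0
assert math.isnan(dbinom(0, 0, 0.5, zero_size="nan"))
```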
Best wishes to all,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 06-Feb-06 Time: 10:10:19
------------------------------ XFMail ------------------------------