"BAT" == Berwin A Turlach <statba at nus.edu.sg>
on Fri, 29 Feb 2008 18:19:40 +0800 writes:
BAT> Dear all,
BAT> while looking for some inspiration of how to organise some code, I
BAT> studied the code of random.c and noticed that for distributions with
BAT> 2 or 3 parameters the user is not warned if NAs are created while such
BAT> a warning is issued for distributions with 1 parameter. E.g:
BAT> R version 2.7.0 Under development (unstable) (2008-02-29 r44639)
BAT> [...]
>> rexp(2, rate=Inf)
BAT> [1] NaN NaN
BAT> Warning message:
BAT> In rexp(2, rate = Inf) : NAs produced
>> rnorm(2, mean=Inf)
BAT> [1] NaN NaN
BAT> Surprisingly, the code for issuing warnings for distributions with 2 or
BAT> 3 parameters is essentially there, but does not seem to be used. The
BAT> attached patch rectifies this. With the patch the above command produce
BAT> the following output:
>> rexp(2, rate=Inf)
BAT> [1] NaN NaN
BAT> Warning message:
BAT> In rexp(2, rate = Inf) : NAs produced
>> rnorm(2, mean=Inf)
BAT> [1] NaN NaN
BAT> Warning message:
BAT> In rnorm(2, mean = Inf) : NAs produced
BAT> Please ignore the patch if the code that was designed to produce the
BAT> warning had been removed on purpose.
I cannot imagine a design reason for that. If there was, it
should have been mentioned as a comment in the C code.
I'll commit your patch (if it passes the checks).
BAT> BTW, there are other places in the code were NAs can be created but no
BAT> warning is issued. E.g:
>> rexp(2, rate=numeric())
BAT> [1] NA NA
>> rnorm(2, mean=numeric())
BAT> [1] NA NA
BAT> I wonder whether a warning should be issued in this case too.
Yes, "should in principle".
If you feel like finding another elegant patch...
Thank you Berwin,
for your contribution!
Martin
BAT> Best wishes,
BAT> Berwin
BAT> Index: src/main/random.c
BAT> ===================================================================
BAT> --- src/main/random.c (revision 44639)
BAT> +++ src/main/random.c (working copy)
BAT> @@ -123,7 +123,7 @@
BAT> #define RAND2(num,name) \
BAT> case num: \
BAT> - random2(name, REAL(a), na, REAL(b), nb, REAL(x), n); \
BAT> + naflag = random2(name, REAL(a), na, REAL(b), nb, REAL(x), n); \
BAT> break
BAT> /* "do_random2" - random sampling from 2 parameter families. */
BAT> @@ -207,7 +207,7 @@
BAT> #define RAND3(num,name) \
BAT> case num: \
BAT> - random3(name, REAL(a), na, REAL(b), nb, REAL(c), nc, REAL(x), n); \
BAT> + naflag = random3(name, REAL(a), na, REAL(b), nb, REAL(c), nc, REAL(x), n); \
BAT> break
BAT> ______________________________________________
BAT> R-devel at r-project.org mailing list
BAT> https://stat.ethz.ch/mailman/listinfo/r-devel
Martin Maechler <maechler at stat.math.ethz.ch> wrote:
"BAT" == Berwin A Turlach <statba at nus.edu.sg>
on Fri, 29 Feb 2008 18:19:40 +0800 writes:
BAT> while looking for some inspiration of how to organise some
BAT> code, I studied the code of random.c and noticed that for
BAT> distributions with 2 or 3 parameters the user is not warned
BAT> if NAs are created while such a warning is issued for
BAT> distributions with 1 parameter. [...] The attached patch
BAT> rectifies this. [...]
I cannot imagine a design reason for that. If there was, it
should have been mentioned as a comment in the C code.
I'll commit your patch (if it passes the checks).
Sorry, I was a bit in a hurry when writing the e-mail, so I forgot to
mention that the source code modified by this patch compiles and passes
"make check FORCE=FORCE" on my machine.
And in my hurry, I also posted from my NUS account, without realising
it, which forced you to intervene as moderator and to approve the
posting. My apologies for the extra work. But this gave me the idea
to also subscribe to r-devel with my NUS account and configure the
subscriptions so that I only receive e-mail at my UWA account. Thus,
hopefully, you will not have to intervene again. (Which this e-mail
should test.)
BAT> BTW, there are other places in the code were NAs can be
BAT> created but no warning is issued. E.g:
>> rexp(2, rate=numeric())
BAT> [1] NA NA
>> rnorm(2, mean=numeric())
BAT> [1] NA NA
BAT> I wonder whether a warning should be issued in this case
BAT> too.
Yes, "should in principle".
If you feel like finding another elegant patch...
Well, elegance is in the eye of the beholder. :-) I attach two
patches. One that adds warning messages at the other places where NAs
can be generated.
The second one in additiona rearranges the code a bit such that in the
case when all the "vectors" that contain the parameter values of the
distribution, from which one wants to simulate, are of length one some
unnecessary calculations is taken out of the for loop. I am not sure
how much time is actually saved in this situation, but I belong to the
school that things such kind of optimisation should be done. :) If you
think it bloats the code too much (or duplicates things too much
leading to hard to maintain code), then feel free to ignore this second
patch.
"BAT" == Berwin A Turlach <statba at nus.edu.sg>
on Wed, 5 Mar 2008 20:26:24 +0800 writes:
BAT> G'day Martin, On Mon, 3 Mar 2008 10:16:45 +0100 Martin
BAT> Maechler <maechler at stat.math.ethz.ch> wrote:
>> >>>>> "BAT" == Berwin A Turlach <statba at nus.edu.sg> >>>>>
>> on Fri, 29 Feb 2008 18:19:40 +0800 writes:
>>
BAT> while looking for some inspiration of how to organise
BAT> some code, I studied the code of random.c and noticed
BAT> that for distributions with 2 or 3 parameters the user
BAT> is not warned if NAs are created while such a warning
BAT> is issued for distributions with 1 parameter. [...] The
BAT> attached patch rectifies this. [...]
>>
>> I cannot imagine a design reason for that. If there was,
>> it should have been mentioned as a comment in the C code.
>>
>> I'll commit your patch (if it passes the checks).
BAT> Sorry, I was a bit in a hurry when writing the e-mail,
BAT> so I forgot to mention that the source code modified by
BAT> this patch compiles and passes "make check FORCE=FORCE"
BAT> on my machine.
ok.
BAT> And in my hurry, I also posted from my NUS account,
BAT> without realising it, which forced you to intervene as
BAT> moderator and to approve the posting. My apologies for
BAT> the extra work. But this gave me the idea to also
BAT> subscribe to r-devel with my NUS account and configure
BAT> the subscriptions so that I only receive e-mail at my
BAT> UWA account. Thus, hopefully, you will not have to
BAT> intervene again. (Which this e-mail should test.)
(and it succeeded)
BAT> BTW, there are other places in the code were NAs can be
BAT> created but no warning is issued. E.g:
>>
>> >> rexp(2, rate=numeric())
BAT> [1] NA NA
>> >> rnorm(2, mean=numeric())
BAT> [1] NA NA
>>
BAT> I wonder whether a warning should be issued in this
BAT> case too.
>>
>> Yes, "should in principle".
>>
>> If you feel like finding another elegant patch...
BAT> Well, elegance is in the eye of the beholder. :-)
BAT> I attach two patches. One that adds warning messages at
BAT> the other places where NAs can be generated.
ok. The result is very simple ``hence elegant''.
But actually, part of the changed behavior may be considered
undesirable:
rnorm(2, mean = NA)
which gives two NaN's would now produce a warning,
where I could argue that
'arithmetic with NAs should give NAs without a warning'
since
1:2 + NA
also gives NAs without a warning.
So we could argue that a warning should *only* be produced in a
case where the parameters of the distribution are not NA.
What do others (particularly R-core !) think?
BAT> The second one in additiona rearranges the code a bit
BAT> such that in the case when all the "vectors" that
BAT> contain the parameter values of the distribution, from
BAT> which one wants to simulate, are of length one some
BAT> unnecessary calculations is taken out of the for loop.
BAT> I am not sure how much time is actually saved in this
BAT> situation, but I belong to the school that things such
BAT> kind of optimisation should be done. :)
I understand, but I think most of R-core are "from the school"
that teaches
"if you want to optimize, first *measure*"
BAT> If you think it bloats the code too much (or duplicates
BAT> things too much leading to hard to maintain code), then
BAT> feel free to ignore this second patch.
For now, I will ignore this second patch.
- it does bloat the code slightly (as you conceded)
- it uses an if(<Simple>) test which makes the code slightly
*slower* for the (probably rare) case when the <Simple> is false.
but most importantly:
- we have no idea if the speedup (when <Simple> is TRUE)
is in the order of 10%, 1% or 0.1%
My guess would be '0.1%' for rnorm(), and that would
definitely not warrant the extra check.
>> Thank you Berwin, for your contribution!
and thanks again!
Martin
BAT> My pleasure.
BAT> Cheers,
BAT> Berwin
G'day Martin (and "listeners"),
On Fri, 7 Mar 2008 15:01:26 +0100
Martin Maechler <maechler at stat.math.ethz.ch> wrote:
[...]
>> If you feel like finding another elegant patch...
BAT> Well, elegance is in the eye of the beholder. :-)
BAT> I attach two patches. One that adds warning messages at
BAT> the other places where NAs can be generated.
ok. The result is very simple ``hence elegant''.
But actually, part of the changed behavior may be considered
undesirable:
rnorm(2, mean = NA)
which gives two NaN's would now produce a warning,
where I could argue that
'arithmetic with NAs should give NAs without a warning'
since
1:2 + NA
also gives NAs without a warning.
So we could argue that a warning should *only* be produced in a
case where the parameters of the distribution are not NA.
What do others (particularly R-core !) think?
I can agree with that point of view. But then, should a warning not
be issued only if one of the parameters of the distribution is NA, or
should it also not be issued if any non-finite parameter is
encountered? After all,
1:2 + Inf
[1] Inf Inf
does not create any warning either. In that case, a patch as the
attached should do, it checks all parameters for finiteness and then
checks whether the generated number is not finite.
Thus, with the patch applied I see the following behaviour:
rnorm(2, mean=NA)
[1] NaN NaN
rnorm(5, mean=c(0,Inf, -Inf, NA, NaN))
[1] 1.897874 NaN NaN NaN NaN
rbinom(2, size=20, p=1.2)
[1] NaN NaN
Warning message:
In rbinom(2, size = 20, p = 1.2) : NAs produced
rexp(2, rate=-2)
[1] NaN NaN
Warning message:
In rexp(2, rate = -2) : NAs produced
Without the patch:
rnorm(2, mean=NA)
[1] NaN NaN
rnorm(5, mean=c(0,Inf, -Inf, NA, NaN))
[1] -0.1719657 NaN NaN NaN NaN
rbinom(2, size=20, p=1.2)
[1] NaN NaN
rexp(2, rate=-2)
[1] NaN NaN
Warning message:
In rexp(2, rate = -2) : NAs produced
On my machine, "make check FORCE=FORCE" passes with this patch.
[...]
For now, I will ignore this second patch.
- it does bloat the code slightly (as you conceded)
Fair enough. :) I also proved my point that more complicated code is
harder to maintain. In the two parameter case I was testing twice na
for being one instead of testing na and nb.......
[...]
but most importantly:
- we have no idea if the speedup (when <Simple> is TRUE)
is in the order of 10%, 1% or 0.1%
My guess would be '0.1%' for rnorm(), and that would
definitely not warrant the extra check.
I would not bet against this. Probably even with all the additional
checks for finiteness of parameters there would be not much speed
difference. The question might be whether you want to replace the
(new) R_FINITE()'s rather by ISNA()'s (or something else). One could
also discuss in which order the checks should be made (first generated
number then parameters of distribution or vice versa). But I will
leave this to R-core to decide. :)
[ ... ]
But actually, part of the changed behavior may be considered
undesirable:
rnorm(2, mean = NA)
which gives two NaN's would now produce a warning,
where I could argue that
'arithmetic with NAs should give NAs without a warning'
since
1:2 + NA
also gives NAs without a warning.
So we could argue that a warning should *only* be produced in a
case where the parameters of the distribution are not NA.
What do others (particularly R-core !) think?
I think the answer depends on the probability that
the user realizes that the parameter is NA. Obviously
the user should know if 'rnorm' is used directly. If it
is used inside a function, then the user probably doesn't
know. Not warning in such a case means that tracking
down the ultimate cause of the problem could be harder.
However, it seems to me that the function calling 'rnorm'
is really the one responsible for warning the user.
Pat
[ ... ]
"BAT" == Berwin A Turlach <statba at nus.edu.sg>
on Fri, 7 Mar 2008 23:54:06 +0800 writes:
BAT> G'day Martin (and "listeners"), On Fri, 7 Mar 2008
BAT> 15:01:26 +0100 Martin Maechler
BAT> <maechler at stat.math.ethz.ch> wrote:
BAT> [...]
>> >> If you feel like finding another elegant patch...
>>
BAT> Well, elegance is in the eye of the beholder. :-)
>>
BAT> I attach two patches. One that adds warning messages
BAT> at the other places where NAs can be generated.
>>
>> ok. The result is very simple ``hence elegant''.
>>
>> But actually, part of the changed behavior may be
>> considered undesirable:
>>
>> rnorm(2, mean = NA)
>>
>> which gives two NaN's would now produce a warning, where
>> I could argue that 'arithmetic with NAs should give NAs
>> without a warning' since 1:2 + NA also gives NAs without
>> a warning.
>>
>> So we could argue that a warning should *only* be
>> produced in a case where the parameters of the
>> distribution are not NA.
>>
>> What do others (particularly R-core !) think?
BAT> I can agree with that point of view. But then, should
BAT> a warning not be issued only if one of the parameters
BAT> of the distribution is NA, or should it also not be
BAT> issued if any non-finite parameter is encountered?
BAT> After all,
>> 1:2 + Inf
BAT> [1] Inf Inf
BAT> does not create any warning either.
but it doesn't give NA either.
A bit more relevant (and I'm sure you meant implicitly)
> Inf/Inf
[1] NaN
However, I think we shouldn't go as far.
BAT> In that case, a patch as the attached should do, it
BAT> checks all parameters for finiteness and then checks
BAT> whether the generated number is not finite.
BAT> Thus, with the patch applied I see the following
BAT> behaviour:
>> rnorm(2, mean=NA)
BAT> [1] NaN NaN
>> rnorm(5, mean=c(0,Inf, -Inf, NA, NaN))
BAT> [1] 1.897874 NaN NaN NaN NaN
>> rbinom(2, size=20, p=1.2)
BAT> [1] NaN NaN Warning message: In rbinom(2, size = 20, p
BAT> = 1.2) : NAs produced
>> rexp(2, rate=-2)
BAT> [1] NaN NaN Warning message: In rexp(2, rate = -2) :
BAT> NAs produced
I've committed a bit a different patch for now (svn rev 44715),
because I felt we were getting too sophisticated.
The current (since about 20 minutes) situation is to
warn when the *result* is NA or NaN and not to warn in any other
cases.
This maybe not be perfect, but at least is very consistent and
in particular consistent to what the warning itself "says".
BTW, I've also found that I
rnorm(n, mu=Inf) |--> NA was not ok, and changed that to
rnorm(n, mu=Inf) |--> Inf
More feedback is welcome,
but please now "start" with R-devel rev >= 44715
Martin
BAT> Without the patch:
>> rnorm(2, mean=NA)
BAT> [1] NaN NaN
>> rnorm(5, mean=c(0,Inf, -Inf, NA, NaN))
BAT> [1] -0.1719657 NaN NaN NaN NaN
>> rbinom(2, size=20, p=1.2)
BAT> [1] NaN NaN
>> rexp(2, rate=-2)
BAT> [1] NaN NaN Warning message: In rexp(2, rate = -2) :
BAT> NAs produced
BAT> On my machine, "make check FORCE=FORCE" passes with
BAT> this patch.
BAT> [...]
>> For now, I will ignore this second patch.
>>
>> - it does bloat the code slightly (as you conceded)
BAT> Fair enough. :) I also proved my point that more
BAT> complicated code is harder to maintain. In the two
BAT> parameter case I was testing twice na for being one
BAT> instead of testing na and nb.......
BAT> [...]
>> but most importantly:
>>
>> - we have no idea if the speedup (when <Simple> is TRUE)
>> is in the order of 10%, 1% or 0.1%
>>
>> My guess would be '0.1%' for rnorm(), and that would
>> definitely not warrant the extra check.
BAT> I would not bet against this. Probably even with all
BAT> the additional checks for finiteness of parameters
BAT> there would be not much speed difference. The question
BAT> might be whether you want to replace the (new)
BAT> R_FINITE()'s rather by ISNA()'s (or something else).
BAT> One could also discuss in which order the checks should
BAT> be made (first generated number then parameters of
BAT> distribution or vice versa). But I will leave this to
BAT> R-core to decide. :)
>> >> Thank you Berwin, for your contribution!
>>
>> and thanks again!
BAT> Still my pleasure. :)
BAT> Cheers,
BAT> Berwin Index: src/main/random.c
BAT> ===================================================================
BAT> --- src/main/random.c (revision 44693) +++
BAT> src/main/random.c (working copy) @@ -44,7 +44,7 @@ for
BAT> (i = 0; i < n; i++) { ai = a[i % na]; x[i] = f(ai); -
BAT> if (!R_FINITE(x[i])) naflag = 1; + if (!R_FINITE(x[i])
BAT> && R_FINITE(ai)) naflag = 1; } return(naflag); } @@
BAT> -81,6 +81,7 @@ if (na < 1) { for (i = 0; i < n; i++)
BAT> REAL(x)[i] = NA_REAL; + warning(_("NAs produced")); }
BAT> else { PROTECT(a = coerceVector(CADR(args), REALSXP));
BAT> @@ -116,14 +117,14 @@ ai = a[i % na]; bi = b[i % nb];
BAT> x[i] = f(ai, bi); - if (!R_FINITE(x[i])) naflag = 1; +
BAT> if (!R_FINITE(x[i]) && R_FINITE(ai) && R_FINITE(bi))
BAT> naflag = 1; } return(naflag); }
BAT> #define RAND2(num,name) \ case num: \ - random2(name,
BAT> REAL(a), na, REAL(b), nb, REAL(x), n); \ + naflag =
BAT> random2(name, REAL(a), na, REAL(b), nb, REAL(x), n); \
BAT> break
BAT> /* "do_random2" - random sampling from 2 parameter
BAT> families. */ @@ -155,6 +156,7 @@ if (na < 1 || nb < 1)
BAT> { for (i = 0; i < n; i++) REAL(x)[i] = NA_REAL; +
BAT> warning(_("NAs produced")); } else { PROTECT(a =
BAT> coerceVector(CADR(args), REALSXP)); @@ -200,14 +202,14
BAT> @@ bi = b[i % nb]; ci = c[i % nc]; x[i] = f(ai, bi,
BAT> ci); - if (!R_FINITE(x[i])) naflag = TRUE; + if
BAT> (!R_FINITE(x[i]) && R_FINITE(ai) && R_FINITE(bi) &&
BAT> R_FINITE(ci)) naflag = TRUE; } return(naflag); }
BAT> #define RAND3(num,name) \ case num: \ - random3(name,
BAT> REAL(a), na, REAL(b), nb, REAL(c), nc, REAL(x), n); \ +
BAT> naflag = random3(name, REAL(a), na, REAL(b), nb,
BAT> REAL(c), nc, REAL(x), n); \ break
BAT> @@ -244,6 +246,7 @@ if (na < 1 || nb < 1 || nc < 1) {
BAT> for (i = 0; i < n; i++) REAL(x)[i] = NA_REAL; +
BAT> warning(_("NAs produced")); } else { PROTECT(a =
BAT> coerceVector(a, REALSXP));
BAT> ______________________________________________
BAT> R-devel at r-project.org mailing list
BAT> https://stat.ethz.ch/mailman/listinfo/r-devel
G'day Martin and others,
On Mon, 10 Mar 2008 12:06:01 +0100
Martin Maechler <maechler at stat.math.ethz.ch> wrote:
"BAT" == Berwin A Turlach <statba at nus.edu.sg>
on Fri, 7 Mar 2008 23:54:06 +0800 writes:
BAT> After all,
>> 1:2 + Inf
BAT> [1] Inf Inf
BAT> does not create any warning either.
but it doesn't give NA either.
A bit more relevant (and I'm sure you meant implicitly)
You give me more credit than I deserve. :)
I was guided by what rexp() was doing when I chose that example, i.e.
(potentially) warning about NAs being created when actually +/- Infs
were created. In this thread I was already once or twice tempted to
comment on the appropriateness of the warning message but for various
reasonx always refrained from doing so.
BTW, I've also found that I
rnorm(n, mu=Inf) |--> NA was not ok, and changed that to
rnorm(n, mu=Inf) |--> Inf
More feedback is welcome,
but please now "start" with R-devel rev >= 44715
While I agree with this change, I think it opens a whole can of worms,
since it invites a rethink about all distributions. At the moment
we have:
<snip>
R version 2.7.0 Under development (unstable) (2008-03-10 r44730)
<snip>
set.seed(1) ; exp(rnorm(2))
[1] 0.5344838 1.2015872
set.seed(1) ; rlnorm(2)
[1] 0.5344838 1.2015872
set.seed(1) ; exp(rnorm(2, mean=c(-Inf,Inf)))
[1] 0 Inf
set.seed(1) ; rlnorm(2, mean=c(-Inf,Inf))
[1] NaN NaN
Warning message:
In rlnorm(2, mean = c(-Inf, Inf)) : NAs produced
The first two lines give identical results, as one could reasonably
expect. But the other two do not and I would argue that a user could
reasonably expect that the commands in these two lines should lead to
identical results.
Likewise, coming back to the one-parameter distribution used in this
thread for illustration purposes, the *exp() functions in R are
parameterised by the rate and an exponential random variable with rate
r has mean (1/r) and variance (1/r^2). Thus, I would argue that
rexp(2, Inf) should return 0's and not NaN's, since in the limit
a rate of Inf seems to refer to a variable with mean 0 and variance 0,
i.e. the constant 0. And it would also be "more consistent" with the
behaviour of rexp() when the rate parameter is "large but finite".
Then one can argue about rgeom() when p=0. But I guess in that case
the limit is a "random" variable with "infinite mean" and "infinite
variance" and hence it is o.k. to return NaNs and not Infs. After all,
your comments in rnorm.c seem to indicate that you think that reporting
+/- Inf back is only o.k. if the mean is +/- Inf but the variance is
finite.
I did not look at (or think about) extreme cases for any other
distributions, but further discussion/feedback would indeed be helpful
it seems. :)
And the attached patch would address the behaviour of rlnorm() and
rexp() that I have raised above. With these changes, on my machine,
"make check FORCE=FORCE" succeeds.
BTW, I was surprised to realise that the *exp() functions in the
underlying C code use the opposite parameterisation from the
corresponding functions at R level. Perhaps it would be worthwhile to
point this out in section 6.7.1 of the Writing R extension manual? In
particular since the manual states:
Note that these argument sequences are (apart from the names and that
@code{rnorm} has no @var{n}) exactly the same as the corresponding
@R{} functions of the same name, so the documentation of the @R{}
functions can be used.
Well, as I noticed the hard way, for *exp() the documentation of the
corresponding R functions cannot be used. ;-)
Thanks for you patience.
Cheers,
Berwin
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: R-patch
Url: https://stat.ethz.ch/pipermail/r-devel/attachments/20080311/3aa1f5d2/attachment.pl
"BAT" == Berwin A Turlach <statba at nus.edu.sg>
on Tue, 11 Mar 2008 13:19:46 +0800 writes:
BAT> G'day Martin and others,
BAT> On Mon, 10 Mar 2008 12:06:01 +0100
BAT> Martin Maechler <maechler at stat.math.ethz.ch> wrote:
>> >>>>> "BAT" == Berwin A Turlach <statba at nus.edu.sg>
>> >>>>> on Fri, 7 Mar 2008 23:54:06 +0800 writes:
>>
BAT> After all,
>>
>> >> 1:2 + Inf
BAT> [1] Inf Inf
>>
BAT> does not create any warning either.
>>
>> but it doesn't give NA either.
>> A bit more relevant (and I'm sure you meant implicitly)
BAT> You give me more credit than I deserve. :)
BAT> I was guided by what rexp() was doing when I chose that example, i.e.
BAT> (potentially) warning about NAs being created when actually +/- Infs
BAT> were created. In this thread I was already once or twice tempted to
BAT> comment on the appropriateness of the warning message but for various
BAT> reasonx always refrained from doing so.
>> BTW, I've also found that I
>> rnorm(n, mu=Inf) |--> NA was not ok, and changed that to
>> rnorm(n, mu=Inf) |--> Inf
>>
>> More feedback is welcome,
>> but please now "start" with R-devel rev >= 44715
BAT> While I agree with this change, I think it opens a whole can of worms,
not a big can, though ....
BAT> since it invites a rethink about all distributions. At the moment
BAT> we have:
BAT> <snip>
BAT> R version 2.7.0 Under development (unstable) (2008-03-10 r44730)
BAT> <snip>
>> set.seed(1) ; exp(rnorm(2))
BAT> [1] 0.5344838 1.2015872
>> set.seed(1) ; rlnorm(2)
BAT> [1] 0.5344838 1.2015872
>> set.seed(1) ; exp(rnorm(2, mean=c(-Inf,Inf)))
BAT> [1] 0 Inf
>> set.seed(1) ; rlnorm(2, mean=c(-Inf,Inf))
BAT> [1] NaN NaN
BAT> Warning message:
BAT> In rlnorm(2, mean = c(-Inf, Inf)) : NAs produced
BAT> The first two lines give identical results, as one could reasonably
BAT> expect.
Yes, but I don't think a user should *rely* on the way R
generates these random number; though I agree for the very
specific case of rlnorm(..).
BAT> But the other two do not and I would argue that a user could
BAT> reasonably expect that the commands in these two lines should lead to
BAT> identical results.
They now do.
BAT> Likewise, coming back to the one-parameter distribution used in this
BAT> thread for illustration purposes, the *exp() functions in R are
BAT> parameterised by the rate and an exponential random variable with rate
BAT> r has mean (1/r) and variance (1/r^2). Thus, I would argue that
BAT> rexp(2, Inf) should return 0's and not NaN's, since in the limit
BAT> a rate of Inf seems to refer to a variable with mean 0 and variance 0,
BAT> i.e. the constant 0. And it would also be "more consistent" with the
BAT> behaviour of rexp() when the rate parameter is "large but finite".
yes. And that's *the* (only) case where 'Inf' parameters or 'scale = 0'
situations should work: when it is a natural limit behavior.
BAT> Then one can argue about rgeom() when p=0. But I guess in that case
BAT> the limit is a "random" variable with "infinite mean" and "infinite
BAT> variance" and hence it is o.k. to return NaNs and not Infs. After all,
BAT> your comments in rnorm.c seem to indicate that you think that reporting
BAT> +/- Inf back is only o.k. if the mean is +/- Inf but the variance is
BAT> finite.
BAT> I did not look at (or think about) extreme cases for any other
BAT> distributions, but further discussion/feedback would indeed be helpful
BAT> it seems. :)
I looked at a few more, basically all continuous
src/nmath/r<foo>.c ones.
BAT> And the attached patch would address the behaviour of rlnorm() and
BAT> rexp() that I have raised above. With these changes, on my machine,
BAT> "make check FORCE=FORCE" succeeds.
I don't think your change to .../R/distn.R was good,
but the others I have more or less committed together with a few
more similar ones.
BAT> BTW, I was surprised to realise that the *exp() functions in the
BAT> underlying C code use the opposite parameterisation from the
BAT> corresponding functions at R level. Perhaps it would be worthwhile to
BAT> point this out in section 6.7.1 of the Writing R extension manual? In
BAT> particular since the manual states:
BAT> Note that these argument sequences are (apart from the names and that
BAT> @code{rnorm} has no @var{n}) exactly the same as the corresponding
BAT> @R{} functions of the same name, so the documentation of the @R{}
BAT> functions can be used.
BAT> Well, as I noticed the hard way, for *exp() the documentation of the
BAT> corresponding R functions cannot be used. ;-)
We often also gratefully accept patches for the documentation
...
BAT> Thanks for you patience.
;-)
Regards,
Martin
Martin Maechler <maechler at stat.math.ethz.ch> wrote:
"BAT" == Berwin A Turlach <statba at nus.edu.sg>
on Tue, 11 Mar 2008 13:19:46 +0800 writes:
[...]
BAT> The first two lines give identical results, as one could
BAT> reasonably expect.
Yes, but I don't think a user should *rely* on the way R
generates these random number; though I agree for the very
specific case of rlnorm(..).
BAT> But the other two do not and I would argue that a user could
BAT> reasonably expect that the commands in these two lines
BAT> should lead to identical results.
They now do.
Well, actually I forgot to mention one could also put another argument
forward. As the log-mean parameter of the lognormal distribution
goes to -Inf, the distribution degenerates to something that has mean 0
and variance 0, i.e. could be taken as the constant zero and, hence,
one might expect that rlnorm(1, -Inf) returns 0.
But as the log-mean parameter goes to Inf, the distribution degenerates
to something with infinite mean and infinite variance. Thus, perhaps
it is more sensible for rlnorm(1, Inf) to return NaN instead of Inf.....
I don't think your change to .../R/distn.R was good,
I didn't like it either, but it was the simplest way I could think of
that would allow the C rexp() routine to realise that a scale parameter
of 0 actually came from a rate parameter of -Inf in the R code.
but the others I have more or less committed together with a few
more similar ones.
Thanks.
BAT> BTW, I was surprised to realise that the *exp() functions in
BAT> the underlying C code use the opposite parameterisation from
BAT> the corresponding functions at R level. Perhaps it would be
BAT> worthwhile to point this out in section 6.7.1 of the Writing
BAT> R extension manual? In particular since the manual states:
BAT> Note that these argument sequences are (apart from the names
BAT> and that @code{rnorm} has no @var{n}) exactly the same as
BAT> the corresponding @R{} functions of the same name, so the
BAT> documentation of the @R{} functions can be used.
BAT> Well, as I noticed the hard way, for *exp() the
BAT> documentation of the corresponding R functions cannot be
BAT> used. ;-)
We often also gratefully accept patches for the documentation
I know, and I am always amazed that despite this policy (or perhaps
because of it?) the documentation of R is not patchy.... ;-)
Cheers,
Berwin
"BAT" == Berwin A Turlach <statba at nus.edu.sg>
on Wed, 12 Mar 2008 01:23:10 +0800 writes:
BAT> G'day Martin, On Tue, 11 Mar 2008 18:07:35 +0100 Martin
BAT> Maechler <maechler at stat.math.ethz.ch> wrote:
>> >>>>> "BAT" == Berwin A Turlach <statba at nus.edu.sg> >>>>>
>> on Tue, 11 Mar 2008 13:19:46 +0800 writes:
BAT> [...] The first two lines give identical results, as
BAT> one could reasonably expect.
>>
>> Yes, but I don't think a user should *rely* on the way R
>> generates these random number; though I agree for the
>> very specific case of rlnorm(..).
>>
BAT> But the other two do not and I would argue that a user
BAT> could reasonably expect that the commands in these two
BAT> lines should lead to identical results.
>>
>> They now do.
BAT> Well, actually I forgot to mention one could also put
BAT> another argument forward. As the log-mean parameter of
BAT> the lognormal distribution goes to -Inf, the
BAT> distribution degenerates to something that has mean 0
BAT> and variance 0, i.e. could be taken as the constant
BAT> zero and, hence, one might expect that rlnorm(1, -Inf)
BAT> returns 0.
yes; and that has been a consequence of the changes I did yesterday.
BAT> But as the log-mean parameter goes to Inf, the
BAT> distribution degenerates to something with infinite
BAT> mean and infinite variance. Thus, perhaps it is more
BAT> sensible for rlnorm(1, Inf) to return NaN instead of
BAT> Inf.....
well. We have pretty strictly tried to follow the principle
that 'Inf' or '-Inf' should be used when there are clear limit
If lim_{x \to x_0} f(x) = +/- Inf =: M
we would set
f(x_0) = M
for x_0 in |R + {-Inf, Inf}
If we apply this principle here, clearly,
rlnorm(., Inf) := lim_{lmu -> Inf} rlnorm(., lmu) =
= lim_{lmu -> Inf} exp(rnorm(., lmu)) =
= exp(lim_{lmu -> Inf} rnorm(., lmu)) =
= exp(Inf) = Inf
{ or you directly numerically see that already
rlnorm(50, 1000) gives all Inf }
Martin
>> I don't think your change to .../R/distn.R was good,
BAT> I didn't like it either, but it was the simplest way I
BAT> could think of that would allow the C rexp() routine to
BAT> realise that a scale parameter of 0 actually came from
BAT> a rate parameter of -Inf in the R code.
>> but the others I have more or less committed together
>> with a few more similar ones.
BAT> Thanks.
BAT> BTW, I was surprised to realise that the *exp()
BAT> functions in the underlying C code use the opposite
BAT> parameterisation from the corresponding functions at R
BAT> level. Perhaps it would be worthwhile to point this
BAT> out in section 6.7.1 of the Writing R extension manual?
BAT> In particular since the manual states:
>>
BAT> Note that these argument sequences are (apart from the
BAT> names and that @code{rnorm} has no @var{n}) exactly the
BAT> same as the corresponding @R{} functions of the same
BAT> name, so the documentation of the @R{} functions can be
BAT> used.
>>
BAT> Well, as I noticed the hard way, for *exp() the
BAT> documentation of the corresponding R functions cannot
BAT> be used. ;-)
>>
>> We often also gratefully accept patches for the
>> documentation
BAT> I know, and I am always amazed that despite this policy
BAT> (or perhaps because of it?) the documentation of R is
BAT> not patchy.... ;-)
BAT> Cheers,
BAT> Berwin