patch for random.c

"BAT" == Berwin A Turlach <statba at nus.edu.sg>
    on Fri, 29 Feb 2008 18:19:40 +0800 writes:
BAT> Dear all,
    BAT> while looking for some inspiration of how to organise some code, I
    BAT> studied the code of random.c and noticed that for distributions with
    BAT> 2 or 3 parameters the user is not warned if NAs are created while such
    BAT> a warning is issued for distributions with 1 parameter.  E.g:

    BAT> R version 2.7.0 Under development (unstable) (2008-02-29 r44639)

    BAT> [...]

    >> rexp(2, rate=Inf)
    BAT> [1] NaN NaN
    BAT> Warning message:
    BAT> In rexp(2, rate = Inf) : NAs produced
    >> rnorm(2, mean=Inf)
    BAT> [1] NaN NaN

    BAT> Surprisingly, the code for issuing warnings for distributions with 2 or
    BAT> 3 parameters is essentially there, but does not seem to be used.  The
    BAT> attached patch rectifies this.  With the patch the above command produce
    BAT> the following output:

    >> rexp(2, rate=Inf)
    BAT> [1] NaN NaN
    BAT> Warning message:
    BAT> In rexp(2, rate = Inf) : NAs produced
    >> rnorm(2, mean=Inf)
    BAT> [1] NaN NaN
    BAT> Warning message:
    BAT> In rnorm(2, mean = Inf) : NAs produced

    BAT> Please ignore the patch if the code that was designed to produce the
    BAT> warning had been removed on purpose.  

I cannot imagine a design reason for that.  If there was, it
should have been mentioned as a comment in the C code.

I'll commit your patch (if it passes the checks).

    BAT> BTW, there are other places in the code were NAs can be created but no
    BAT> warning is issued.  E.g:

    >> rexp(2, rate=numeric())
    BAT> [1] NA NA
    >> rnorm(2, mean=numeric())
    BAT> [1] NA NA

    BAT> I wonder whether a warning should be issued in this case too.  

Yes, "should in principle".

If you feel like finding another elegant patch...

Thank you Berwin,
for your contribution!

Martin

    BAT> Best wishes,

    BAT> Berwin

    BAT> Index: src/main/random.c
    BAT> ===================================================================
    BAT> --- src/main/random.c	(revision 44639)
    BAT> +++ src/main/random.c	(working copy)
    BAT> @@ -123,7 +123,7 @@

    BAT> #define RAND2(num,name) \
    BAT> case num: \
    BAT> -		random2(name, REAL(a), na, REAL(b), nb, REAL(x), n); \
    BAT> +		naflag = random2(name, REAL(a), na, REAL(b), nb, REAL(x), n); \
    BAT> break

    BAT> /* "do_random2" - random sampling from 2 parameter families. */
    BAT> @@ -207,7 +207,7 @@

    BAT> #define RAND3(num,name) \
    BAT> case num: \
    BAT> -		random3(name, REAL(a), na, REAL(b), nb, REAL(c), nc, REAL(x), n); \
    BAT> +		naflag = random3(name, REAL(a), na, REAL(b), nb, REAL(c), nc, REAL(x), n); \
    BAT> break

    BAT> ______________________________________________
    BAT> R-devel at r-project.org mailing list
    BAT> https://stat.ethz.ch/mailman/listinfo/r-devel
G'day Martin,

On Mon, 3 Mar 2008 10:16:45 +0100

"BAT" == Berwin A Turlach <statba at nus.edu.sg>
    on Fri, 29 Feb 2008 18:19:40 +0800 writes:
    BAT> while looking for some inspiration of how to organise some
    BAT> code, I studied the code of random.c and noticed that for
    BAT> distributions with 2 or 3 parameters the user is not warned
    BAT> if NAs are created while such a warning is issued for
    BAT> distributions with 1 parameter. [...] The attached patch
    BAT> rectifies this.  [...]

I cannot imagine a design reason for that.  If there was, it
should have been mentioned as a comment in the C code.

I'll commit your patch (if it passes the checks).
Sorry, I was a bit in a hurry when writing the e-mail, so I forgot to
mention that the source code modified by this patch compiles and passes
"make check FORCE=FORCE" on my machine.

And in my hurry, I also posted from my NUS account, without realising
it, which forced you to intervene as moderator and to approve the
posting.  My apologies for the extra work.  But this gave me the idea
to also subscribe to r-devel with my NUS account and configure the
subscriptions so that I only receive e-mail at my UWA account.  Thus,
hopefully, you will not have to intervene again.  (Which this e-mail
should test.)
    BAT> BTW, there are other places in the code were NAs can be
    BAT> created but no warning is issued.  E.g:

    >> rexp(2, rate=numeric())
    BAT> [1] NA NA
    >> rnorm(2, mean=numeric())
    BAT> [1] NA NA

    BAT> I wonder whether a warning should be issued in this case
    BAT> too.  

Yes, "should in principle".

If you feel like finding another elegant patch...
Well, elegance is in the eye of the beholder. :-)  I attach two
patches.  One that adds warning messages at the other places where NAs
can be generated.

The second one in additiona rearranges the code a bit such that in the
case when all the "vectors" that contain the parameter values of the
distribution, from which one wants to simulate, are of length one some
unnecessary calculations is taken out of the for loop.  I am not sure
how much time is actually saved in this situation, but I belong to the
school that things such kind of optimisation should be done. :)  If you
think it bloats the code too much (or duplicates things too much
leading to hard to maintain code), then feel free to ignore this second
patch.
Thank you Berwin, for your contribution!
My pleasure.

Cheers,

	Berwin
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: R-patch2
Url: https://stat.ethz.ch/pipermail/r-devel/attachments/20080305/d21ea9a8/attachment.pl 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: R-patch3
Url: https://stat.ethz.ch/pipermail/r-devel/attachments/20080305/d21ea9a8/attachment-0001.pl
Hi Berwin (and "listeners"),
"BAT" == Berwin A Turlach <statba at nus.edu.sg>
    on Wed, 5 Mar 2008 20:26:24 +0800 writes:
BAT> G'day Martin, On Mon, 3 Mar 2008 10:16:45 +0100 Martin
>> >>>>> "BAT" == Berwin A Turlach <statba at nus.edu.sg> >>>>>
    >> on Fri, 29 Feb 2008 18:19:40 +0800 writes:
    >> 
    BAT> while looking for some inspiration of how to organise
    BAT> some code, I studied the code of random.c and noticed
    BAT> that for distributions with 2 or 3 parameters the user
    BAT> is not warned if NAs are created while such a warning
    BAT> is issued for distributions with 1 parameter. [...] The
    BAT> attached patch rectifies this.  [...]
    >> 
    >> I cannot imagine a design reason for that.  If there was,
    >> it should have been mentioned as a comment in the C code.
    >> 
    >> I'll commit your patch (if it passes the checks).

    BAT> Sorry, I was a bit in a hurry when writing the e-mail,
    BAT> so I forgot to mention that the source code modified by
    BAT> this patch compiles and passes "make check FORCE=FORCE"
    BAT> on my machine.

ok.

    BAT> And in my hurry, I also posted from my NUS account,
    BAT> without realising it, which forced you to intervene as
    BAT> moderator and to approve the posting.  My apologies for
    BAT> the extra work.  But this gave me the idea to also
    BAT> subscribe to r-devel with my NUS account and configure
    BAT> the subscriptions so that I only receive e-mail at my
    BAT> UWA account.  Thus, hopefully, you will not have to
    BAT> intervene again.  (Which this e-mail should test.)

(and it succeeded)

    BAT> BTW, there are other places in the code were NAs can be
    BAT> created but no warning is issued.  E.g:
    >> 
    >> >> rexp(2, rate=numeric())
    BAT> [1] NA NA
    >> >> rnorm(2, mean=numeric())
    BAT> [1] NA NA
    >> 
    BAT> I wonder whether a warning should be issued in this
    BAT> case too.
    >> 
    >> Yes, "should in principle".
    >> 
    >> If you feel like finding another elegant patch...

    BAT> Well, elegance is in the eye of the beholder. :-) 

    BAT> I attach two patches.  One that adds warning messages at
    BAT> the other places where NAs can be generated.

ok. The result is very simple ``hence elegant''.

But actually, part of the changed behavior may be considered
undesirable:

  rnorm(2, mean = NA)

which gives two NaN's  would now produce a warning,
where I could argue that 
   'arithmetic with NAs should give NAs without a warning'
since
  1:2 + NA
also gives NAs without a warning.

So we could argue that a warning should *only* be produced in a
case where the parameters of the distribution are not NA.

What do others (particularly R-core !) think?

    BAT> The second one in additiona rearranges the code a bit
    BAT> such that in the case when all the "vectors" that
    BAT> contain the parameter values of the distribution, from
    BAT> which one wants to simulate, are of length one some
    BAT> unnecessary calculations is taken out of the for loop.

    BAT> I am not sure how much time is actually saved in this
    BAT> situation, but I belong to the school that things such
    BAT> kind of optimisation should be done. :) 

I understand, but I think most of R-core are "from the school" 
that teaches
     "if you want to optimize, first *measure*"

    BAT> If you think it bloats the code too much (or duplicates
    BAT> things too much leading to hard to maintain code), then
    BAT> feel free to ignore this second patch.

For now, I will ignore this second patch.

 - it does bloat the code slightly  (as you conceded)
 - it uses an if(<Simple>) test which makes the code slightly
   *slower* for the (probably rare) case when the <Simple> is false.

but most importantly:

 - we have no idea if the speedup (when <Simple> is TRUE)
   is in the order of  10%, 1% or 0.1%

   My guess would be '0.1%' for rnorm(), and that would
   definitely not warrant the extra check.

    >> Thank you Berwin, for your contribution!

and thanks again!
Martin

    BAT> My pleasure.

    BAT> Cheers,
    BAT> 	Berwin
G'day Martin (and "listeners"),

On Fri, 7 Mar 2008 15:01:26 +0100
[...]
    >> If you feel like finding another elegant patch...
    BAT> Well, elegance is in the eye of the beholder. :-) 

    BAT> I attach two patches.  One that adds warning messages at
    BAT> the other places where NAs can be generated.

ok. The result is very simple ``hence elegant''.

But actually, part of the changed behavior may be considered
undesirable:

  rnorm(2, mean = NA)

which gives two NaN's  would now produce a warning,
where I could argue that 
   'arithmetic with NAs should give NAs without a warning'
since
  1:2 + NA
also gives NAs without a warning.

So we could argue that a warning should *only* be produced in a
case where the parameters of the distribution are not NA.

What do others (particularly R-core !) think?
I can agree with that point of view.  But then, should a warning not
be issued only if one of the parameters of the distribution is NA, or
should it also not be issued if any non-finite parameter is
encountered?  After all,
1:2 + Inf
[1] Inf Inf

does not create any warning either.  In that case, a patch as the
attached should do, it checks all parameters for finiteness and then
checks whether the generated number is not finite.  

Thus, with the patch applied I see the following behaviour:
rnorm(2, mean=NA)
[1] NaN NaN
rnorm(5, mean=c(0,Inf, -Inf, NA, NaN))
[1] 1.897874      NaN      NaN      NaN      NaN
rbinom(2, size=20, p=1.2)
[1] NaN NaN
Warning message:
In rbinom(2, size = 20, p = 1.2) : NAs produced
rexp(2, rate=-2)
[1] NaN NaN
Warning message:
In rexp(2, rate = -2) : NAs produced

Without the patch:
rnorm(2, mean=NA)
[1] NaN NaN
rnorm(5, mean=c(0,Inf, -Inf, NA, NaN))
[1] -0.1719657        NaN        NaN        NaN        NaN
rbinom(2, size=20, p=1.2)
[1] NaN NaN
rexp(2, rate=-2)
[1] NaN NaN
Warning message:
In rexp(2, rate = -2) : NAs produced

On my machine, "make check FORCE=FORCE" passes with this patch.

[...]
For now, I will ignore this second patch.

 - it does bloat the code slightly  (as you conceded)
Fair enough. :)  I also proved my point that more complicated code is
harder to maintain.  In the two parameter case I was testing twice na
for being one instead of testing na and nb.......

[...]
but most importantly:

 - we have no idea if the speedup (when <Simple> is TRUE)
   is in the order of  10%, 1% or 0.1%

   My guess would be '0.1%' for rnorm(), and that would
   definitely not warrant the extra check.
I would not bet against this.  Probably even with all the additional
checks for finiteness of parameters there would be not much speed
difference.  The question might be whether you want to replace the
(new) R_FINITE()'s rather by ISNA()'s (or something else).  One could
also discuss in which order the checks should be made (first generated
number then parameters of distribution or vice versa).  But I will
leave this to R-core to decide. :)
    >> Thank you Berwin, for your contribution!
and thanks again!
Still my pleasure. :)

Cheers,

	Berwin
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: R-patch4
Url: https://stat.ethz.ch/pipermail/r-devel/attachments/20080307/bdbf223b/attachment.pl

[ ... ]

But actually, part of the changed behavior may be considered
undesirable:

 rnorm(2, mean = NA)

which gives two NaN's  would now produce a warning,
where I could argue that 
  'arithmetic with NAs should give NAs without a warning'
since
 1:2 + NA
also gives NAs without a warning.

So we could argue that a warning should *only* be produced in a
case where the parameters of the distribution are not NA.

What do others (particularly R-core !) think? 

I think the answer depends on the probability that
the user realizes that the parameter is NA.  Obviously
the user should know if 'rnorm' is used directly.  If it
is used inside a function, then the user probably doesn't
know.  Not warning in such a case means that tracking
down the ultimate cause of the problem could be harder.
However, it seems to me that the function calling 'rnorm'
is really the one responsible for warning the user.

Pat

[ ... ]
"BAT" == Berwin A Turlach <statba at nus.edu.sg>
    on Fri, 7 Mar 2008 23:54:06 +0800 writes:
BAT> G'day Martin (and "listeners"), On Fri, 7 Mar 2008
    BAT> 15:01:26 +0100 Martin Maechler
BAT> [...]
    >> >> If you feel like finding another elegant patch...
    >> 
    BAT> Well, elegance is in the eye of the beholder. :-)
    >> 
    BAT> I attach two patches.  One that adds warning messages
    BAT> at the other places where NAs can be generated.
    >> 
    >> ok. The result is very simple ``hence elegant''.
    >> 
    >> But actually, part of the changed behavior may be
    >> considered undesirable:
    >> 
    >> rnorm(2, mean = NA)
    >> 
    >> which gives two NaN's would now produce a warning, where
    >> I could argue that 'arithmetic with NAs should give NAs
    >> without a warning' since 1:2 + NA also gives NAs without
    >> a warning.
    >> 
    >> So we could argue that a warning should *only* be
    >> produced in a case where the parameters of the
    >> distribution are not NA.
    >> 
    >> What do others (particularly R-core !) think?

    BAT> I can agree with that point of view.  But then, should
    BAT> a warning not be issued only if one of the parameters
    BAT> of the distribution is NA, or should it also not be
    BAT> issued if any non-finite parameter is encountered?
    BAT> After all,

    >> 1:2 + Inf
    BAT> [1] Inf Inf

    BAT> does not create any warning either.  

but it doesn't give NA either.
A bit more relevant (and I'm sure you meant implicitly)

  > Inf/Inf
  [1] NaN

However, I think we shouldn't go as far.

    BAT> In that case, a patch as the attached should do, it
    BAT> checks all parameters for finiteness and then checks
    BAT> whether the generated number is not finite.

    BAT> Thus, with the patch applied I see the following
    BAT> behaviour:

    >> rnorm(2, mean=NA)
    BAT> [1] NaN NaN
    >> rnorm(5, mean=c(0,Inf, -Inf, NA, NaN))
    BAT> [1] 1.897874 NaN NaN NaN NaN
    >> rbinom(2, size=20, p=1.2)
    BAT> [1] NaN NaN Warning message: In rbinom(2, size = 20, p
    BAT> = 1.2) : NAs produced
    >> rexp(2, rate=-2)
    BAT> [1] NaN NaN Warning message: In rexp(2, rate = -2) :
    BAT> NAs produced

I've committed a bit a different patch for now (svn rev 44715),
because I felt we were getting too sophisticated.

The current (since about 20 minutes) situation is to 
warn when the *result* is NA or NaN and not to warn in any other
cases.
This maybe not be perfect, but at least is very consistent and
in particular consistent to what the warning itself "says".

BTW, I've also found that I  
     rnorm(n, mu=Inf) |--> NA  was not ok, and changed that to
     rnorm(n, mu=Inf) |--> Inf

More feedback is welcome,
but please now "start" with R-devel rev >= 44715

Martin

    BAT> Without the patch:

    >> rnorm(2, mean=NA)
    BAT> [1] NaN NaN
    >> rnorm(5, mean=c(0,Inf, -Inf, NA, NaN))
    BAT> [1] -0.1719657 NaN NaN NaN NaN
    >> rbinom(2, size=20, p=1.2)
    BAT> [1] NaN NaN
    >> rexp(2, rate=-2)
    BAT> [1] NaN NaN Warning message: In rexp(2, rate = -2) :
    BAT> NAs produced

    BAT> On my machine, "make check FORCE=FORCE" passes with
    BAT> this patch.

    BAT> [...]
    >> For now, I will ignore this second patch.
    >> 
    >> - it does bloat the code slightly (as you conceded)

    BAT> Fair enough. :) I also proved my point that more
    BAT> complicated code is harder to maintain.  In the two
    BAT> parameter case I was testing twice na for being one
    BAT> instead of testing na and nb.......

    BAT> [...]
    >> but most importantly:
    >> 
    >> - we have no idea if the speedup (when <Simple> is TRUE)
    >> is in the order of 10%, 1% or 0.1%
    >> 
    >> My guess would be '0.1%' for rnorm(), and that would
    >> definitely not warrant the extra check.

    BAT> I would not bet against this.  Probably even with all
    BAT> the additional checks for finiteness of parameters
    BAT> there would be not much speed difference.  The question
    BAT> might be whether you want to replace the (new)
    BAT> R_FINITE()'s rather by ISNA()'s (or something else).
    BAT> One could also discuss in which order the checks should
    BAT> be made (first generated number then parameters of
    BAT> distribution or vice versa).  But I will leave this to
    BAT> R-core to decide. :)

    >> >> Thank you Berwin, for your contribution!
    >> 
    >> and thanks again!

    BAT> Still my pleasure. :)

    BAT> Cheers,

    BAT> 	Berwin Index: src/main/random.c
    BAT> ===================================================================
    BAT> --- src/main/random.c (revision 44693) +++
    BAT> src/main/random.c (working copy) @@ -44,7 +44,7 @@ for
    BAT> (i = 0; i < n; i++) { ai = a[i % na]; x[i] = f(ai); -
    BAT> if (!R_FINITE(x[i])) naflag = 1; + if (!R_FINITE(x[i])
    BAT> && R_FINITE(ai)) naflag = 1; } return(naflag); } @@
    BAT> -81,6 +81,7 @@ if (na < 1) { for (i = 0; i < n; i++)
    BAT> REAL(x)[i] = NA_REAL; + warning(_("NAs produced")); }
    BAT> else { PROTECT(a = coerceVector(CADR(args), REALSXP));
    BAT> @@ -116,14 +117,14 @@ ai = a[i % na]; bi = b[i % nb];
    BAT> x[i] = f(ai, bi); - if (!R_FINITE(x[i])) naflag = 1; +
    BAT> if (!R_FINITE(x[i]) && R_FINITE(ai) && R_FINITE(bi))
    BAT> naflag = 1; } return(naflag); }

    BAT>  #define RAND2(num,name) \ case num: \ - random2(name,
    BAT> REAL(a), na, REAL(b), nb, REAL(x), n); \ + naflag =
    BAT> random2(name, REAL(a), na, REAL(b), nb, REAL(x), n); \
    BAT> break

    BAT>  /* "do_random2" - random sampling from 2 parameter
    BAT> families. */ @@ -155,6 +156,7 @@ if (na < 1 || nb < 1)
    BAT> { for (i = 0; i < n; i++) REAL(x)[i] = NA_REAL; +
    BAT> warning(_("NAs produced")); } else { PROTECT(a =
    BAT> coerceVector(CADR(args), REALSXP)); @@ -200,14 +202,14
    BAT> @@ bi = b[i % nb]; ci = c[i % nc]; x[i] = f(ai, bi,
    BAT> ci); - if (!R_FINITE(x[i])) naflag = TRUE; + if
    BAT> (!R_FINITE(x[i]) && R_FINITE(ai) && R_FINITE(bi) &&
    BAT> R_FINITE(ci)) naflag = TRUE; } return(naflag); }

    BAT>  #define RAND3(num,name) \ case num: \ - random3(name,
    BAT> REAL(a), na, REAL(b), nb, REAL(c), nc, REAL(x), n); \ +
    BAT> naflag = random3(name, REAL(a), na, REAL(b), nb,
    BAT> REAL(c), nc, REAL(x), n); \ break

    BAT> @@ -244,6 +246,7 @@ if (na < 1 || nb < 1 || nc < 1) {
    BAT> for (i = 0; i < n; i++) REAL(x)[i] = NA_REAL; +
    BAT> warning(_("NAs produced")); } else { PROTECT(a =
    BAT> coerceVector(a, REALSXP));
    BAT> ______________________________________________
    BAT> R-devel at r-project.org mailing list
    BAT> https://stat.ethz.ch/mailman/listinfo/r-devel
G'day Martin and others,

On Mon, 10 Mar 2008 12:06:01 +0100

"BAT" == Berwin A Turlach <statba at nus.edu.sg>
    on Fri, 7 Mar 2008 23:54:06 +0800 writes:
    BAT> After all,

    >> 1:2 + Inf
    BAT> [1] Inf Inf

    BAT> does not create any warning either.  

but it doesn't give NA either.
A bit more relevant (and I'm sure you meant implicitly)
You give me more credit than I deserve. :)
I was guided by what rexp() was doing when I chose that example, i.e.
(potentially) warning about NAs being created when actually +/- Infs
were created. In this thread I was already once or twice tempted to
comment on the appropriateness of the warning message but for various
reasonx always refrained from doing so.
BTW, I've also found that I  
     rnorm(n, mu=Inf) |--> NA  was not ok, and changed that to
     rnorm(n, mu=Inf) |--> Inf

More feedback is welcome,
but please now "start" with R-devel rev >= 44715
While I agree with this change, I think it opens a whole can of worms,
since it invites a rethink about all distributions.  At the moment
we have:

<snip>
R version 2.7.0 Under development (unstable) (2008-03-10 r44730)
<snip>
set.seed(1) ; exp(rnorm(2))
[1] 0.5344838 1.2015872
set.seed(1) ; rlnorm(2)
[1] 0.5344838 1.2015872
set.seed(1) ; exp(rnorm(2, mean=c(-Inf,Inf)))
[1]   0 Inf
set.seed(1) ; rlnorm(2, mean=c(-Inf,Inf))
[1] NaN NaN
Warning message:
In rlnorm(2, mean = c(-Inf, Inf)) : NAs produced

The first two lines give identical results, as one could reasonably
expect.  But the other two do not and I would argue that a user could
reasonably expect that the commands in these two lines should lead to
identical results.

Likewise, coming back to the one-parameter distribution used in this
thread for illustration purposes, the *exp() functions in R are
parameterised by the rate and an exponential random variable with rate
r has mean (1/r) and variance (1/r^2).  Thus, I would argue that
rexp(2, Inf) should return 0's and not NaN's, since in the limit
a rate of Inf seems to refer to a variable with mean 0 and variance 0,
i.e. the constant 0.  And it would also be "more consistent" with the
behaviour of rexp() when the rate parameter is "large but finite".

Then one can argue about rgeom() when p=0.  But I guess in that case
the limit is a "random" variable with "infinite mean" and "infinite
variance" and hence it is o.k. to return NaNs and not Infs.  After all,
your comments in rnorm.c seem to indicate that you think that reporting
+/- Inf back is only o.k. if the mean is +/- Inf but the variance is
finite.

I did not look at (or think about) extreme cases for any other
distributions, but further discussion/feedback would indeed be helpful
it seems. :)

And the attached patch would address the behaviour of rlnorm() and
rexp() that I have raised above.  With these changes, on my machine,
"make check FORCE=FORCE" succeeds.

BTW, I was surprised to realise that the *exp() functions in the
underlying C code use the opposite parameterisation from the
corresponding functions at R level.  Perhaps it would be worthwhile to
point this out in section 6.7.1 of the Writing R extension manual?  In
particular since the manual states:

  Note that these argument sequences are (apart from the names and that
  @code{rnorm} has no @var{n}) exactly the same as the corresponding
  @R{} functions of the same name, so the documentation of the @R{}
  functions can be used.

Well, as I noticed the hard way, for *exp() the documentation of the
corresponding R functions cannot be used. ;-)

Thanks for you patience.

Cheers,

	Berwin

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: R-patch
Url: https://stat.ethz.ch/pipermail/r-devel/attachments/20080311/3aa1f5d2/attachment.pl
"BAT" == Berwin A Turlach <statba at nus.edu.sg>
    on Tue, 11 Mar 2008 13:19:46 +0800 writes:
BAT> G'day Martin and others,
    BAT> On Mon, 10 Mar 2008 12:06:01 +0100
>> >>>>> "BAT" == Berwin A Turlach <statba at nus.edu.sg>
    >> >>>>>     on Fri, 7 Mar 2008 23:54:06 +0800 writes:
    >> 
    BAT> After all,
    >> 
    >> >> 1:2 + Inf
    BAT> [1] Inf Inf
    >> 
    BAT> does not create any warning either.  
    >> 
    >> but it doesn't give NA either.
    >> A bit more relevant (and I'm sure you meant implicitly)

    BAT> You give me more credit than I deserve. :)
    BAT> I was guided by what rexp() was doing when I chose that example, i.e.
    BAT> (potentially) warning about NAs being created when actually +/- Infs
    BAT> were created. In this thread I was already once or twice tempted to
    BAT> comment on the appropriateness of the warning message but for various
    BAT> reasonx always refrained from doing so.  

    >> BTW, I've also found that I  
    >> rnorm(n, mu=Inf) |--> NA  was not ok, and changed that to
    >> rnorm(n, mu=Inf) |--> Inf
    >> 
    >> More feedback is welcome,
    >> but please now "start" with R-devel rev >= 44715

    BAT> While I agree with this change, I think it opens a whole can of worms,

not a big can, though ....

    BAT> since it invites a rethink about all distributions.  At the moment
    BAT> we have:

    BAT> <snip>
    BAT> R version 2.7.0 Under development (unstable) (2008-03-10 r44730)
    BAT> <snip>
    >> set.seed(1) ; exp(rnorm(2))
    BAT> [1] 0.5344838 1.2015872
    >> set.seed(1) ; rlnorm(2)
    BAT> [1] 0.5344838 1.2015872

    >> set.seed(1) ; exp(rnorm(2, mean=c(-Inf,Inf)))
    BAT> [1]   0 Inf
    >> set.seed(1) ; rlnorm(2, mean=c(-Inf,Inf))
    BAT> [1] NaN NaN
    BAT> Warning message:
    BAT> In rlnorm(2, mean = c(-Inf, Inf)) : NAs produced

    BAT> The first two lines give identical results, as one could reasonably
    BAT> expect.  

Yes, but I don't think a user should *rely* on the way R
generates these random number; though I agree for the very
specific case of rlnorm(..).

    BAT> But the other two do not and I would argue that a user could
    BAT> reasonably expect that the commands in these two lines should lead to
    BAT> identical results.

They now do.

    BAT> Likewise, coming back to the one-parameter distribution used in this
    BAT> thread for illustration purposes, the *exp() functions in R are
    BAT> parameterised by the rate and an exponential random variable with rate
    BAT> r has mean (1/r) and variance (1/r^2).  Thus, I would argue that
    BAT> rexp(2, Inf) should return 0's and not NaN's, since in the limit
    BAT> a rate of Inf seems to refer to a variable with mean 0 and variance 0,
    BAT> i.e. the constant 0.  And it would also be "more consistent" with the
    BAT> behaviour of rexp() when the rate parameter is "large but finite".

yes.  And that's *the* (only) case where 'Inf' parameters or 'scale = 0'
situations should work: when it is a natural limit behavior.

    BAT> Then one can argue about rgeom() when p=0.  But I guess in that case
    BAT> the limit is a "random" variable with "infinite mean" and "infinite
    BAT> variance" and hence it is o.k. to return NaNs and not Infs.  After all,
    BAT> your comments in rnorm.c seem to indicate that you think that reporting
    BAT> +/- Inf back is only o.k. if the mean is +/- Inf but the variance is
    BAT> finite.

    BAT> I did not look at (or think about) extreme cases for any other
    BAT> distributions, but further discussion/feedback would indeed be helpful
    BAT> it seems. :)

I looked at a few more, basically all  continuous
src/nmath/r<foo>.c  ones.

    BAT> And the attached patch would address the behaviour of rlnorm() and
    BAT> rexp() that I have raised above.  With these changes, on my machine,
    BAT> "make check FORCE=FORCE" succeeds.

I don't think your change to  .../R/distn.R  was good,
but the others I have more or less committed together with a few
more similar ones.

    BAT> BTW, I was surprised to realise that the *exp() functions in the
    BAT> underlying C code use the opposite parameterisation from the
    BAT> corresponding functions at R level.  Perhaps it would be worthwhile to
    BAT> point this out in section 6.7.1 of the Writing R extension manual?  In
    BAT> particular since the manual states:

    BAT> Note that these argument sequences are (apart from the names and that
    BAT> @code{rnorm} has no @var{n}) exactly the same as the corresponding
    BAT> @R{} functions of the same name, so the documentation of the @R{}
    BAT> functions can be used.

    BAT> Well, as I noticed the hard way, for *exp() the documentation of the
    BAT> corresponding R functions cannot be used. ;-)

We often also gratefully accept patches for the documentation 
...

    BAT> Thanks for you patience.
;-)

Regards,
Martin
G'day Martin,

On Tue, 11 Mar 2008 18:07:35 +0100

"BAT" == Berwin A Turlach <statba at nus.edu.sg>
    on Tue, 11 Mar 2008 13:19:46 +0800 writes:
[...]
    BAT> The first two lines give identical results, as one could
    BAT> reasonably expect.  

Yes, but I don't think a user should *rely* on the way R
generates these random number; though I agree for the very
specific case of rlnorm(..).

    BAT> But the other two do not and I would argue that a user could
    BAT> reasonably expect that the commands in these two lines
    BAT> should lead to identical results.

They now do.
Well, actually I forgot to mention one could also put another argument
forward.  As the log-mean parameter of the lognormal distribution
goes to -Inf, the distribution degenerates to something that has mean 0
and variance 0, i.e. could be taken as the constant zero and, hence,
one might expect that rlnorm(1, -Inf) returns 0.

But as the log-mean parameter goes to Inf, the distribution degenerates
to something with infinite mean and infinite variance.  Thus, perhaps
it is more sensible for rlnorm(1, Inf) to return NaN instead of Inf.....
I don't think your change to  .../R/distn.R  was good,
I didn't like it either, but it was the simplest way I could think of
that would allow the C rexp() routine to realise that a scale parameter
of 0 actually came from a rate parameter of -Inf in the R code.
but the others I have more or less committed together with a few
more similar ones.
Thanks.
    BAT> BTW, I was surprised to realise that the *exp() functions in
    BAT> the underlying C code use the opposite parameterisation from
    BAT> the corresponding functions at R level.  Perhaps it would be
    BAT> worthwhile to point this out in section 6.7.1 of the Writing
    BAT> R extension manual?  In particular since the manual states:

    BAT> Note that these argument sequences are (apart from the names
    BAT> and that @code{rnorm} has no @var{n}) exactly the same as
    BAT> the corresponding @R{} functions of the same name, so the
    BAT> documentation of the @R{} functions can be used.

    BAT> Well, as I noticed the hard way, for *exp() the
    BAT> documentation of the corresponding R functions cannot be
    BAT> used. ;-)

We often also gratefully accept patches for the documentation 
I know, and I am always amazed that despite this policy (or perhaps
because of it?) the documentation of R is not patchy.... ;-)

Cheers,

	Berwin
"BAT" == Berwin A Turlach <statba at nus.edu.sg>
    on Wed, 12 Mar 2008 01:23:10 +0800 writes:
BAT> G'day Martin, On Tue, 11 Mar 2008 18:07:35 +0100 Martin
>> >>>>> "BAT" == Berwin A Turlach <statba at nus.edu.sg> >>>>>
    >> on Tue, 11 Mar 2008 13:19:46 +0800 writes:

    BAT> [...]  The first two lines give identical results, as
    BAT> one could reasonably expect.
    >> 
    >> Yes, but I don't think a user should *rely* on the way R
    >> generates these random number; though I agree for the
    >> very specific case of rlnorm(..).
    >> 
    BAT> But the other two do not and I would argue that a user
    BAT> could reasonably expect that the commands in these two
    BAT> lines should lead to identical results.
    >> 
    >> They now do.

    BAT> Well, actually I forgot to mention one could also put
    BAT> another argument forward.  As the log-mean parameter of
    BAT> the lognormal distribution goes to -Inf, the
    BAT> distribution degenerates to something that has mean 0
    BAT> and variance 0, i.e. could be taken as the constant
    BAT> zero and, hence, one might expect that rlnorm(1, -Inf)
    BAT> returns 0.

yes; and that has been a consequence of the changes I did yesterday.

    BAT> But as the log-mean parameter goes to Inf, the
    BAT> distribution degenerates to something with infinite
    BAT> mean and infinite variance.  Thus, perhaps it is more
    BAT> sensible for rlnorm(1, Inf) to return NaN instead of
    BAT> Inf.....

well.  We have pretty strictly tried to follow the principle
that 'Inf' or '-Inf'  should be used when there are clear limit

If     lim_{x \to x_0} f(x) =  +/- Inf =: M

we would set
		f(x_0) = M

for x_0 in   |R + {-Inf, Inf}

If we apply this principle here, clearly,

rlnorm(., Inf) :=  lim_{lmu -> Inf}   rlnorm(., lmu)   =
	        =  lim_{lmu -> Inf} exp(rnorm(., lmu)) =
	        =  exp(lim_{lmu -> Inf} rnorm(., lmu)) =
	        =  exp(Inf) = Inf

{ or you directly numerically see that already
  rlnorm(50, 1000)   gives all Inf }

Martin

    >> I don't think your change to .../R/distn.R was good,

    BAT> I didn't like it either, but it was the simplest way I
    BAT> could think of that would allow the C rexp() routine to
    BAT> realise that a scale parameter of 0 actually came from
    BAT> a rate parameter of -Inf in the R code.

    >> but the others I have more or less committed together
    >> with a few more similar ones.

    BAT> Thanks.

    BAT> BTW, I was surprised to realise that the *exp()
    BAT> functions in the underlying C code use the opposite
    BAT> parameterisation from the corresponding functions at R
    BAT> level.  Perhaps it would be worthwhile to point this
    BAT> out in section 6.7.1 of the Writing R extension manual?
    BAT> In particular since the manual states:
    >> 
    BAT> Note that these argument sequences are (apart from the
    BAT> names and that @code{rnorm} has no @var{n}) exactly the
    BAT> same as the corresponding @R{} functions of the same
    BAT> name, so the documentation of the @R{} functions can be
    BAT> used.
    >> 
    BAT> Well, as I noticed the hard way, for *exp() the
    BAT> documentation of the corresponding R functions cannot
    BAT> be used. ;-)
    >> 
    >> We often also gratefully accept patches for the
    >> documentation

    BAT> I know, and I am always amazed that despite this policy
    BAT> (or perhaps because of it?) the documentation of R is
    BAT> not patchy.... ;-)

    BAT> Cheers,

    BAT> 	Berwin