Back to formatted view
Raw Message

Message-ID: <1b3cd95f-3adb-4cde-b7b4-0ccc83864e65@gmail.com>
Date: 2023-09-23T16:36:55Z
From: Mikael Jagan
Subject: Recent changes to as.complex(NA_real_)
In-Reply-To: <25870.60261.24330.115632@stat.math.ethz.ch>

On 2023-09-23 9:43 am, Martin Maechler wrote:
>>>>>> Herv? Pag?s
>>>>>>      on Fri, 22 Sep 2023 16:55:05 -0700 writes:
> 
>      > The problem is that you have things that are
>      > **semantically** different but look exactly the same:
> 
>      > They look the same:
> 
>      >> x
>      > [1] NA
>      >> y
>      > [1] NA
>      >> z
>      > [1] NA
> 
>      >> is.na(x)
>      > [1] TRUE
>      >> is.na(y)
>      > [1] TRUE
>      >> is.na(z)
>      > [1] TRUE
> 
>      >> str(x)
>      >  ?cplx NA
>      >> str(y)
>      >  ?num NA
>      >> str(z)
>      >  ?cplx NA
> 
>      > but they are semantically different e.g.
> 
>      >> Re(x)
>      > [1] NA
>      >> Re(y)
>      > [1] -0.5? # surprise!
> 
>      >> Im(x)? # surprise!
>      > [1] 2
>      >> Im(z)
>      > [1] NA
> 
>      > so any expression involving Re() or Im() will produce
>      > different results on input that look the same on the
>      > surface.
> 
>      > You can address this either by normalizing the internal
>      > representation of complex NA to always be complex(r=NaN,
>      > i=NA_real_), like for NA_complex_, or by allowing the
>      > infinite variations that are currently allowed and at the
>      > same time making sure that both Re() and Im()? always
>      > return NA_real_ on a complex NA.
> 
>      > My point is that the behavior of complex NA should be
>      > predictable. Right now it's not. Once it's predictable
>      > (with Re() and Im() both returning NA_real_ regardless of
>      > internal representation), then it no longer matters what
>      > kind of complex NA is returned by as.complex(NA_real_),
>      > because they are no onger distinguishable.
> 
>      > H.
> 
>      > On 9/22/23 13:43, Duncan Murdoch wrote:
>      >> Since the result of is.na(x) is the same on each of
>      >> those, I don't see a problem.? As long as that is
>      >> consistent, I don't see a problem. You shouldn't be using
>      >> any other test for NA-ness.? You should never be
>      >> expecting identical() to treat different types as the
>      >> same (e.g.  identical(NA, NA_real_) is FALSE, as it
>      >> should be).? If you are using a different test, that's
>      >> user error.
>      >>
>      >> Duncan Murdoch
>      >>
>      >> On 22/09/2023 2:41 p.m., Herv? Pag?s wrote:
>      >>> We could also question the value of having an infinite
>      >>> number of NA representations in the complex space. For
>      >>> example all these complex values are displayed the same
>      >>> way (as NA), are considered NAs by is.na(), but are not
>      >>> identical or semantically equivalent (from an Re() or
>      >>> Im() point of view):
>      >>>
>      >>> ? ??? NA_real_ + 0i
>      >>>
>      >>> ? ??? complex(r=NA_real_, i=Inf)
>      >>>
>      >>> ? ??? complex(r=2, i=NA_real_)
>      >>>
>      >>> ? ??? complex(r=NaN, i=NA_real_)
>      >>>
>      >>> In other words, using a single representation for
>      >>> complex NA (i.e.  complex(r=NA_real_, i=NA_real_)) would
>      >>> avoid a lot of unnecessary complications and surprises.
>      >>>
>      >>> Once you do that, whether as.complex(NA_real_) should
>      >>> return complex(r=NA_real_, i=0) or complex(r=NA_real_,
>      >>> i=NA_real_) becomes a moot point.
>      >>>
>      >>> Best,
>      >>>
>      >>> H.
> 
> Thank you, Herv?.
> Your proposition is yet another one,
> to declare that all complex NA's should be treated as identical
> (almost/fully?) everywhere.
> 
> This would be a possibility, but I think a drastic one.
> 
> I think there are too many cases, where I want to keep the
> information of the real part independent of the values of the
> imaginary part (e.g. think of the Riemann hypothesis), and
> typically vice versa.
> 
> With your proposal, for a (potentially large) vector of complex numbers,
> after
>        Re(z)  <-  1/2
> 
> I could no longer rely on   Re(z) == 1/2,
> because it would be wrong for those z where (the imaginary part/ the number)
> was NA/NaN.
> Also, in a similar case, a
> 
>        Im(z) <- NA
> 
> would have to "destroy" all real parts  Re(z);
> not really typically in memory, but effectively for the user,  Re(z)
> would be all NA/NaN.
> 
> And I think there are quite a few other situations
> where looking at Re() and Im() separately makes a lot of sense.

Indeed, and there is no way to "tell" BLAS and LAPACK to treat both the real and
imaginary parts as NA_REAL when either is NA_REAL.  Hence the only reliable way
to implement such a proposal would be to post-process the result of any
computation returning a complex type, testing for NA_REAL and setting both parts
to NA_REAL in that case.  My expectation is that such testing would drastically
slow down basic arithmetic and algebraic operations ...

Mikael

> 
> Spencer also made a remark in this direction.
> 
> All in all I'd be very reluctant to move in this direction;
> but yes, I'm just one person ... let's continue musing and
> considering !
> 
> Martin
> 
>      >>> On 9/22/23 03:38, Martin Maechler wrote:
>      >>>>>>>>> Mikael Jagan ????? on Thu, 21 Sep 2023 00:47:39
>      >>>>>>>>> -0400 writes:
>      >>>> ????? > Revisiting this thread from April:
>      >>>>
>      >>>> >https://stat.ethz.ch/pipermail/r-devel/2023-April/082545.html
>      >>>>
>      >>>> ????? > where the decision (not yet backported) was
>      >>>> made for ????? > as.complex(NA_real_) to give
>      >>>> NA_complex_ instead of ????? > complex(r=NA_real_,
>      >>>> i=0), to be consistent with ????? > help("as.complex")
>      >>>> and as.complex(NA) and as.complex(NA_integer_).
>      >>>>
>      >>>> ????? > Was any consideration given to the alternative?
>      >>>> ????? > That is, to changing as.complex(NA) and
>      >>>> as.complex(NA_integer_) to ????? > give
>      >>>> complex(r=NA_real_, i=0), consistent with ????? >
>      >>>> as.complex(NA_real_), then amending help("as.complex")
>      >>>> ????? > accordingly?
>      >>>>
>      >>>> Hmm, as, from R-core, mostly I was involved, I admit to
>      >>>> say "no", to my knowledge the (above) alternative
>      >>>> wasn't considered.
>      >>>>
>      >>>> ??? > The principle that ??? >
>      >>>> Im(as.complex(<real=(double|integer|logical)>)) should
>      >>>> be zero ??? > is quite fundamental, in my view, hence
>      >>>> the "new" behaviour ??? > seems to really violate the
>      >>>> principle of least surprise ...
>      >>>>
>      >>>> of course "least surprise"? is somewhat subjective.
>      >>>> Still, I clearly agree that the above would be one
>      >>>> desirable property.
>      >>>>
>      >>>> I think that any solution will lead to *some* surprise
>      >>>> for some cases, I think primarily because there are
>      >>>> *many* different values z? for which? is.na(z)? is
>      >>>> true,? and in any case NA_complex_? is only of the
>      >>>> many.
>      >>>>
>      >>>> I also agree with Mikael that we should reconsider the
>      >>>> issue that was raised by Davis Vaughan here ("on
>      >>>> R-devel") last April.
>      >>>>
>      >>>> ????? > Another (but maybe weaker) argument is that
>      >>>> ????? > double->complex coercions happen more often
>      >>>> than ????? > logical->complex and integer->complex
>      >>>> ones. Changing the ????? > behaviour of the more
>      >>>> frequently performed coercion is ????? > more likely to
>      >>>> affect code "out there".
>      >>>>
>      >>>> ????? > Yet another argument is that one expects
>      >>>>
>      >>>> ????? >????? identical(as.complex(NA_real_), NA_real_ +
>      >>>> (0+0i))
>      >>>>
>      >>>> ????? > to be TRUE, i.e., that coercing from double to
>      >>>> complex is ????? > equivalent to adding a complex
>      >>>> zero.? The new behaviour ????? > makes the above FALSE,
>      >>>> since NA_real_ + (0+0i) gives ????? >
>      >>>> complex(r=NA_real_, i=0).
>      >>>>
>      >>>> No!? --- To my own surprise (!) --- in current R-devel
>      >>>> the above is TRUE, and ??????? NA_real_ + (0+0i)? , the
>      >>>> same as ??????? NA_real_ + 0i????? , really gives
>      >>>> complex(r=NA, i=NA) :
>      >>>>
>      >>>> Using showC() from ?complex
>      >>>>
>      >>>> ??? showC <- function(z) noquote(sprintf("(R = %g, I =
>      >>>> %g)", Re(z), Im(z)))
>      >>>>
>      >>>> we see (in R-devel) quite consistently
>      >>>>
>      >>>>> showC(NA_real_ + 0i)
>      >>>> [1] (R = NA, I = NA)
>      >>>>> showC(NA?????? + 0i)? # NA is 'logical'
>      >>>> [1] (R = NA, I = NA) where as in R 4.3.1 and
>      >>>> "R-patched" -- *in*consistently
>      >>>>
>      >>>>> showC(NA_real_ + 0i)
>      >>>> [1] (R = NA, I = 0)
>      >>>>> showC(NA + 0i)
>      >>>> [1] (R = NA, I = NA) .... and honestly, I do not see
>      >>>> *where* (and when) we changed the underlying code (in
>      >>>> arithmetic.c !?)? in R-devel to *also* produce
>      >>>> NA_complex_? in such complex *arithmetic*
>      >>>>
>      >>>>
>      >>>> ????? > Having said that, one might also (but more
>      >>>> naively) expect
>      >>>>
>      >>>> ????? >
>      >>>> identical(as.complex(as.double(NA_complex_)),
>      >>>> NA_complex_)
>      >>>>
>      >>>> ????? > to be TRUE.
>      >>>>
>      >>>> as in current R-devel
>      >>>>
>      >>>> ????? > Under my proposal it continues to be FALSE.
>      >>>>
>      >>>> as in "R-release"
>      >>>>
>      >>>> ????? > Well, I'd prefer if it gave FALSE with a
>      >>>> warning ????? > "imaginary parts discarded in
>      >>>> coercion", but it seems that ????? >
>      >>>> as.double(complex(r=a, i=b)) never warns when either of
>      >>>> ????? > 'a' and 'b' is NA_real_ or NaN, even where
>      >>>> "information" ????? > {nonzero 'b'} is clearly lost ...
>      >>>>
>      >>>> The question of *warning* here is related indeed, but I
>      >>>> think we should try to look at it only *secondary* to
>      >>>> your first proposal.
>      >>>>
>      >>>> ????? > Whatever decision is made about
>      >>>> as.complex(NA_real_), ????? > maybe these points should
>      >>>> be weighed before it becomes part of ????? > R-release
>      >>>> ...
>      >>>>
>      >>>> ????? > Mikael
>      >>>>
>      >>>> Indeed.
>      >>>>
>      >>>> Can we please get other opinions / ideas here?
>      >>>>
>      >>>> Thank you in advance for your thoughts!  Martin
>      >>>>
>      >>>> ---
>      >>>>
>      >>>> PS:
>      >>>>
>      >>>> ?? Our *print()*ing? of complex NA's ("NA" here meaning
>      >>>> NA or NaN) ?? is also unsatisfactory, e.g. in the case
>      >>>> where all entries of a ?? vector are NA in the sense of
>      >>>> is.na(.), but their ?? Re() and Im() are not all NA:
>      >>>> ??? showC <- function(z) noquote(sprintf("(R = %g, I =
>      >>>> %g)", Re(z), Im(z))) ??? z <- complex(, c(11, NA, NA),
>      >>>> c(NA, 99, NA)) ??? z ??? showC(z)
>      >>>>
>      >>>> gives
>      >>>>
>      >>>> ??? > z ??? [1] NA NA NA ??? > showC(z) ??? [1] (R =
>      >>>> 11, I = NA) (R = NA, I = 99) (R = NA, I = NA)
>      >>>>
>      >>>> but that (printing of complex) *is* another issue, in
>      >>>> which we have the re-opened bugzilla PR#16752
>      >>>> ==>https://bugs.r-project.org/show_bug.cgi?id=16752
>      >>>>
>      >>>> on which we also worked during the R Sprint in Warwick
>      >>>> three weeks ago, and where I want to commit changes in
>      >>>> any case {but think we should change even a bit more
>      >>>> than we got to during the Sprint}.
>      >>>>
>      >>>> ______________________________________________
>      >>>> R-devel at r-project.org? mailing list
>      >>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>      >>>
>      >>
>      > --
>      > Herv? Pag?s
> 
>      > Bioconductor Core Team hpages.on.github at gmail.com
> 
> 
>