R-devel: as.character() for hexmode no longer pads with zeros - R-devel

Wed, Sep 22, 2021 8:48 PM #

The update in rev 80946
(https://github.com/wch/r-source/commit/d970867722e14811e8ba6b0ba8e0f478ff482f5e)
caused as.character() on hexmode objects to no longer pad with zeros.

Before:

> x <- structure(as.integer(c(0,8,16,24,32)), class="hexmode")
> x
[1] "00" "08" "10" "18" "20"
> as.character(x)
[1] "00" "08" "10" "18" "20"

After:

> x <- structure(as.integer(c(0,8,16,24,32)), class="hexmode")
> x
[1] "00" "08" "10" "18" "20"
> as.character(x)
[1] "0"  "8"  "10" "18" "20"

Was that intended?

/Henrik

PS. This breaks R.utils::intToHex()
[https://cran.r-project.org/web/checks/check_results_R.utils.html]

Martin Maechler

Thu, Sep 23, 2021 12:46 AM #

> The update in rev 80946
    > (https://github.com/wch/r-source/commit/d970867722e14811e8ba6b0ba8e0f478ff482f5e)
    > caused as.character() on hexmode objects to no longer pad with zeros.

Yes -- very much on purpose; by me, after discussing a related issue
within R-core which showed "how wrong" the previous (current R)
behavior of the as.character() method is for
hexmode and octmode objects.

If you look at the whole of rev 80946, you will also find this NEWS entry:

 * as.character(<obj>) for "hexmode" or "octmode" objects now
   fulfills the important basic rule

  as.character(x)[j] === as.character(x[j]) 
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

rather than just calling format().

The format() generic (notably for "atomic-alike" objects) should indeed
return a character vector where each string has the same "width".
However, the result of as.character(x) for a single x[j] should not be
influenced by other elements in x, at least for all "atomic-alike" /
"vector-alike" objects.
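
To make the rule concrete, here is a minimal sketch that checks it for an
arbitrary conversion function. The helper name holds_elementwise() is made up
for illustration and is not part of base R. Whether as.character() passes for
a hexmode object depends on the R version in use: going by the NEWS entry
above, it should pass only from rev 80946 onwards.

```r
# Hypothetical helper (not in base R):
# TRUE if f(x)[j] is identical to f(x[j]) for every index j.
holds_elementwise <- function(f, x) {
  all(vapply(seq_along(x),
             function(j) identical(f(x)[j], f(x[j])),
             logical(1)))
}

x <- as.hexmode(c(0L, 8L, 16L))

# format() pads to a common width, so it is expected to fail the rule:
# format(x)[1] is "00" while format(x[1]) is "0".
holds_elementwise(format, x)

# Plain integer vectors satisfy the rule under as.character().
holds_elementwise(as.character, c(0L, 8L, 16L))

# Version dependent for hexmode: expected FALSE before rev 80946, TRUE after.
holds_elementwise(as.character, x)
```

The check compares each element converted on its own with the same element
taken from the converted vector, which is exactly the property stated in NEWS.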




    > Before:

    >> x <- structure(as.integer(c(0,8,16,24,32)), class="hexmode")
    >> x
    > [1] "00" "08" "10" "18" "20"
    >> as.character(x)
    > [1] "00" "08" "10" "18" "20"

    > After:

    >> x <- structure(as.integer(c(0,8,16,24,32)), class="hexmode")
    >> x
    > [1] "00" "08" "10" "18" "20"
    >> as.character(x)
    > [1] "0"  "8"  "10" "18" "20"

    > Was that intended?

Yes!
You have to explore your example a bit to notice how "illogical"
the behavior before was:

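> as.character(as.hexmode(0:15))
 [1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9" "a" "b" "c" "d" "e" "f"

> as.character(as.hexmode(0:16))
 [1] "00" "01" "02" "03" "04" "05" "06" "07" "08" "09" "0a" "0b" "0c" "0d" "0e"
[16] "0f" "10"

> as.character(as.hexmode(16^(0:2)))
[1] "001" "010" "100"

> as.character(as.hexmode(16^(0:3)))
[1] "0001" "0010" "0100" "1000"

> as.character(as.hexmode(16^(0:4)))
[1] "00001" "00010" "00100" "01000" "10000"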

all breaking the rule in the NEWS  and given above.

If you want format()  you should use format(),
but as.character() should never have used format() ..

Martin
    
    > /Henrik

    > PS. This breaks R.utils::intToHex()
    > [https://cran.r-project.org/web/checks/check_results_R.utils.html]

Henrik Bengtsson

Thu, Sep 23, 2021 9:55 AM #

Thanks for confirming and giving details on the rationale (... and
I'll update R.utils to use format() instead).
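
For anyone else who relied on the old padding, a minimal sketch of the
format()-based approach follows. The function name int_to_hex_padded() is
hypothetical and this is not the actual R.utils code; it only assumes the
documented behavior of format() for hexmode objects, where 'width' gives a
minimum field width padded with leading zeros.

```r
# Hypothetical helper (not the R.utils implementation):
# zero-padded hexadecimal strings of a common width.
int_to_hex_padded <- function(x, width = NULL) {
  # width = NULL lets format() pad to the widest element in x
  format(as.hexmode(as.integer(x)), width = width)
}

int_to_hex_padded(c(0, 8, 16, 24, 32))
# [1] "00" "08" "10" "18" "20"

# An explicit minimum width makes the result independent of the other elements.
int_to_hex_padded(c(0, 8), width = 4)
```

Passing an explicit width is the safer choice when strings produced from
different vectors need to line up.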

Regarding as.character(x)[j] === as.character(x[j]): I agree with this
- is that property of as.character()/subsetting explicitly
stated/documented somewhere?  I wonder if this is a property we should
all strive for with other types of objects?

/Henrik

On Thu, Sep 23, 2021 at 12:46 AM Martin Maechler
<maechler at stat.math.ethz.ch> wrote:

Henrik Bengtsson
    on Wed, 22 Sep 2021 20:48:05 -0700 writes:

    > The update in rev 80946
    > (https://github.com/wch/r-source/commit/d970867722e14811e8ba6b0ba8e0f478ff482f5e)
    > caused as.character() on hexmode objects to no longer pad with zeros.

Yes -- very much on purpose; by me, after discussing a related issue
within R-core which showed "how wrong" the previous (current R)
behavior of the as.character() method is for
hexmode and octmode objects.

If you look at the whole of rev 80946, you will also find this NEWS entry:

 * as.character(<obj>) for "hexmode" or "octmode" objects now
   fulfills the important basic rule

  as.character(x)[j] === as.character(x[j])
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

rather than just calling format().

The format() generic (notably for "atomic-alike" objects) should indeed
return a character vector where each string has the same "width".
However, the result of as.character(x) for a single x[j] should not be
influenced by other elements in x, at least for all "atomic-alike" /
"vector-alike" objects.

    > Before:

    >> x <- structure(as.integer(c(0,8,16,24,32)), class="hexmode")
    >> x
    > [1] "00" "08" "10" "18" "20"
    >> as.character(x)
    > [1] "00" "08" "10" "18" "20"

    > After:

    >> x <- structure(as.integer(c(0,8,16,24,32)), class="hexmode")
    >> x
    > [1] "00" "08" "10" "18" "20"
    >> as.character(x)
    > [1] "0"  "8"  "10" "18" "20"

    > Was that intended?

Yes!
You have to explore your example a bit to notice how "illogical"
the behavior before was:

as.character(as.hexmode(0:15))

 [1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9" "a" "b" "c" "d" "e" "f"

as.character(as.hexmode(0:16))

 [1] "00" "01" "02" "03" "04" "05" "06" "07" "08" "09" "0a" "0b" "0c" "0d" "0e"
[16] "0f" "10"

as.character(as.hexmode(16^(0:2)))

[1] "001" "010" "100"

as.character(as.hexmode(16^(0:3)))

[1] "0001" "0010" "0100" "1000"

as.character(as.hexmode(16^(0:4)))

[1] "00001" "00010" "00100" "01000" "10000"

all breaking the rule in the NEWS  and given above.

If you want format()  you should use format(),
but as.character() should never have used format() ..

Martin

    > /Henrik

    > PS. This breaks R.utils::intToHex()
    > [https://cran.r-project.org/web/checks/check_results_R.utils.html]