Plotting the ASCII character set.

7 messages · Rolf Turner, David Winsemius, Ivan Krylov +1 more

#
I have (used to have?) a function plot_ascii() which would display the
ascii character set in a graphical display.  It simply used text() to
place the symbols on a 16 x 16 grid.  The labels used by text() were
taken from a character vector that I called "all.ascii".  According to
my notes, the entries of this vector were obtained from a posting to
R-help made by the redoubtable Martin Maechler back in 2002.

This function *used* to work!  Now it doesn't.  When I invoke
plot_ascii() I get an error:
To give a simple example, just looking at *one* of the characters,
which comes from the string "\260" in my data file:

    a <- "\260"
    plot(0,0,type="n",xlim=c(0,1),ylim=c(0,1),ann=FALSE,axes=FALSE)
    text(0.5,0.5,labels=a)

Same error.  If I type the name a I get "\xb0", which I don't
understand.  Can't get my head around character encoding.

If I do

    plot(0,0,type="n",xlim=c(0,1),ylim=c(0,1),ann=FALSE)
    text(0.5,0.5,labels="\ub0")

then I get the degree symbol plotted; I guess that b0 is the hex
encoding of the degree symbol; apparently 260 is the octal encoding of
this symbol.
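
A quick check of that guess, as a minimal sketch in base R: octal 260 and
hex b0 are the same number (176), which is the Unicode code point of the
degree sign.  What differs is the bytes that get stored: "\260" is the
single byte 0xb0, while U+00B0 encoded as UTF-8 takes two bytes.  The
variable names below are only illustrative.

```r
# Octal 260 and hexadecimal b0 denote the same integer.
oct_val <- strtoi("260", base = 8L)
hex_val <- strtoi("b0", base = 16L)
print(c(oct_val, hex_val))

# "\260" is stored as the single byte 0xb0 ...
one_byte <- charToRaw("\260")
print(one_byte)

# ... whereas the degree sign U+00B0, encoded as UTF-8, is the two bytes c2 b0.
deg_utf8  <- intToUtf8(176L)
two_bytes <- charToRaw(deg_utf8)
print(two_bytes)
```

In a UTF-8 locale a lone 0xb0 byte is not a valid character, which is
consistent with R printing it back as "\xb0".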

Can anyone suggest how I might get my plot_ascii() function working
again?  Basically, it seems to me, the question is:  how do I persuade
R to read in "\260" as "\ub0" rather than "\xb0"?

I hope for enlightenment! :-)

cheers,

Rolf Turner

P.S. My sessionInfo() may be relevant:

R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_NZ.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_NZ.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_NZ.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_NZ.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] brev_0.0-5

loaded via a namespace (and not attached):
 [1] magrittr_1.5      usethis_2.0.1     devtools_2.4.2    pkgload_1.2.1    
 [5] R6_2.4.1          rlang_0.4.11      mixreg_1.0-1      fastmap_1.0.1    
 [9] tools_4.1.0       pkgbuild_1.2.0    sessioninfo_1.1.1 cli_2.5.0        
[13] withr_2.4.2       ellipsis_0.3.2    remotes_2.4.0     rprojroot_1.3-2  
[17] lifecycle_1.0.0   crayon_1.3.4      processx_3.5.2    purrr_0.3.4      
[21] callr_3.7.0       fs_1.5.0          ps_1.6.0          testthat_3.0.3   
[25] memoise_2.0.0     glue_1.4.0        cachem_1.0.5      compiler_4.1.0   
[29] desc_1.3.0        backports_1.1.6   prettyunits_1.1.1
#
Hello Rolf Turner,

On Sat, 3 Jul 2021 14:02:59 +1200
Rolf Turner <r.turner at auckland.ac.nz> wrote:

            
Part of the problem is that the "\xb0" byte is not in ASCII, which
covers only the lower half of possible 8-bit bytes. I guess that the
strings containing bytes with highest bit set used to be interpreted as
Latin-1 on your machine, but now get interpreted as UTF-8, which
changes their meaning (in UTF-8, the highest bit being set indicates
that there will be more bytes to follow, making the string invalid if
there is none).

The good news is, since it's Latin-1, which is natively supported by R,
there are even multiple options:

1. Mark the string as Latin-1 by setting Encoding(a) <- 'latin1' and
let R do the re-encoding if and when Pango asks it for a UTF-8-encoded
string.

2. Decode Latin-1 into the locale encoding by using iconv(a, 'latin1',
'') (or set the third parameter to 'UTF-8', which would give almost the
same result on a machine with a UTF-8 locale). The result is, again, a
string where Encoding(a) matches the truth. Explicitly setting UTF-8
may be preferable on Windows machines running pre-UCRT builds of R
where the locale encoding may not contain all Latin-1 characters, but
that's not a problem for you, as far as I know.

For any encoding other than Latin-1 or UTF-8, option (2) is still valid.

I have verified that your example works on my GNU/Linux system with a
UTF-8 locale if I use either option.
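
The two options can be sketched as follows.  This is only an
illustration on the single string from the original post, assuming a
UTF-8 locale; the names a1 and a2 are made up for the example.

```r
a <- "\260"                      # single byte 0xb0; no encoding declared

# Option 1: keep the byte as it is, but mark the string as Latin-1.
a1 <- a
Encoding(a1) <- "latin1"

# Option 2: convert the byte from Latin-1 to UTF-8.
a2 <- iconv(a, from = "latin1", to = "UTF-8")

print(Encoding(c(a1, a2)))       # the declared encodings of the two results
print(charToRaw(a1))             # option 1 leaves the byte untouched
print(charToRaw(a2))             # option 2 yields the UTF-8 bytes c2 b0
print(utf8ToInt(enc2utf8(a1)))   # both describe code point 176 (U+00B0)

# Either string can then be passed to text(); shown for interactive use only.
if (interactive()) {
  plot(0, 0, type = "n", xlim = c(0, 1), ylim = c(0, 1),
       ann = FALSE, axes = FALSE)
  text(0.5, 0.5, labels = a2, cex = 6)
}
```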
#
On Sat, 3 Jul 2021 09:40:28 +0200
Ivan Krylov <krylov.r00t at gmail.com> wrote:

            
Thanks Ivan. That solves most of the problem, but there are still
glitches. I get a plot OK, but a substantial number of the characters
are displayed as a wee rectangle containing a 2 x 2 array of digits
such as
Also note that there is a bit of difference between the results of using
Encoding() and the results of using iconv(). E.g. if I do

a <- "\x80"
b <- iconv(a,"latin1","UTF-8")
Encoding(a) <- "latin1"

then when I type "a" I get the Euro symbol "€", but when I type "b"
I get the string "\u0080".

But that doesn't really matter.  More problematic is the fact that if I
do either

    plot(0,0,type="n",xlim=c(0,1),ylim=c(0,1),ann=FALSE,axes=FALSE)
    text(0.5,0.5,labels=a,cex=6)
or

    plot(0,0,type="n",xlim=c(0,1),ylim=c(0,1),ann=FALSE,axes=FALSE)
    text(0.5,0.5,labels=b,cex=6)

then I get a wee rectangle with 0 0 8 0 arranged in a 2 x 2 array inside.
(Setting cex=6 makes it easier for my ageing eyes to see what the
digits are.)

Is there any way that I can get the Euro symbol to display correctly in
such a graphic?

Thanks.

cheers,

Rolf
#
Pick a font that is supported on your OS that has the desired glyph. 
Also look at the examples in:

?points

David
#
On Sun, 4 Jul 2021 13:59:49 +1200
Rolf Turner <r.turner at auckland.ac.nz> wrote:

            
Interesting. I didn't pay attention to it at first, but now I see that
a range of code points, U+0080 to U+009F, corresponds to control
characters (also, U+00A0 is a non-breaking space), not anything
printable. Also, Latin-1 doesn't define any meaning for bytes
0x80..0x9f, but here they are decoded to same-valued Unicode code
points. And the actual code point for € is U+20AC, not even close to
what we're working with.
You are right. I didn't know that, but my reading of the function
translateToNative in src/main/sysutils.c suggests that R decodes
strings marked as 'latin1' as Windows-1252 (if it's available for the
system iconv()) and uses the actual Latin-1 as a fallback.

?Encoding does warn that 'latin1' is ambiguous and system-dependent
with regards to bytes 0x80..0x9f, so text() seems to be right to use
Latin-1 and not Windows-1252 when trying to plot byte 0x80 encoded as
CE_LATIN1 as U+0080. Although there's a /* FIXME: allow CP1252? */
comment in src/main/sysutils.c, function reEnc, which is used by text().
I think that iconv(a, 'CP1252', '', '\ufffd') should work for you. At
least it seems to work for the ? sign. It does leave the following
bytes undefined, represented as ? U+FFFD REPLACEMENT CHARACTER:

as.raw(which(is.na(
 iconv(sapply(as.raw(1:255), rawToChar), 'CP1252', '')
)))
# [1] 81 8d 8f 90 9d

Not sure what can be done about those. With Latin-1, they would
correspond to unprintable control characters anyway.
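
To see the difference between the two readings of bytes 0x80..0x9f side
by side, here is a minimal sketch.  It assumes a system iconv() that
knows the name 'CP1252' and treats 'latin1' as plain ISO-8859-1 (as
appears to be the case on GNU/Linux); the helper code_point() and the
table layout are made up for the illustration.

```r
# Bytes 0x80..0x9f as one-byte strings.
bytes <- as.raw(0x80:0x9f)
chars <- sapply(bytes, rawToChar)

# Decode the same bytes under two assumptions about their encoding.
as_latin1 <- iconv(chars, from = "latin1", to = "UTF-8")
as_cp1252 <- iconv(chars, from = "CP1252", to = "UTF-8")

# Code point of a decoded character (NA where the byte has no mapping).
code_point <- function(s) if (is.na(s)) NA_integer_ else utf8ToInt(s)

tab <- data.frame(
  byte   = format(as.hexmode(as.integer(bytes))),
  latin1 = vapply(as_latin1, code_point, integer(1), USE.NAMES = FALSE),
  cp1252 = vapply(as_cp1252, code_point, integer(1), USE.NAMES = FALSE)
)
print(tab)
```

Rows with NA in the cp1252 column are the bytes that Windows-1252 leaves
undefined; which bytes come out as NA may depend on the iconv
implementation.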
#
On 03/07/2021 9:59 p.m., Rolf Turner wrote:
... deletia ...
The problem with the Euro symbol is that it was invented after the first 
8 bit encodings, so it was stuck in later.  If you want it, this seems 
helpful:

From https://web.stanford.edu/~laurik/fsmbook/faq/utf8.html:

"The proper Unicode code point for € [this may or may not display 
correctly as the Euro sign in your browser] is decimal 8364 (0x20AC). In 
Windows CP1252 € has the code 128 (0x80); in ISO-8859-15 (also known as 
Latin-9) the € code is 164 (0xA4); in Macintosh Roman it is 219 (0xDB)."

So a fairly portable way to display it would be "\u20ac".  That works in 
a plot on my Mac; on other graphics devices it depends on whether the 
glyph is defined, but I'd expect it is fairly widespread.
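
As a small check of the numbers quoted above, the following sketch
decodes the CP1252 and ISO-8859-15 bytes and compares them with
"\u20ac".  It assumes the system iconv() accepts the names 'CP1252' and
'ISO-8859-15'; Macintosh Roman is left out because its iconv name and
mapping vary between systems.

```r
euro <- "\u20ac"
print(utf8ToInt(euro))           # 8364, i.e. 0x20AC

# The same character reached from two of the 8-bit encodings quoted above.
from_cp1252 <- iconv(rawToChar(as.raw(0x80)), from = "CP1252",      to = "UTF-8")
from_latin9 <- iconv(rawToChar(as.raw(0xa4)), from = "ISO-8859-15", to = "UTF-8")
print(utf8ToInt(from_cp1252))
print(utf8ToInt(from_latin9))

# Plotting it; whether the glyph appears depends on the device and font.
if (interactive()) {
  plot(0, 0, type = "n", xlim = c(0, 1), ylim = c(0, 1),
       ann = FALSE, axes = FALSE)
  text(0.5, 0.5, labels = euro, cex = 6)
}
```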

The "\x80" character varies across 8 bit encodings.  In many of them 
it's a non-printable character, but not on Windows.

Duncan Murdoch
3 days later
#
Thanks to Ivan Krylov, David Winsemius and Duncan Murdoch for their
informative replies to my cri de coeur.  The most complete answer was
however provided off-list by Andrew Simmons who wrote a new and
carefully structured function plotASCII() to replace my old no-longer
functioning plot_ascii() function.

In the belief that others on the list might well be interested in seeing
Andrew's very elegant solution to my problem, I have attached (with
Andrew's permission) the code for plotASCII() (in the file
"plotASCII.txt").

Note that Andrew, very cautiously, uses syntax such as
graphics::text(<whatever>) rather than just text(<whatever>).
I guess it never hurts to be cautious.

cheers,

Rolf Turner