unicode&pdf font problem

8 messages · Tóth Dénes, Sascha Vieweg, David Winsemius

#
Dear List,

I would like to write a plot to a PDF file. The problem is that the
character \U0171 is replaced by a plain 'u' (i.e. without the accent) in
the PDF file.

Example:
# this works fine
plot(1,type="n")
text(1,1,"print \U0171")

# this fails
pdf("trial.pdf")
plot(1,type="n")
text(1,1,"print \U0171")
dev.off()

I found an earlier post at
http://www.mail-archive.com/r-help@r-project.org/msg65541.html, but it is
too hard for me to follow at my level of R. Any help is appreciated.

Regards,
  Denes
#
On Jan 12, 2011, at 11:11 PM, tdenes at cogpsyphy.hu wrote:

            
Have you tried:

pdf("trial.pdf")
plot(1,type="n")
text(1,1,"print ?")
dev.off()

Your default screen fonts may not be the same as your default pdf
fonts. A lot depends on system specifics, none of which you have
provided.
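
The system specifics that matter for a question like this can be
collected in a few calls. This is only a minimal sketch; the name
`specs` is illustrative and not something from this thread.

```r
# Collect the details that usually matter for font and encoding questions.
specs <- list(
  r_version = R.version.string,                  # R version string
  platform  = R.version$platform,                # OS / architecture triplet
  ctype     = Sys.getlocale("LC_CTYPE"),         # character-set locale
  utf8      = l10n_info()[["UTF-8"]],            # TRUE in a UTF-8 locale
  pdf_enc   = grDevices::pdf.options()$encoding  # encoding pdf() will use
)
str(specs)
```

Pasting the output of str(specs) into a post gives readers the locale
and the pdf encoding in one place.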
David Winsemius, MD
West Hartford, CT
#
Hi!

Sorry for the missing specs, here they are:
_
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          12.1
year           2010
month          12
day            16
svn rev        53855
language       R
version.string R version 2.12.1 (2010-12-16)

OS: Windows 7 (English version, 32 bit)



Note that \U0171 != ?. See
http://www.fileformat.info/info/unicode/char/171/index.htm
Anyway, I have no problem with ű (~u") and other special Hungarian
characters in my R-Gui. It is correctly displayed in the console, in
plots, etc. The problem is with the pdf conversion.

The same holds for my Ubuntu Hardy Heron system*, with exactly the same
error messages as reported in an earlier thread
http://www.mail-archive.com/r-help@r-project.org/msg89792.html
As far as I know, Hershey fonts do not contain \U0171.


Regards,
Denes

* The specs of Ubuntu:
_
platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          2
minor          12.0
year           2010
month          10
day            15
svn rev        53317
language       R
version.string R version 2.12.0 (2010-10-15)
#
On Jan 13, 2011, at 7:01 AM, tdenes at cogpsyphy.hu wrote:

            
You are after the glyph that Adobe calls udblacute (Unicode 0171). It
appears in R's list of Adobe glyphs:
 >  str(tools::Adobe_glyphs[371, ])
'data.frame':	1 obs. of  2 variables:
  $ adobe  : chr "udblacute"
  $ unicode: chr "0171"
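
The row number used above may differ between R versions, so a lookup by
code point is probably more robust. A small sketch, assuming only the
`adobe` and `unicode` columns shown in the output above; `glyphs` and
`hits` are illustrative names.

```r
# Look the glyph up by its code point instead of by row number.
glyphs <- tools::Adobe_glyphs
hits <- glyphs[which(glyphs$unicode == "0171"), ]
print(hits)
```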

I consulted these help pages:
points {graphics}
postscript {grDevices}
pdf {grDevices}
charsets {tools}
postscriptFonts {grDevices}

I have tried a variety of the pdfFonts installed on my Mac without
success. You can perhaps list the fonts on your machines with
names(pdfFonts()). Perhaps the range of fonts and the glyphs they
contain is different on your machines. I consistently get warning
messages saying there is a conversion failure:

 > pdf("trial.pdf", family="Helvetica")
# also tried with font="Helvetica" but I think that is erroneous
 > plot(1,type="n")
 > text(1,1,"print \U0170\U0171")
Warning messages:
1: In text.default(1, 1, "print ??") :
   conversion failure on 'print ??' in 'mbcsToSbcs': dot substituted for <c5>
2: In text.default(1, 1, "print ??") :
   conversion failure on 'print ??' in 'mbcsToSbcs': dot substituted for <b0>
3: In text.default(1, 1, "print ??") :
   conversion failure on 'print ??' in 'mbcsToSbcs': dot substituted for <c5>
4: In text.default(1, 1, "print ??") :
   conversion failure on 'print ??' in 'mbcsToSbcs': dot substituted for <b1>
5: In text.default(1, 1, "print ??") :
   font metrics unknown for Unicode character U+0170
6: In text.default(1, 1, "print ??") :
   font metrics unknown for Unicode character U+0171
7: In text.default(1, 1, "print ??") :
   conversion failure on 'print ??' in 'mbcsToSbcs': dot substituted for <c5>
8: In text.default(1, 1, "print ??") :
   conversion failure on 'print ??' in 'mbcsToSbcs': dot substituted for <b0>
9: In text.default(1, 1, "print ??") :
   conversion failure on 'print ??' in 'mbcsToSbcs': dot substituted for <c5>
10: In text.default(1, 1, "print ??") :
   conversion failure on 'print ??' in 'mbcsToSbcs': dot substituted for <b1>

This happens even though my system says that \U0170 and \U0171 are
present in the Helvetica font. I also tried family="URWHelvetica",
family="NimbusSan", and a bunch of others without success; NimbusSan
had been my best hope after reading the "Families" section of
help(postscript). That page also has information on encodings, which
appears to be very machine-specific.
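
The registered families and the encoding in effect can be inspected
directly. A rough sketch, assuming the default "Helvetica" entry is
present in pdfFonts(); the variable names are illustrative.

```r
# Font families registered with the pdf() device.
font_names <- names(grDevices::pdfFonts())
print(font_names)

# Encoding attached to the default family, and the device-wide default.
helv_enc <- grDevices::pdfFonts()$Helvetica$encoding
dev_enc  <- grDevices::pdf.options()$encoding
cat("Helvetica encoding:", helv_enc, "\n")
cat("pdf() default encoding:", dev_enc, "\n")
```

If both report "default", the device picks a platform-dependent
encoding, which may explain why results differ between machines.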
David Winsemius, MD
West Hartford, CT
#
Dear David,

Thank you for your efforts. Inspired by your remarks, I started a new
google-search and found this:
http://stackoverflow.com/questions/3434349/sweave-not-printing-localized-characters

SO HERE COMES THE SOLUTION (it works on both OSs):

pdf.options(encoding = "CP1250")
pdf()
plot(1,type="n")
text(1,1,"\U0171")
dev.off()

CP1250 should work for all Central-European languages:
http://en.wikipedia.org/wiki/Windows-1250
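
The same setting can also be passed to a single pdf() call, which leaves
the session-wide pdf.options() untouched. A minimal sketch of that
variant; the output path `out` is only an illustration.

```r
# Set the encoding for this one device only.
out <- file.path(tempdir(), "trial-cp1250.pdf")  # illustrative output path
pdf(out, encoding = "CP1250")
plot(1, type = "n")
text(1, 1, "print \U0170\U0171")  # upper- and lower-case u with double acute
dev.off()
```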


Thank you again,
  Denes
#
I have many German umlauts in my data sets and encode them as UTF-8.
When it comes to plotting to pdf, I found that "CP1257" is a
good choice for outputting umlauts. I have no experience with
"CP1250", but maybe this small hint helps:

# sharepath: directory for the output file, defined elsewhere
pdf(file = file.path(sharepath, "filename.pdf"), width = 9, height = 6,
    pointsize = 11, family = "Helvetica", encoding = "CP1257")

*S*
On 11-01-13 16:17, tdenes at cogpsyphy.hu wrote:
#
Good work, Denes;

Setting encoding="CP1250" in the pdf call also lets the Hungarian
umlaut glyph be printed to a pdf document on Macs, which, by the way,
use family="Helvetica" as the default for postscript/pdf.
#
On Jan 13, 2011, at 10:41 AM, Sascha Vieweg wrote:

            
Just an FYI for the archives: pdf(encoding="CP1257") fails on a Mac
when printing the target Hungarian umlaut.
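
One way to check in advance whether an encoding can hold a given
character is to try the conversion with iconv(). This is only a sketch:
`representable` is a hypothetical helper, and it assumes that iconv()
returns NA for text it cannot convert, which holds on most platforms
but not all. Windows-1257 is the Baltic code page, which as far as I
know has no slot for \U0171.

```r
# TRUE if x can be converted from UTF-8 to the target encoding.
# Assumption: iconv() yields NA when the conversion fails.
representable <- function(x, encoding) {
  !is.na(iconv(enc2utf8(x), from = "UTF-8", to = encoding))
}

representable("\U0171", "CP1250")  # Hungarian u with double acute
representable("\U0171", "CP1257")
representable("\u00fc", "CP1257")  # German u with diaeresis
```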

David.
David Winsemius, MD
West Hartford, CT