
Accented characters, windows

3 messages · Duncan Murdoch, derek

#
I have a problem with accented characters. My OS is Windows 8.1 and I'm using
RStudio.

I create a string:
av="?????"

When I print "av", I get the result below.
[1] "?????"

The resulting characters are different. I have a similar problem when I write
the string to a file. In RGUI, printing "av" displays the characters correctly,
but using the "write" function to write the string to a file results in the same
problem.

Can you please help me deal with it?
#
On 29/03/2016 5:39 PM, Jan Kacaba wrote:
You don't say what code page you're using.
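A minimal way to answer that question from inside R (a sketch, assuming a reasonably recent R version): Sys.getlocale("LC_CTYPE") reports the character-type locale, whose suffix after "." is the code page on Windows, and l10n_info() reports whether the session runs in UTF-8. The codepage element of l10n_info() is only present on Windows, so the sketch checks the platform first.

```r
# Report the character-type locale; on Windows the suffix after "." is the code page
ctype <- Sys.getlocale("LC_CTYPE")
print(ctype)

# l10n_info() describes how the session handles characters
info <- l10n_info()
cat("UTF-8 session:", info[["UTF-8"]], "\n")

# The 'codepage' element is only reported on Windows
if (.Platform$OS.type == "windows") {
  cat("Windows code page:", info$codepage, "\n")
}
```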

R in Windows has a long-standing problem: it works mainly in the
local code page, rather than in UTF-8 as most other systems do.
(This is because when internationalization support was added, UTF-8
was exotic rather than ubiquitous as it is now.) So R can store UTF-8
strings on any system, but for display it converts them to the local
code page, and that conversion can lose information if the characters
aren't supported locally.
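To illustrate the lossy conversion, here is a small sketch that is not from the thread. The test string is a hypothetical Czech string "ěščřž", written with \u escapes. Converting it to CP1250 and back loses nothing, while latin1, which has no slots for these letters, cannot hold them, so iconv() returns NA unless a substitution is requested.

```r
# Hypothetical test string "ěščřž", written with \u escapes so the source stays ASCII
x <- "\u011b\u0161\u010d\u0159\u017e"

# CP1250 (Central European) contains all five letters, so the round trip is lossless
in_cp1250 <- iconv(x, from = "UTF-8", to = "CP1250")
back <- iconv(in_cp1250, from = "CP1250", to = "UTF-8")
print(identical(back, x))   # TRUE

# latin1 (ISO-8859-1) contains none of them; iconv() signals failure with NA
in_latin1 <- iconv(x, from = "UTF-8", to = "latin1")
print(is.na(in_latin1))     # TRUE

# With sub = "Unicode", unconvertible characters become <U+xxxx> escapes instead,
# one visible form of the information loss
print(iconv(x, from = "UTF-8", to = "latin1", sub = "Unicode"))
```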

With your string, I don't see the same thing you do; I see

"e?cr?"

which is also incorrect, but looks a little closer, because the conversion
makes a better approximation in my code page.

So if you think my result is better than yours, you could switch your
system to code page 437, which is what I'm using, but that will probably
cause you worse problems.
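The approximation described above comes from Windows converting to the display code page. A related but different mechanism is available directly in R: appending "//TRANSLIT" to the target encoding asks the platform's iconv for approximate replacements. This is a swapped-in technique, not what R does for display, and its output depends on the platform and locale (glibc, macOS and Windows each transliterate differently). The sketch below therefore prints the result instead of asserting a particular string. sub = "?" is a fallback for anything the platform cannot approximate.

```r
x <- "\u011b\u0161\u010d\u0159\u017e"   # hypothetical test string "ěščřž"

# Ask iconv for ASCII approximations; the exact output is platform dependent
approx_ascii <- iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT", sub = "?")
print(approx_ascii)
```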

Probably the only satisfactory short-term solution is to stop using
Windows. At some point in the future the internal character handling in
R needs an overhaul, but that's a really big, really thankless job.
Perhaps Microsoft/Revolution will donate some programmer time to do it,
but more likely it will wait for volunteers in R Core to do it. I don't
think it will happen in 2016.

Duncan Murdoch
#
Duncan, thank you for your reply. My locale is:
[1] "Czech_Czech Republic.1250"

In RStudio I use UTF-8. I also tried other recommended encodings, but some
characters are still misrepresented.

I've found a solution to this. To display strings correctly in RStudio, I have
to convert them:
iconv(x,"CP1250","UTF-8")
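A self-contained sketch of that conversion, assuming the string arrives as raw CP1250 bytes. Here the bytes are built by hand for a hypothetical test string "ěščřž". Before conversion the bytes are not valid UTF-8; after iconv() they are, and R marks the result as UTF-8.

```r
# Hypothetical input: "ěščřž" encoded in CP1250 (0xEC 0x9A 0xE8 0xF8 0x9E),
# as it might be stored in a session running in a .1250 locale
x <- rawToChar(as.raw(c(0xEC, 0x9A, 0xE8, 0xF8, 0x9E)))
print(validUTF8(x))          # FALSE: these are single-byte CP1250 codes

# Re-encode from CP1250 to UTF-8, as in the iconv() call above
y <- iconv(x, from = "CP1250", to = "UTF-8")
print(validUTF8(y))          # TRUE
print(Encoding(y))           # "UTF-8": iconv() marks its output by default
print(identical(y, "\u011b\u0161\u010d\u0159\u017e"))  # TRUE
```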

If I want to write a string to a file:
zz=file("myfile.txt", "w", encoding="UTF-8")
cat(x, file = zz, sep = "\n")
close(zz)  # close the connection so the output is flushed to disk

It seems there is no need to use iconv() if I just need to write a string to a
file.
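A runnable version of the file-writing approach. It uses a temporary file instead of myfile.txt and a hypothetical Czech test string. One hedge: as I read ?file, the encoding argument re-encodes output from the session's native encoding. That makes the result lossless only when every character is representable natively (true for Czech letters in a CP1250 or UTF-8 session, but not, say, in CP1252). Reading the bytes back is a cheap way to confirm the file really is UTF-8.

```r
x <- c("\u011b\u0161\u010d\u0159\u017e", "plain ASCII line")  # hypothetical content
path <- tempfile(fileext = ".txt")   # stands in for "myfile.txt"

# Open a text connection that re-encodes output to UTF-8, write, then close it
zz <- file(path, "w", encoding = "UTF-8")
cat(x, file = zz, sep = "\n")
close(zz)                            # flushes the buffer to disk

# Check that the bytes on disk are valid UTF-8
bytes <- readBin(path, what = "raw", n = file.size(path))
print(validUTF8(rawToChar(bytes)))   # TRUE in a UTF-8 or CP1250 session

# Read it back, declaring the encoding; warn = FALSE because cat() with
# sep = "\n" does not end the file with a newline
y <- readLines(path, encoding = "UTF-8", warn = FALSE)
print(identical(y, x))               # TRUE in a UTF-8 or CP1250 session
```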

I hope there are no problems processing these strings with other functions such as
paste, strsplit and grep, though.
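Here is a quick check that common string functions treat a UTF-8 string as characters rather than bytes. This is a sketch run in a UTF-8 session. Behaviour in a CP1250 session should be similar for Czech text, but that is an assumption here, not something tested in the thread.

```r
x <- "\u011b\u0161\u010d\u0159\u017e"   # hypothetical test string "ěščřž"

print(nchar(x))                    # 5 characters
print(nchar(x, type = "bytes"))    # 10 bytes: each letter takes 2 bytes in UTF-8

parts <- strsplit(x, "")[[1]]      # split into individual characters
print(length(parts))               # 5

print(grepl("\u010d", x, fixed = TRUE))  # TRUE: the letter "č" is found

y <- paste(x, "test")
print(nchar(y))                    # 10
```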

Derek

2016-03-30 0:56 GMT+02:00 Duncan Murdoch <murdoch.duncan at gmail.com>: