Encoding problem - I fail to read Hebrew text from online

Tal, 

OK, let me clarify my understanding. Both the original and the decoded
files are text encoded in UTF-8. In the original file, the Hebrew
characters are written as HTML `entities'. In the decoded file, those
entities have been converted to the actual UTF-8 characters. The
question is how to do this conversion within R. It's not the same as
converting between character encodings; if it were, iconv() might offer
a solution.

I'll have a look around to find a solution, and I hope others will too.
My first idea is to check RCurl, XML, and the related utils::URLdecode.
If there really is no existing solution, I think it might be worthwhile
to look at how PHP and Python do it (and maybe borrow some code :) ).
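
For reference, here is a rough sketch of how the Python side looks, in
case it helps anyone porting the idea to R. The standard library's
html.unescape() does the whole job. The second function is a hand-rolled
version that only handles numeric references, which is probably the part
worth borrowing. The sample string and the function names are my own
illustration, not taken from Tal's file, and I am assuming the entities
there are numeric ones of the form &#NNNN;.

```python
import html
import re


def decode_entities(text: str) -> str:
    """Decode named and numeric HTML entities using the standard library."""
    return html.unescape(text)


# Matches decimal (&#1513;) and hexadecimal (&#x5E9;) character references.
NUMERIC_ENTITY = re.compile(r"&#([0-9]+|[xX][0-9a-fA-F]+);")


def decode_numeric_entities(text: str) -> str:
    """Hypothetical hand-rolled decoder: numeric references only.

    Named entities such as &amp; are left untouched, and there is no
    validation of out-of-range code points (chr() would raise ValueError).
    """
    def replace(match: re.Match) -> str:
        body = match.group(1)
        if body[0] in "xX":
            code_point = int(body[1:], 16)  # hexadecimal form
        else:
            code_point = int(body)          # decimal form
        return chr(code_point)

    return NUMERIC_ENTITY.sub(replace, text)


# Assumed sample input: the Hebrew word "shalom" written as decimal entities.
sample = "&#1513;&#1500;&#1493;&#1501;"
decoded = decode_entities(sample)
```

Both functions turn the sample into the same four Hebrew characters
(U+05E9, U+05DC, U+05D5, U+05DD). The core of the hand-rolled version is
just "find the number, convert it to a code point", so an R equivalent
would presumably need little more than a regular expression and
intToUtf8().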

-Matt
On Thu, 2010-12-09 at 14:27 -0500, Tal Galili wrote: