
deparse() and UTF-8 strings

I'm not R-core, but happen to have run into this issue.

I think this makes sense conceptually, and have had the same thought 
myself.  One implementation challenge is that the parser has a special 
branch for strings containing Unicode escapes (e.g. "G\u00e1bor") that 
limits such strings to 10,000 wide characters, so the parser would need 
to be modified to make this a general solution:

 > parse(text=sprintf('"%s"', strrep("G\\u00e1bor", 2000)))
Error in parse(text = sprintf("\"%s\"", strrep("G\\u00e1bor", 2000))) :
   string at line 1 containing Unicode escapes not in this locale
is too long (max 10000 chars)

Such long strings are rare, so maybe an interim solution is to allow 
this only when deparsing shorter strings.  The parser modification itself 
would also have the benefit of speeding up parsing of strings without 
Unicode escapes.
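A rough R-level sketch of that interim idea follows. `deparse_utf8()` is a hypothetical helper, not an existing base R function. It writes non-ASCII characters as \u / \U escapes and falls back to plain deparse() for strings longer than `max_chars` characters. The 10000 default assumes the parser limit counts characters of the resulting string, as the error message above suggests.

```r
# Hypothetical helper (not part of base R): deparse a single string, writing
# non-ASCII characters as \u / \U escapes so the result parses back to the
# same string in any locale. Strings longer than `max_chars` characters fall
# back to deparse(), assuming the parser rejects longer strings containing
# Unicode escapes (limit of 10000 assumed from the error message).
deparse_utf8 <- function(x, max_chars = 10000L) {
  stopifnot(is.character(x), length(x) == 1L, !is.na(x))
  x <- enc2utf8(x)
  cps <- utf8ToInt(x)  # Unicode code points of the string
  if (length(cps) > max_chars) {
    return(deparse(x))
  }
  pieces <- vapply(cps, function(cp) {
    if (cp < 128L) {
      # ASCII: let deparse() escape quotes, backslashes and control
      # characters, then strip the surrounding double quotes
      d <- deparse(intToUtf8(cp))
      substr(d, 2L, nchar(d) - 1L)
    } else if (cp <= 0xFFFFL) {
      sprintf("\\u%04x", cp)   # Basic Multilingual Plane: 4 hex digits
    } else {
      sprintf("\\U%08x", cp)   # other planes: 8 hex digits
    }
  }, character(1))
  paste0("\"", paste(pieces, collapse = ""), "\"")
}
```

For example, `cat(deparse_utf8("G\u00e1bor"))` prints `"G\u00e1bor"`, and parsing that text back gives the original string.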

Best,

B.
On 2/21/22 5:33 AM, Gábor Csárdi wrote: