Error on Windows build: "unable to re-encode"

4 messages · Felix Schönbrodt, Duncan Murdoch

#
On 26/02/2010 11:05 AM, Felix Schönbrodt wrote:
I got the same error as you.  It looks as though iconv has trouble with 
the way some characters are encoded in your file.  For example, on line 
893, you have a u-umlaut encoded as EF BF BD.  According to the UTF-8 
tables at 
http://www.utf8-chartable.de/unicode-utf8-table.pl?start=65280, that 
encodes U+FFFD, a question mark in a diamond, "REPLACEMENT CHARACTER".  There's 
no corresponding character in the standard Windows latin1 encoding, so 
conversion fails.  Firefox can display the funny question mark, but it 
doesn't display the u-umlaut as you intended, so I think this is an 
error in your file.
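
The byte-level claim above can be checked from R itself. This is only a sketch: the variable names are made up for illustration, and the NA result is what the thread reports, so it may differ with other iconv implementations.

```r
# The three bytes reported on line 893, treated as a UTF-8 string
bytes <- as.raw(c(0xEF, 0xBF, 0xBD))
ch <- rawToChar(bytes)
Encoding(ch) <- "UTF-8"

# Code point of the decoded character; 65533 is U+FFFD
code_point <- utf8ToInt(ch)
print(sprintf("U+%04X", code_point))

# For comparison, a real u-umlaut (U+00FC) is C3 BC in UTF-8
u_umlaut <- rawToChar(as.raw(c(0xC3, 0xBC)))
Encoding(u_umlaut) <- "UTF-8"

# The u-umlaut has a latin1 equivalent (the single byte FC)
print(charToRaw(iconv(u_umlaut, from = "UTF-8", to = "latin1")))

# U+FFFD has none, so the conversion is expected to give NA
print(is.na(iconv(ch, from = "UTF-8", to = "latin1")))
```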

A way to find all such errors is as follows:  read the file as UTF-8, 
then use the iconv() function in R to convert it to latin1.  When I do 
that, I get NA on lines 893 and 953, which are displayed to me as

[1] "\t# im latenten Fall: die Error variance erst am Ende berechnen 
(d.h., alle error componenten ???ber alle Gruppen mitteln, die unter 
NUll auf Null setzen, dann addieren)"
[2] "\t\t# TODO: ???berpr???fen!"    
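
A minimal sketch of that check, assuming the source file is meant to be UTF-8.  The helper name find_unconvertible_lines and the temporary demo file are hypothetical, not part of R or of the package in question.

```r
# Hypothetical helper: line numbers of a UTF-8 file that iconv()
# cannot convert to latin1 (iconv() returns NA for those lines).
find_unconvertible_lines <- function(path) {
  lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
  converted <- iconv(lines, from = "UTF-8", to = "latin1")
  which(is.na(converted))
}

# Demo on a throwaway file: line 2 contains U+FFFD (bytes EF BF BD),
# line 3 contains a properly encoded u-umlaut (bytes C3 BC).
bad <- rawToChar(as.raw(c(0xEF, 0xBF, 0xBD)))
good <- rawToChar(as.raw(c(0xC3, 0xBC)))
tmp <- tempfile(fileext = ".R")
writeLines(
  c("x <- 1",
    paste0("# TODO: ", bad, "berpr", bad, "fen!"),
    paste0("# ", good, "ber alle Gruppen mitteln")),
  tmp,
  useBytes = TRUE  # write the bytes unchanged, whatever the locale
)

bad_lines <- find_unconvertible_lines(tmp)
print(bad_lines)  # should report only the line containing U+FFFD
unlink(tmp)
```

Calling find_unconvertible_lines() on the real source file should then list the lines to fix by hand.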

We might be able to make the error message in the package installer more 
informative (e.g. giving the line number that failed).  I'll look into that.

Duncan Murdoch
#
Thanks for your help - that was the solution (easy enough to remove these two characters - they were only in comments anyway).
Fortunately, the DESCRIPTION file accepts umlauts, as in my surname. The problem was only in the source file.

Felix


On 26.02.2010 at 18:37, Duncan Murdoch wrote:
#
Felix Schönbrodt wrote:
I think comments in R code could also include umlauts, but they need to 
be encoded in a way that can be converted to Latin1 on Windows.  I don't 
know why yours weren't.  Did those characters look like u-umlaut on your 
system?  What editor did you use to produce that file?

I'm not sure what the consequences would be of allowing unrepresentable 
characters to be mapped to question marks or hex codes (with a 
warning).  I think it would slow down the processing a bit (because 
those lines would need to be processed twice: once to detect that they 
have some bad characters, a second time to replace them).  I'm not sure 
if it would slow down processing of files that include no bad chars.  
I'll take a look.

Duncan Murdoch
#
On 27/02/2010 2:38 AM, Felix Schönbrodt wrote:
I've changed R-devel so that it now gives a warning instead of an error 
in such cases.  The warning reports the line numbers of the bad 
characters, and the installer converts them to <xx>-style hex codes.  If 
you've used them in variable names this will likely lead to a syntax 
error; in string literals it will look ugly but should be accepted.  In 
comments it will look ugly, but comments aren't normally saved, so they 
won't really matter there.
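
The same kind of fallback can be approximated at user level with the sub = "byte" argument of iconv(), which writes unconvertible bytes as <xx> hex codes.  The sketch below is not the installer's actual code: the function name convert_with_warning and the warning text are hypothetical, and it uses the two-pass idea (strict conversion first, substitution only for the lines that failed).

```r
# Hypothetical helper: convert UTF-8 lines to latin1, warning about
# lines that fail and substituting <xx> hex codes for their bad bytes.
convert_with_warning <- function(lines) {
  # First pass: strict conversion, NA marks a line that failed
  out <- iconv(lines, from = "UTF-8", to = "latin1")
  bad <- which(is.na(out))
  if (length(bad) > 0) {
    warning("unable to re-encode line(s) ", paste(bad, collapse = ", "))
    # Second pass, bad lines only: replace unconvertible bytes by <xx>
    out[bad] <- iconv(lines[bad], from = "UTF-8", to = "latin1",
                      sub = "byte")
  }
  out
}

# Demo: U+FFFD (bytes EF BF BD) inside a comment line
repl <- rawToChar(as.raw(c(0xEF, 0xBF, 0xBD)))
lines <- c("x <- 1", paste0("# TODO: ", repl, "berpr", repl, "fen!"))
result <- convert_with_warning(lines)
print(result)
```

With this approach, lines without bad characters are converted only once; only the failing lines pay for the second pass.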

Duncan Murdoch