Request for advice on character set conversions (those damn Excel files, again ...)

Emmanuel Charpentier wrote:
This looks reasonably sane, I think. The last loop could be written as
d[] <- lapply(d, conv1, from, to), but I think that is cosmetic. You can't
really do much better, because there is no simple way of distinguishing
between the various 8-bit character sets. You could presumably set up
some heuristics, like the fact that the occurrence of 0x82 or 0x8a
(e-acute and e-grave in cp437) probably indicates cp437, but it gets
tricky. (At least, in French, you don't have the Danish/Norwegian
peculiarity that upper/lowercase o-slash were missing in cp437, and
therefore often took the place of the yen and cent symbols in matrix
printer ROMs. We still get the occasional parcel addressed to
"?ster Farimagsgade".)
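A minimal sketch of both ideas, assuming conv1() takes a column plus the from/to encodings and returns it re-encoded. The conv1() below is a hypothetical stand-in (the original poster's version isn't quoted here), and looks_like_cp437() is likewise a made-up name for the crude byte heuristic. Encoding names such as "CP437" depend on the platform's iconv, so check iconvlist() first.

```r
## Hypothetical stand-in for the poster's conv1():
## re-encode character and factor columns, pass other columns through.
conv1 <- function(x, from, to) {
  if (is.factor(x)) {
    levels(x) <- iconv(levels(x), from = from, to = to)
  } else if (is.character(x)) {
    x <- iconv(x, from = from, to = to)
  }
  x
}

## Crude heuristic (hypothetical name): bytes 0x82 and 0x8a are
## e-acute and e-grave in cp437, but C1 control codes in latin1.
looks_like_cp437 <- function(x) {
  x <- as.character(x)
  x <- x[!is.na(x)]
  bytes <- unlist(lapply(x, charToRaw))
  any(bytes %in% as.raw(c(0x82, 0x8a)))
}

## Usage: guess the source encoding once, then convert every column.
d <- data.frame(nom = c("caf\x82", "tr\x8as"), n = 1:2,
                stringsAsFactors = FALSE)
from <- if (any(sapply(d, looks_like_cp437))) "CP437" else "latin1"
to <- "UTF-8"
d[] <- lapply(d, conv1, from, to)
```

Using d[] <- rather than d <- keeps d a data frame with its row names and attributes; only the column contents are replaced. Note that 0x82 and 0x8a are also printable in cp1252 (a low single quote and S-caron), so a hit is a hint rather than proof.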