For several years I've been running scripts with this expression
substr(tmp.un[substr(tmp.un,1,1)=='u'],1,1) <- '\265'
whose purpose is to replace the letter 'u' with a
Greek mu, to change, for example, 'ug/L' to
'µg/L'. It has been working over several versions
of R.
Recently, I started (sometimes?) getting error messages from the expression:
Error in "substr<-"(`*tmp*`, 1, 1, value = "<b5>") :
invalid multibyte string
In the same R session:
Sys.getlocale()
[1] "C/en_US.UTF-8/C/C/C/C"
Yet, in another R session on the same machine (in a different directory)
Sys.getlocale()
[1] "C"
If I go back to the first session, that had the problem, and do
> Sys.setlocale('LC_ALL','C')
then the error goes away. This is good.
But I have no idea why the locale is apparently
getting set wrong; I don't believe I ever
explicitly wrote a "set locale" expression of any
sort anywhere in the scripts for this project.
Maybe it's a side effect of something, but I
don't know what.
Any suggestions would be most appreciated.
Thanks
-Don
sessionInfo()
R version 2.2.1, 2005-12-26, powerpc-apple-darwin7.9.0
attached base packages:
[1] "stats" "utils" "methods" "base"
other attached packages:
xtable rmacq ROracle DBI
"1.3-0" "1.0" "0.5-5" "0.1-9"
(and I will move to 2.3.0 soon)
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
With high probability this is due to a UTF-8 locale being used. You can
check that easily by looking for
"Natural language support but running in an English locale"
in the greeting.
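The same check can also be made from inside a running session. A minimal sketch, assuming your version of R provides `l10n_info()` (the variable names `ctype` and `info` are just for illustration):

```r
# Report the character-type locale and whether R treats it as UTF-8.
ctype <- Sys.getlocale("LC_CTYPE")
info <- l10n_info()
cat("LC_CTYPE:", ctype, "\n")
cat("UTF-8 locale:", info[["UTF-8"]], "\n")
```

If the second line reports TRUE, a lone byte such as '\265' is not a valid character in that session, which would be consistent with the "invalid multibyte string" error.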
Starting with R 2.3.0 (due to updated gettext) the locale is
determined from the system settings. Previously, the system setting
was always ignored. Right now it is used only if no other locale
setting, such as LANG or LC_ALL, is in effect. If you want to force
the C locale, you should run your scripts using something like
LANG=C R
and this is what most scripts do if they want to force a locale-free
environment. Note, however, that UTF-8 is required by the system
utilities, so forcing a non-UTF-8 locale will prevent you from using
non-ASCII characters in file names and the like. On Mac OS X it is
usually safer to write everything in UTF-8, including your scripts.
Cheers,
Simon