
input string ... cannot be translated to UTF-8, is it valid in 'ANSI_X3.4-1968'?

3 messages · Spencer Graves, Duncan Murdoch, John Kane

#
Hello:


	  What, if anything, should I do about messages from "load" or 
"attach" saying "input string ... cannot be translated to UTF-8, is it 
valid in 'ANSI_X3.4-1968'?"


	  I'm running R 4.0.5 under macOS 11.2.3;  see "sessionInfo()" and 
detailed instructions below on the precise file I downloaded from the web 
and tried to read.


	  I may be able to get what I want just ignoring this.  However, I'd 
like to know how to fix this.


	  Thanks,
	  Spencer Graves


sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
 [1] compiler_4.0.5    htmltools_0.5.1.1 tools_4.0.5       yaml_2.2.1
 [5] tinytex_0.31      rmarkdown_2.7     knitr_1.31        digest_0.6.27
 [9] xfun_0.22         rlang_0.4.10      evaluate_0.14
 > search()
  [1] ".GlobalEnv"                "file:NAVCO 1.3 List.RData"
  [3] "file:NAVCO 1.3 List.RData" "tools:rstudio"
  [5] "package:stats"             "package:graphics"
  [7] "package:grDevices"         "package:utils"
  [9] "package:datasets"          "package:methods"
[11] "Autoloads"                 "package:base"


*** To get the file I used for this, I went to 
"https://www.ericachenoweth.com/research".  From there I clicked 
"Version 1.3".  This took me to


https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/ON9XND


I then clicked the "Download" icon to the right of "NAVCO 1.3 List.tab". 
This gave me 5 "Download Options", one of which was "RData Format";  I 
selected that.  This downloaded "NAVCO 1.3 List.RData", which I moved to 
getwd().  Then I did 'load("NAVCO 1.3 List.RData")' and 
'attach("NAVCO 1.3 List.RData")'.  Both of those gave me 8 repetitions 
of a message like "input string ... cannot be translated to UTF-8, is it 
valid in 'ANSI_X3.4-1968'?" with different values substituted for "...".
#
On 22/04/2021 9:25 p.m., Spencer Graves wrote:
First, ANSI_X3.4-1968 is an official name for a version of ASCII. 
It appears in the file near the start, where I believe it records the 
native encoding in place when the file was written, so readers using a 
different encoding can translate.

Your actual file appears to have been encoded in UTF-8, but not marked 
as such.  You're lucky you read it on macOS, where UTF-8 is the native 
encoding, since the reader probably recognized the bytes weren't ASCII 
bytes (and warned you about that), then just left them alone.  If you 
read that file on Windows you'd likely get junk for those entries.
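
If the strings really are UTF-8 bytes that were never marked as such, one possible cleanup after load() is to set the encoding mark explicitly. The following is only a minimal sketch, assuming the loaded object is a data frame; `mark_utf8` is a hypothetical helper (not part of R), and the sample bytes stand in for the real data:

```r
# Hypothetical helper: declare character data in a data frame to be UTF-8.
# Encoding<- only changes the declared encoding; the bytes are not touched.
mark_utf8 <- function(df) {
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.character(col)) {
      Encoding(col) <- "UTF-8"
      df[[nm]] <- col
    } else if (is.factor(col)) {
      lev <- levels(col)
      Encoding(lev) <- "UTF-8"
      levels(df[[nm]]) <- lev
    }
  }
  df
}

# Stand-in data: "café" as raw UTF-8 bytes with no encoding mark.
unmarked <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xc3, 0xa9)))
demo <- data.frame(name = unmarked, stringsAsFactors = FALSE)

fixed <- mark_utf8(demo)
Encoding(fixed$name)   # declared encoding after marking
validUTF8(fixed$name)  # checks that the bytes form valid UTF-8
```

Marking is only appropriate if the bytes are in fact UTF-8, so checking validUTF8() first is a reasonable precaution; pure ASCII strings are left unmarked by Encoding<- in any case.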

For your interest, here's a dump of the start of your file, after 
gunzipping it:

00000000  52 44 58 33 0a 58 0a 00  00 00 03 00 03 06 00 00  |RDX3.X..........|
00000010  03 05 00 00 00 00 0e 41  4e 53 49 5f 58 33 2e 34  |.......ANSI_X3.4|
00000020  2d 31 39 36 38 00 00 04  02 00 00 00 01 00 04 00  |-1968...........|
00000030  09 00 00 00 01 78 00 00  03 13 00 00 00 10 00 00  |.....x..........|
00000040  02 0e 00 00 02 6e 40 90  0c 00 00 00 00 00 40 90  |.....n@.......@.|
00000050  44 00 00 00 00 00 40 10  00 00 00 00 00 00 40 7c  |D.....@.......@||
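
The leading fields of that dump can also be pulled out programmatically. This is a rough sketch, assuming a version-3 XDR save file laid out like the one shown (magic "RDX3", a format line "X", three big-endian integers, then a length-prefixed encoding name). `read_rdata_header` is a hypothetical helper, and the temporary file merely stands in for the downloaded one:

```r
# Hypothetical helper: read the header fields of a version-3 XDR save file.
read_rdata_header <- function(path) {
  con <- gzfile(path, "rb")          # also reads uncompressed files
  on.exit(close(con))
  magic <- readBin(con, "raw", n = 5)
  fmt   <- readBin(con, "raw", n = 2)
  if (!identical(magic, charToRaw("RDX3\n")) ||
      !identical(fmt, charToRaw("X\n"))) {
    stop("not a version-3 XDR save file")
  }
  # Format version, writer's R version, minimal reader version.
  v <- readBin(con, "integer", n = 3, size = 4, endian = "big")
  # Length of the encoding name, then the name itself.
  n <- readBin(con, "integer", n = 1, size = 4, endian = "big")
  enc <- rawToChar(readBin(con, "raw", n = n))
  # R versions are packed as major * 65536 + minor * 256 + patch.
  decode <- function(x) {
    paste(x %/% 65536L, (x %/% 256L) %% 256L, x %% 256L, sep = ".")
  }
  list(format_version  = v[1],
       written_by      = decode(v[2]),
       min_reader      = decode(v[3]),
       native_encoding = enc)
}

# Stand-in file: save a small object and inspect its header.
tmp <- tempfile(fileext = ".RData")
x <- data.frame(a = 1:3)
save(x, file = tmp, version = 3)
str(read_rdata_header(tmp))
unlink(tmp)
```

Applied to the bytes in the dump, the three integers 00 00 00 03, 00 03 06 00 and 00 03 05 00 decode to format version 3, written by R 3.6.0, readable from R 3.5.0, followed by the 14-byte name ANSI_X3.4-1968.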

Duncan Murdoch
2 days later
#
The tab format seems to read in with no problem.
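
For anyone taking the tab route, here is a minimal sketch of reading such a file while declaring its encoding. It assumes the tab file is UTF-8 (as the RData bytes appear to be). The temporary file and its contents are made up for illustration; for the real data the path would be "NAVCO 1.3 List.tab":

```r
# Made-up stand-in for the downloaded tab file, written as UTF-8 bytes.
tmp <- tempfile(fileext = ".tab")
con <- file(tmp, open = "wb")
writeLines(enc2utf8(c("id\tname", "1\tCaf\u00e9")), con, useBytes = TRUE)
close(con)

# encoding = "UTF-8" marks the strings as UTF-8 without re-encoding them.
dat <- read.delim(tmp, encoding = "UTF-8", stringsAsFactors = FALSE)
Encoding(dat$name)
unlink(tmp)
```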
On Thu, 22 Apr 2021 at 23:08, Duncan Murdoch <murdoch.duncan at gmail.com> wrote: