
cannot read iso639 table

On Windows with R-2.15.1 in a 1252 locale, I had to read and discard
the initial 3 bytes (the UTF-8 byte-order mark?) to make things work:

  > socket <- url("http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt",open="r",encoding="utf-8")
  > readChar(socket, nchars=3, useBytes=TRUE)
  [1] "???"
  > d <- read.table(socket, quote="", sep="|", stringsAsFactors=FALSE)
  > dim(d)
  [1] 485   5
  > head(d)
     V1 V2 V3             V4      V5
  1 aar    aa           Afar    afar
  2 abk    ab      Abkhazian abkhaze
  3 ace             Achinese    aceh
  4 ach                Acoli   acoli
  5 ada              Adangme adangme
  6 ady       Adyghe; Adygei  adygh?

If I discarded no initial bytes, I got:
  > socket <- url("http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt",open="r",encoding="utf-8")
  > d <- read.table(socket, quote="", sep="|", stringsAsFactors=FALSE)
  Warning messages:
  1: In read.table(socket, quote = "", sep = "|", stringsAsFactors = FALSE) :
    invalid input found on input connection 'http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt'
  2: In read.table(socket, quote = "", sep = "|", stringsAsFactors = FALSE) :
    incomplete final line found by readTableHeader on 'http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt'
  > dim(d)
  [1] 1 1
  > str(d)
  'data.frame':   1 obs. of  1 variable:
   $ V1: chr "?"

If I discarded only the initial 2 bytes, I got an "empty beginning of file" error:
  > socket <- url("http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt",open="r",encoding="utf-8")
  > readChar(socket, nchars=2, useBytes=TRUE)
  [1] "??"
  > d <- read.table(socket, quote="", sep="|", stringsAsFactors=FALSE)
  Error in read.table(socket, quote = "", sep = "|", stringsAsFactors = FALSE) : 
    empty beginning of file
  In addition: Warning messages:
  1: In read.table(socket, quote = "", sep = "|", stringsAsFactors = FALSE) :
    invalid input found on input connection 'http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt'
  2: In read.table(socket, quote = "", sep = "|", stringsAsFactors = FALSE) :
    incomplete final line found by readTableHeader on 'http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt'
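
The experiments above suggest that the file starts with a 3-byte UTF-8 byte-order mark (EF BB BF) that read.table cannot digest. Here is a minimal sketch of a more defensive version, assuming the file is small enough to read into memory. The helpers strip_utf8_bom() and read_bom_table() are hypothetical names of my own, not part of R, and the demo uses a small local temp file with made-up contents in place of the loc.gov URL:

```r
# Hypothetical helper: drop a leading UTF-8 byte-order mark (EF BB BF)
# from a raw vector, if one is present; otherwise return it unchanged.
strip_utf8_bom <- function(bytes) {
  bom <- as.raw(c(0xEF, 0xBB, 0xBF))
  if (length(bytes) >= 3L && identical(bytes[1:3], bom)) {
    bytes[-(1:3)]
  } else {
    bytes
  }
}

# Hypothetical helper: read the whole file as raw bytes, strip any BOM,
# mark the text as UTF-8, and pass it to read.table() via 'text='.
read_bom_table <- function(path, ...) {
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  txt <- rawToChar(strip_utf8_bom(bytes))
  Encoding(txt) <- "UTF-8"
  read.table(text = txt, ...)
}

# Demo on a temp file that mimics the layout of the ISO 639-2 table:
# a BOM followed by two '|'-separated records.
tmp <- tempfile(fileext = ".txt")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)),
           charToRaw("aar||aa|Afar|afar\nabk||ab|Abkhazian|abkhaze\n")),
         tmp)
d <- read_bom_table(tmp, quote = "", sep = "|", stringsAsFactors = FALSE)
dim(d)
```

Unlike an unconditional readChar(socket, nchars=3, useBytes=TRUE), this only drops the first 3 bytes when they really are a BOM, so it should also be safe on files without one. I believe R 3.0.0 and later also accept encoding="UTF-8-BOM" on connections, but that is not available in 2.15.1.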

  > sessionInfo()
  R version 2.15.1 (2012-06-22)
  Platform: x86_64-pc-mingw32/x64 (64-bit)
  
  locale:
  [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
  [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
  [5] LC_TIME=English_United States.1252    
  
  attached base packages:
  [1] stats     graphics  grDevices utils     datasets  methods   base     

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com