-----Original Message-----
From: peter dalgaard [mailto:pdalgd at gmail.com]
Sent: Thursday, September 13, 2012 1:43 PM
To: William Dunlap
Cc: sds at gnu.org; r-help at r-project.org
Subject: Re: [R] cannot read iso639 table
Pragmatically, one can zap the BOM from the output with
language.ISO.table[1,1] <- substring(language.ISO.table[1,1],2)
and be done with it.
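A minimal sketch of the same fix that strips the first character only when it really is a BOM, so it is harmless to run on a table that is already clean. The helper name strip_bom and the toy data frame are illustrative assumptions, not part of this thread. It also assumes the cell was read as UTF-8 text (so the BOM is the single character U+FEFF) and an R version that has startsWith().

```r
# Hypothetical helper: drop a leading U+FEFF, leave other strings alone.
strip_bom <- function(x) {
  has_bom <- startsWith(x, "\ufeff")
  ifelse(has_bom, substring(x, 2), x)
}

# Toy stand-in for the table read from the URL (assumed data).
language.ISO.table <- data.frame(a3bibliographic = c("\ufeffaar", "abk"),
                                 stringsAsFactors = FALSE)
language.ISO.table[1, 1] <- strip_bom(language.ISO.table[1, 1])
language.ISO.table[1, 1]
```

If the file was read without declaring an encoding, the BOM may instead show up as three separate bytes, in which case this character-based check would not match.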
It would be nicer to zap the BOM before read.table, though. The function below works for
me (note that the BOM is a single character unless you pass useBytes=TRUE).
function () {
    # Open the URL as a text connection, decoding from UTF-8
    socket <- url("http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt",
                  open="r", encoding="utf-8")
    # Read and discard the BOM (one character, since useBytes is not set)
    readChar(socket, nchars=1)
    data <- read.table(socket, as.is = TRUE, sep = "|", header = FALSE,
                       col.names = c("a3bibliographic", "a3terminologic",
                                     "a2", "english", "french"),
                       quote="")
    close(socket)
    data
}
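For comparison, a sketch of an alternative that lets the connection drop the BOM itself. It relies on the "UTF-8-BOM" encoding name, which to my knowledge arrived in R 3.0.0, so it would not apply to the R-2.15.1 sessions discussed here. The temporary file and its two sample rows are assumptions standing in for the real URL, so that the example runs offline.

```r
# Build a small local file that starts with a UTF-8 BOM (assumed sample rows).
tmp <- tempfile(fileext = ".txt")
bom <- as.raw(c(0xef, 0xbb, 0xbf))
rows <- charToRaw("aar||aa|Afar|afar\nabk||ab|Abkhazian|abkhaze\n")
writeBin(c(bom, rows), tmp)

# "UTF-8-BOM" asks the connection to drop a leading BOM while decoding.
con <- file(tmp, open = "r", encoding = "UTF-8-BOM")
data <- read.table(con, as.is = TRUE, sep = "|", header = FALSE,
                   col.names = c("a3bibliographic", "a3terminologic",
                                 "a2", "english", "french"),
                   quote = "")
close(con)
unlink(tmp)
data$a3bibliographic
```

With this approach no explicit readChar() call is needed, and nothing is lost if the file happens to have no BOM.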
On Sep 13, 2012, at 22:26 , William Dunlap wrote:
It would be helpful if you showed your commands and printed
outputs, copied directly from your R session, from the beginning
to the end. I put the call to sessionInfo() in my message because
it is probably relevant. It is nice to completely include the original
email when responding to it so others can see the whole story in
one place.
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
-----Original Message-----
From: Sam Steingold [mailto:sam.steingold at gmail.com] On Behalf Of Sam Steingold
Sent: Thursday, September 13, 2012 1:18 PM
To: William Dunlap
Cc: peter dalgaard; r-help at r-project.org
Subject: Re: [R] cannot read iso639 table
* William Dunlap <jqhaync at gvopb.pbz> [2012-09-13 19:50:21 +0000]:
On Windows with R-2.15.1 in a 1252 locale, I had to read (and toss out)
the initial 3 bytes (the byte-order mark?) to make things work:
8.txt",open="r",encoding="utf-8")
readChar(socket, nchars=3, useBytes=TRUE)
confirmed - first 3 bytes are "\357\273\277"
d <- read.table(socket, quote="", sep="|", stringsAsFactors=FALSE)
dim(d)
   V1 V2 V3             V4      V5
1 aar    aa           Afar    afar
2 abk    ab      Abkhazian abkhaze
3 ace             Achinese    aceh
4 ach                Acoli   acoli
5 ada              Adangme adangme
6 ady       Adyghe; Adygei  adygh?
alas, this is all I get:
Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
invalid input found on input connection 'http://www.loc.gov/standards/iso639-
639-2_utf-8.txt'
  a3bibliographic a3terminologic a2        english  french
1             aar             NA aa           Afar    afar
2             abk             NA ab      Abkhazian abkhaze
3             ace             NA          Achinese    aceh
4             ach             NA             Acoli   acoli
5             ada             NA           Adangme adangme
6             ady             NA    Adyghe; Adygei   adygh
note that the first non-ASCII character terminates the input.
so, I still cannot read the data from the URL.
I can read the file though - with quote="" (thanks Peter!) -
except that the first record is "\357\273\277aar".
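The "3 bytes versus 1 character" distinction that runs through this thread can be checked directly. This is only an illustrative sketch; the variable names are assumptions and nothing here touches the network.

```r
bom <- "\ufeff"                        # U+FEFF, stored as UTF-8 by the \u escape
bytes <- charToRaw(bom)                # the raw bytes behind that one character
n_bytes <- nchar(bom, type = "bytes")  # what readChar(..., useBytes=TRUE) counts
n_chars <- nchar(bom, type = "chars")  # what readChar() counts without useBytes
print(bytes)
print(c(bytes = n_bytes, chars = n_chars))
# Octal view of the same bytes, for comparison with the "\357\273\277" notation.
print(format(as.octmode(as.integer(bytes))))
```

This is why one call in the thread discards nchars=3 with useBytes=TRUE while the other discards a single character on a connection opened with encoding="utf-8".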
--
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com