On 3/4/06 5:06 AM, "Florian Hahne" <f.hahne at dkfz-heidelberg.de> wrote:
Hi Sean,
I had a similar problem with "invalid multibyte string" errors in a UTF-8
locale. The error occurs when you apply a string processing function
such as strsplit to a string that contains bytes that are not valid
UTF-8. In my code I use the function iconv to convert the string to
Latin-1 encoding before applying strsplit (take a look at the code
below). With sub="byte", each byte that cannot be converted is replaced
by its hex code in the form <xx>. Not sure if this is helpful in your
situation, but at least it doesn't force the user into using a specific
locale.
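readFCStext <- function(con, offsets) {
    # Jump to the start of the TEXT segment and read it as one string.
    seek(con, offsets["textstart"])
    txt <- readChar(con, offsets["textend"] - offsets["textstart"] + 1)
    # Convert from the current locale's encoding to Latin-1; bytes that
    # cannot be converted are replaced by their hex code, e.g. <e9>.
    txt <- iconv(txt, "", "latin1", sub = "byte")
    # The first character is the delimiter between keywords and values.
    delimiter <- substr(txt, 1, 1)
    sp <- strsplit(substr(txt, 2, nchar(txt)), split = delimiter,
                   fixed = TRUE)[[1]]
    # Odd elements are keyword names, even elements are their values.
    rv <- sp[seq(2, length(sp), by = 2)]
    names(rv) <- sp[seq(1, length(sp) - 1, by = 2)]
    return(rv)
}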
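To see the substitution in isolation, here is a minimal sketch that runs
the same iconv and strsplit steps on a small made-up string instead of a
file connection. The keyword text and the stray 0xe9 byte are invented for
illustration, and from="UTF-8" is passed explicitly (rather than "" as in
the function above) only so that the result does not depend on the locale
of the session.

```r
# Hypothetical TEXT segment: "/" is the delimiter, and the byte 0xe9
# (not valid UTF-8 on its own) sits inside the first value.
bytes <- c(charToRaw("/$P1N/FS"), as.raw(0xe9), charToRaw("/$TOT/1000/"))
txt <- rawToChar(bytes)

# Assumption: the input is treated as UTF-8. Bytes that cannot be
# converted are replaced by their hex code in angle brackets.
txt <- iconv(txt, from = "UTF-8", to = "latin1", sub = "byte")

# Same parsing steps as in readFCStext.
delimiter <- substr(txt, 1, 1)
sp <- strsplit(substr(txt, 2, nchar(txt)), split = delimiter,
               fixed = TRUE)[[1]]
rv <- sp[seq(2, length(sp), by = 2)]
names(rv) <- sp[seq(1, length(sp) - 1, by = 2)]
print(rv)
```

The value for $P1N should come back as "FS<e9>", so the offending byte
stays visible in the result instead of stopping strsplit with an error.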
Thanks, Florian. I'll give this a try. Sean