
[Bioc-devel] Invalid multibyte string

Hi, I'm forwarding a msg from Florian.

Florian: I've added this address also to the list so that you should
be able to post with it as well.  

+ seth


From: Florian Hahne <f.hahne at dkfz-heidelberg.de>
Subject: Re: [Bioc-devel] Invalid multibyte string
To: Sean Davis <sdavis2 at mail.nih.gov>
CC: bioc-devel at stat.math.ethz.ch
Date: Sat Mar  4 02:06:06 2006 -0800

Hi Sean,
I had a similar problem with invalid multibyte strings in a UTF-8
locale. The error occurs when you apply a string processing function
(strsplit, for example) to a string containing bytes that are not
valid UTF-8. In my code I use iconv to convert the string to Latin-1
before applying strsplit (take a look at the code below). With
sub="byte", iconv replaces each byte it cannot convert with the hex
code of that byte, in the form <xx>. Not sure if this is helpful in
your situation, but at least it doesn't force the user into using a
specific locale.

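readFCStext <- function(con, offsets) {
  # Jump to the start of the TEXT segment and read it in one go
  seek(con, offsets["textstart"])
  txt <- readChar(con, offsets["textend"] - offsets["textstart"] + 1)
  # Convert from the current locale ("") to Latin-1; bytes that cannot
  # be converted are replaced by their hex code, e.g. <e9>
  txt <- iconv(txt, "", "latin1", sub = "byte")
  # The first character of the segment is the field delimiter
  delimiter <- substr(txt, 1, 1)
  sp <- strsplit(substr(txt, 2, nchar(txt)), split = delimiter,
                 fixed = TRUE)[[1]]
  # Fields alternate keyword, value, keyword, value, ...
  rv <- sp[seq(2, length(sp), by = 2)]
  names(rv) <- sp[seq(1, length(sp) - 1, by = 2)]
  return(rv)
}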
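
To try the conversion step without an FCS file, here is a minimal
sketch on a made-up string. The keywords and values are invented for
illustration, and the source encoding is given as "UTF-8" explicitly
(rather than "" for the current locale, as in the function above) so
that the result does not depend on the session's locale:

```r
# Build a delimiter-separated string with one stray byte (0xE9) that is
# not valid UTF-8 on its own. The content is made up for illustration.
raw_txt <- c(charToRaw("/$SYS/Mac"), as.raw(0xe9), charToRaw("/$TOT/100"))
txt <- rawToChar(raw_txt)

# Replace bytes that cannot be converted with their hex code (<xx>)
clean <- iconv(txt, "UTF-8", "latin1", sub = "byte")

# Same keyword/value splitting as in readFCStext
delimiter <- substr(clean, 1, 1)
sp <- strsplit(substr(clean, 2, nchar(clean)), split = delimiter,
               fixed = TRUE)[[1]]
rv <- sp[seq(2, length(sp), by = 2)]
names(rv) <- sp[seq(1, length(sp) - 1, by = 2)]
print(rv)
```

The stray byte should end up in the value as the literal text <e9>,
so strsplit only ever sees plain ASCII.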

Cheers,
Florian