Error in substring: invalid multibyte string

Thanks for the quick response, Ivan. readLines with encoding='latin1' works
for me (on Ubuntu).
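
For anyone hitting the same error, here is a minimal sketch of that workaround applied to a string already in memory rather than at read time. It assumes the stray byte is Latin-1 (0xE4, "ä"); the variable names are only illustrative.

```r
# The string from this thread: byte 0xE4 is "ä" in Latin-1,
# but is not a valid UTF-8 sequence on its own.
x <- "<I>Jens Oehlschl\xe4gel-Akiyoshi"

# Option 1: declare the bytes as Latin-1 without changing them.
# readLines(encoding = "latin1") marks its input the same way.
x_marked <- x
Encoding(x_marked) <- "latin1"

# Option 2: re-encode the bytes to UTF-8 (assumes the input really is Latin-1).
x_utf8 <- iconv(x, from = "latin1", to = "UTF-8")

validUTF8(x)            # check the raw bytes
validUTF8(x_utf8)       # check the converted string
substr(x_marked, 4, 7)  # substr now sees characters, not stray bytes
substr(x_utf8, 4, 7)
```

Marking the encoding leaves the bytes untouched, while iconv produces a new UTF-8 string; either way substr no longer has to guess how to interpret the input.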

However, I was more concerned with the inconsistency in results between
substr and regexpr. I was expecting that if one of them errors because of
an unknown encoding, then the other should as well. Even better, if regexpr
works, why shouldn't substr work as well?

Incidentally the analogous stringi function stri_sub works fine in this
case:
[1] "<I>Jens Oehlschl\xe4gel-Akiyoshi"

But the stringi analog to nchar gives a similar warning:
[1] NA
Warning message:
In stringi::stri_length("<I>Jens Oehlschl\xe4gel-Akiyoshi") :
  invalid UTF-8 byte sequence detected; try calling stri_enc_toutf8()

On Sat, Jun 27, 2020 at 2:12 AM Ivan Krylov <krylov.r00t at gmail.com> wrote: