
StringIndexOutOfBoundsException in RWeka

I have narrowed the problem down to this:
Error in .jcall("weka/core/tokenizers/Tokenizer", "[S", "tokenize", .jcast(tokenizer,  : 
  java.lang.StringIndexOutOfBoundsException: String index out of range: 1

Indeed, the 21226th sentence contains a segment consisting of a single hyphen. I am using the
default delimiters of the WEKA control, so the hyphen is not a delimiter. A segment consisting of
two consecutive hyphens ("--") does not cause the exception.
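
A possible workaround is to drop lone-hyphen segments before the text reaches the tokenizer. The following is only a minimal sketch in base R: `drop_lone_hyphens` is a hypothetical helper, not part of RWeka, and it assumes that segments are separated by whitespace only, which is narrower than the tokenizer's actual delimiter set.

```r
# Hypothetical helper (not part of RWeka): remove segments that consist of
# exactly one hyphen, leaving "--" and hyphenated words untouched.
# Assumption: segments are separated by whitespace only.
drop_lone_hyphens <- function(sentences) {
  vapply(strsplit(sentences, "[[:space:]]+"), function(segments) {
    # Keep every segment except a bare "-", then rejoin with single spaces.
    paste(segments[segments != "-"], collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

sentences <- c("alpha - beta", "alpha -- beta", "well-known fact")
cleaned <- drop_lone_hyphens(sentences)
print(cleaned)
```

The cleaned vector can then be passed to the tokenizer in place of the original sentences. Note that this also collapses runs of whitespace into single spaces.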

Regards,
Richard

On Tue, 12 Jan 2010 16:50:16 +0100, Richard R. Liu wrote
--
Richard R. Liu
Dittingerstr. 33
CH-4053 Basel
Switzerland

Tel.:  +41 61 331 10 47
Email:  richard.liu at pueo-owl.ch