
StringIndexOutOfBoundsException in RWeka

I have narrowed the problem down to this call, which throws the exception:

NGramTokenizer("-", control = Weka_control(min = 1, max = 4))

The string actually occurs as the fourth segment in the 21,226th sentence. I find this strange, since I am
using the default delimiters ' \r\n\t.,;:'"()?!', which do not contain a hyphen.
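
For anyone wanting to locate such inputs in a larger corpus, here is a minimal sketch of how the failing segments could be found. It assumes the Java exception surfaces as an ordinary R error that tryCatch can intercept. The names find_failing_segments, segments and tok are hypothetical and used only for illustration.

```r
# Return the indices of the segments on which `tokenizer` raises an error.
# `find_failing_segments` is a hypothetical helper, not part of RWeka.
find_failing_segments <- function(segments, tokenizer) {
  failed <- vapply(segments, function(s) {
    tryCatch({
      tokenizer(s)   # result is discarded; only success or failure matters
      FALSE
    }, error = function(e) TRUE)
  }, logical(1), USE.NAMES = FALSE)
  which(failed)
}

# Usage with RWeka; skipped if the package is not installed.
if (requireNamespace("RWeka", quietly = TRUE)) {
  tok <- function(s) {
    RWeka::NGramTokenizer(s, control = RWeka::Weka_control(min = 1, max = 4))
  }
  segments <- c("a simple test sentence", "-")  # illustrative input only
  print(find_failing_segments(segments, tok))
}
```

Segments flagged this way could then be inspected or dropped before tokenizing the whole corpus; whether dropping them is acceptable depends on the analysis.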

Regards,
Richard

On Tue, 12 Jan 2010 16:50:16 +0100, Richard R. Liu wrote
--
Richard R. Liu
Dittingerstr. 33
CH-4053 Basel
Switzerland

Tel.:  +41 61 331 10 47
Email:  richard.liu at pueo-owl.ch