Unicode Text Segmentation Algorithms already implemented in R?
You searched, but did not tell us what you found, nor why it was unsuitable for your undescribed use case. So all we can do is guess: my guess is the boundary-search functions in stringi, http://docs.rexamine.com/R-man/stringi/stringi-search-boundaries.html

Best,
Ista
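A minimal sketch of the stringi boundary functions in use, assuming stringi is installed from CRAN. With `type = "word"`, `stri_split_boundaries()` uses ICU's word BreakIterator, which largely follows the UAX #29 word-boundary rules (ICU applies some tailorings of its own). `skip_word_none = TRUE` drops the segments that consist only of whitespace or punctuation. The sample string is illustrative only.

```r
# Assumes the 'stringi' package is installed: install.packages("stringi")
library(stringi)

txt <- "Hello, world! It's 3.14."

# Split at word boundaries found by ICU's word BreakIterator;
# skip_word_none = TRUE discards spaces and punctuation segments,
# so "It's" and "3.14" survive as single words.
words <- stri_split_boundaries(txt, type = "word", skip_word_none = TRUE)[[1]]
print(words)

# Convenience wrappers built on the same word BreakIterator
print(stri_extract_all_words(txt)[[1]])
print(stri_count_words(txt))
```

Passing `locale = ...` (e.g. to `stri_extract_all_words()`) selects a locale-specific break iterator where ICU provides one.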
On Mar 3, 2016 8:14 AM, "Sascha Wolfer" <wolfer at ids-mannheim.de> wrote:
Hello list members,

I am looking for an implementation of Unicode text segmentation (word boundary detection) algorithms in R. You can find information about the algorithms here: http://www.unicode.org/reports/tr29/#Word_Boundaries

The help page for the function 'casefuns' from the excellent 'Unicode' package says: "Other methods will be added eventually (once the Unicode text segmentation algorithm is implemented for detecting word boundaries)."

My simple question is: Are these algorithms already implemented in an R package? I didn't find anything on the web, but I am counting on the power of this list. My Stata-using colleague is already picking at me... (in Stata, the function 'ustrword' does exactly what I want to do in R).

Thanks for your help, and have a good day, you all!
Sascha W.
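For the Stata comparison, a rough R counterpart can be assembled on top of stringi. The helper `ustrword_r()` below is hypothetical: it is loosely modelled on Stata's `ustrword(s, n)` (n-th word, counting from the end when `n` is negative) but is not a port of it, and the two may disagree on how punctuation is counted, since `stri_extract_all_words()` skips punctuation-only segments.

```r
# Assumes the 'stringi' package is installed.
library(stringi)

# Hypothetical helper, loosely modelled on Stata's ustrword(s, n):
# returns the n-th word of each string, counting from the end when n < 0,
# and NA when there is no such word. Not Stata's implementation.
ustrword_r <- function(s, n, locale = NULL) {
  # One character vector of words per input string (NA if no words found)
  words <- stri_extract_all_words(s, locale = locale)
  vapply(words, function(w) {
    if (all(is.na(w))) return(NA_character_)
    k <- length(w)
    idx <- if (n > 0) n else k + n + 1  # n = -1 selects the last word
    if (n == 0 || idx < 1 || idx > k) NA_character_ else w[idx]
  }, character(1))
}

print(ustrword_r(c("Hello, world!", "It's 3.14."), 2))
print(ustrword_r("Hello, world!", -1))
```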
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.