Hi, I have a similar problem to the one you had. Did you manage to find out what that error meant?
Thanks, m
--
View this message in context: http://r.789695.n4.nabble.com/Help-using-tm-text-mining-package-preprocessing-tp3299399p3540468.html
Sent from the R help mailing list archive at Nabble.com.
Help using "tm" text mining package - preprocessing
3 messages · Matevz Pavlic, Spencer Graves
Got it... the problem was with the Slovenian characters. Once I replaced them with plain characters (without diacritics), it works fine. Thanks anyway, m

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of mpavlic
Sent: Saturday, May 21, 2011 1:06 PM
To: r-help at r-project.org
Subject: Re: [R] Help using "tm" text mining package - preprocessing

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
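For anyone hitting the same error, here is a minimal base-R sketch of the workaround described above. It assumes the text is UTF-8 and that "plain characters" means mapping č, š, ž (and their capitals) to c, s, z; the function name slovene_to_plain is hypothetical, not part of "tm". The accented letters are written as \u escapes so the source file encoding does not matter.

```r
# Hypothetical helper: replace the Slovene letters with caron by their
# unaccented base letters, one character for one character, via chartr().
slovene_to_plain <- function(x) {
  chartr("\u010d\u0161\u017e\u010c\u0160\u017d",  # č š ž Č Š Ž
         "cszCSZ",                                # c s z C S Z
         x)
}

# "Češnje so že zrele" -> "Cesnje so ze zrele"
slovene_to_plain("\u010ce\u0161nje so \u017ee zrele")
```

The mapping is lossy: words that differ only in a caron end up spelled the same, which is one reason a fix that keeps the original characters would be preferable.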
Is there a way to use "tm" with the Slovene characters? I ask
because I was hoping to use "tm" with languages like Arabic, Urdu,
Farsi, and Hebrew. If Slovene characters have to be replaced before
"tm" will accept them, the same problem could prevent using the software
as intended for many languages, including Russian, which is on the list
of currently supported languages.
The package includes two vignettes, the first of which cites two
2008 papers by Feinerer et al., in the Journal of Statistical Software
and in R News. Both papers are freely downloadable. Have you looked
at them?
I have not studied the "tm" documentation carefully, but the
package includes a function "stopwords", which returns a list of
stopwords for an indicated language; the language can be given by name
or by its language tag as defined by the Internet Engineering Task Force
(IETF; www.ietf.org). Slovene is not among the languages currently
supported (the documentation's note that "their IETF language tags may
be used" applies only to the supported languages). I have not used
the package, but you could supply your own list of stopwords for Slovene,
similar to the following silly example:
> stopwords <- function(language = 'duh')
+   if (language == 'duh') return(c('duh', 'hud')) else tm::stopwords(language)
> stopwords('duh')
[1] "duh" "hud"
This may not be all you need to do to use "tm" with Slovene, but
it might help you with "stopwords".
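A slightly more general version of the same wrapper, as a sketch: it answers the name "slovene" (or "slovenian") and the IETF tag "sl" from a user-supplied list and passes every other language on to tm::stopwords(). The ten words in slovene_stopwords are a hypothetical, illustrative sample of common Slovene function words, not a curated stopword list; non-ASCII letters are written as \u escapes so the strings are stored as UTF-8.

```r
# Hypothetical illustrative list, not a curated Slovene stopword list.
slovene_stopwords <- c("in", "je", "da", "se", "na", "v", "za", "ki",
                       "\u017ee",  # "že"
                       "\u010de")  # "če"

# Masks tm's stopwords(): Slovene comes from our own list, everything else
# is forwarded to tm. The language is passed positionally because the name
# of tm's first argument has differed between package versions.
stopwords <- function(language = "en") {
  if (tolower(language) %in% c("sl", "slovene", "slovenian"))
    return(slovene_stopwords)
  tm::stopwords(language)
}

stopwords("sl")  # returns slovene_stopwords; tm is only needed for other languages
```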
Hope this helps.
Spencer Graves
On 5/21/2011 5:59 AM, Matevž Pavlič wrote:
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph: 408-655-4567