Hello everybody,
I am trying to work with the tm package, but I have a problem with some
special words in French. I think this is due to the way the PDF was
converted to text, but I'm not completely sure.
Here is an example:
findFreqTerms(tdm1,30)
[33] "<U+F0A3>" "<U+FB01>n" "<U+FB01>nancement"
"<U+FB01>nancier" "<U+FB01>nanci?re" "<U+FB01>nanci?res"
"<U+FB01>nanciers" "<U+FB01>xe"
Some French words are not read correctly by tm with the readPlain reader. I
tried to use reader = readPDF, but it doesn't work, so I had to convert the
PDF to text first. Some words are then not recognized, so when I use
TermDocumentMatrix a word like "inflation" disappears. This is a big problem
for me. I have spent a lot of time on it. Any ideas? Thank you for your time.
Best regards,
Mickaël
--
View this message in context: http://r.789695.n4.nabble.com/TM-reader-with-text-tp4433394p4433394.html
Sent from the R help mailing list archive at Nabble.com.
TM reader with text
4 messages · Mickael R problem, David Winsemius, Richard M. Heiberger
On Feb 29, 2012, at 6:00 PM, Mickael R problem wrote:
> Hello everybody,
> I am trying to work with the tm package, but I have a problem with some
> special words in French. I think this is due to the way the PDF was
> converted to text, but I'm not completely sure.
> Here is an example:
> findFreqTerms(tdm1,30)
> [33] "<U+F0A3>" "<U+FB01>n" "<U+FB01>nancement"
> "<U+FB01>nancier" "<U+FB01>nanci?re" "<U+FB01>nanci?res"
> "<U+FB01>nanciers" "<U+FB01>xe"
> Some French words are not read correctly by tm with the readPlain reader. I
> tried to use reader = readPDF, but it doesn't work, so I had to convert the
> PDF to text first. Some words are then not recognized, so when I use
> TermDocumentMatrix a word like "inflation" disappears. This is a big problem
> for me. I have spent a lot of time on it. Any ideas? Thank you for your time.
You included no information about your platform, locale settings, or the
encoding of the text.
?Encoding
?sessionInfo
David Winsemius, MD
West Hartford, CT
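The information requested above can be collected with a few base R calls. The following is only a minimal sketch: the string `x` is a made-up stand-in for one term from the findFreqTerms() output, not the poster's actual data, and it assumes the "<U+FB01>" in that output is the Unicode "fi" ligature (U+FB01).

```r
# Platform, R version, and attached packages (what ?sessionInfo documents)
print(sessionInfo())

# Locale settings of the current session
print(Sys.getlocale())

# Hypothetical example term: "financement" written with the "fi" ligature U+FB01
x <- "\ufb01nancement"

# Declared encoding of the string (what ?Encoding documents)
declared <- Encoding(x)
print(declared)

# First Unicode code point of the string, formatted like R's <U+XXXX> escapes
first_code_point <- sprintf("U+%04X", utf8ToInt(enc2utf8(x))[1])
print(first_code_point)
```

Posting the output of sessionInfo() and Sys.getlocale() along with the question gives readers the platform and locale details that were missing.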
My computer runs Windows Vista 64-bit SP2. As for the question about
encoding, I'm sorry, but I don't understand it.
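For context on the encoding question: U+FB01 is the Unicode "fi" ligature and U+FB02 is the "fl" ligature, single characters that PDF-to-text conversion often emits in place of the two letters. That would be consistent with "financement" showing up as "<U+FB01>nancement" and with "inflation" not being found. The sketch below is one possible way to expand such ligatures in a character vector before building the term-document matrix. The function name `expand_ligatures` and the sample strings are hypothetical, and this is an assumption about the cause, not a confirmed diagnosis. The "<U+F0A3>" entry lies in the Private Use Area, so it cannot be mapped without knowing the font used in the PDF.

```r
# Latin ligature characters and the plain letters they stand for
ligature_from <- c("\ufb00", "\ufb01", "\ufb02", "\ufb03", "\ufb04")
ligature_to   <- c("ff",     "fi",     "fl",     "ffi",    "ffl")

# Hypothetical helper: replace each ligature with its plain-letter expansion
expand_ligatures <- function(x) {
  x <- enc2utf8(x)  # make sure the input is UTF-8 before matching
  for (i in seq_along(ligature_from)) {
    x <- gsub(ligature_from[i], ligature_to[i], x, fixed = TRUE)
  }
  x
}

# Made-up sample text containing the "fi" and "fl" ligatures
sample_text <- c("\ufb01nancement", "in\ufb02ation", "\ufb01xe")
cleaned <- expand_ligatures(sample_text)
print(cleaned)
```

In recent versions of tm, a function like this is usually wrapped with content_transformer() before being passed to tm_map(). Applying it to the raw text before creating the corpus avoids depending on a particular tm version.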