text mining

Hi,

I have a problem when indexing the corpus. I used the following code:
Setwd ("c :/....")
Library (tm)
Txt = Corpus (DirSource ("."); readerControl = list (language = "frensh"))
and the following error message appears:
Warning messages:
1: In readLines(y, encoding = x$Encoding) :
  incomplete final line found on './n3.txt'
2: In readLines(y, encoding = x$Encoding) :
  incomplete final line found on './n32.

Another question:
how can I read other document types (.pdf, .html, ...) using the
"tm" package?

Thanks very much for your help.

--
View this message in context: http://r.789695.n4.nabble.com/text-mining-tp3560367p3560367.html
Sent from the R help mailing list archive at Nabble.com.

> Hi,
>
> I have a problem when indexing the corpus. I used the following code:
>
>  Setwd ("c :/....")
>  Library (tm)
>  Txt = Corpus (DirSource ("."); readerControl = list (language = "frensh"))

Capitalization is important in R, so when asking a question, please cut 
and paste what you actually did.  In this case, it doesn't matter.
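For reference, here is a sketch of what the corpus call in the question was presumably meant to be, assuming the tm package is installed. It writes two small made-up text files into a temporary folder (the folder and file names are hypothetical) and passes that folder straight to DirSource() instead of calling setwd() first. The language code "fr" in place of the misspelled "frensh" is an assumption about what tm expects for French.

```r
# Sketch, assuming the tm package is installed. Folder and file names are
# made up for illustration; point DirSource() at your own folder instead.
corpus_dir <- file.path(tempdir(), "corpus_fr")
dir.create(corpus_dir, showWarnings = FALSE)
writeLines("Bonjour tout le monde.", file.path(corpus_dir, "n1.txt"))
writeLines("Un deuxieme document.", file.path(corpus_dir, "n2.txt"))

if (requireNamespace("tm", quietly = TRUE)) {
  library(tm)  # lower case: R is case sensitive, so Library() does not exist
  # Arguments are separated by a comma, not a semicolon.
  # "fr" as the French language tag is an assumption (see ?Corpus in tm).
  txt <- Corpus(DirSource(corpus_dir), readerControl = list(language = "fr"))
  print(txt)
} else {
  message("Package 'tm' is not installed; run install.packages(\"tm\") first.")
}
```

With real data, replace corpus_dir with the folder that was previously selected with setwd().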
> and the following error message appears:
>
> Warning messages:
> 1: In readLines(y, encoding = x$Encoding) :
>   incomplete final line found on './n3.txt'
> 2: In readLines(y, encoding = x$Encoding) :
>   incomplete final line found on './n32.

Those are warnings, not errors. readLines() gives those warnings when the
last line of a file stops abruptly instead of ending with an end-of-line
marker; the text is still read, including that last line. On Unix systems
a missing final marker usually signals a problem with the file. Windows is
more tolerant, so many Windows editors don't bother to add the final marker.

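A minimal base-R sketch of this behaviour, using a hypothetical file name: it writes a file without a final end-of-line marker, shows that readLines() warns but still returns every line, silences the check with readLines()'s warn argument, and repairs the file with a hypothetical helper, add_final_eol(), that appends the missing newline.

```r
# Hypothetical file name; any file whose last line lacks an end-of-line
# marker behaves the same way.
path <- file.path(tempdir(), "n3.txt")

# cat() without a trailing "\n" leaves the final line unterminated
cat("first line\nlast line", file = path)

# Option 1: readLines() warns, but both lines are still read in full
got_warning <- FALSE
lines <- withCallingHandlers(
  readLines(path),
  warning = function(w) {
    got_warning <<- TRUE             # record that the warning was raised
    invokeRestart("muffleWarning")   # and keep it from being printed
  }
)

# Option 2: switch off this particular check
quiet <- readLines(path, warn = FALSE)

# Option 3 (hypothetical helper): repair the file by appending the missing
# end-of-line marker, so later reads no longer warn
add_final_eol <- function(path) {
  size <- file.size(path)
  if (is.na(size) || size == 0) return(invisible(FALSE))
  bytes <- readBin(path, what = "raw", n = size)
  if (bytes[size] != as.raw(0x0a)) {  # last byte is not "\n"
    cat("\n", file = path, append = TRUE)
    return(invisible(TRUE))
  }
  invisible(FALSE)
}
add_final_eol(path)
fixed <- readLines(path)  # no warning this time
```

Since the warning in the question comes from a readLines() call made inside tm's reader, repairing the files once (option 3) is probably simpler than trying to pass warn = FALSE through tm.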
> Another question:
> how can I read other document types (.pdf, .html, ...) using the
> "tm" package?

I think you need to convert them to text first (by some tool outside of 
R), but I might be wrong.
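A sketch of the convert-to-text-first approach, under stated assumptions: HTML files are turned into plain text by a crude, hypothetical regex-based helper html_to_text() (not a real HTML parser), and PDF files are handed to the external pdftotext command-line tool from xpdf/poppler, but only if it is found on the PATH. The helper convert_dir() and its arguments are also hypothetical. Recent versions of tm do document reader functions for some formats (for example readPDF, which itself relies on an external engine), and getReaders() lists what an installed version offers, so that is worth checking as well.

```r
# Hypothetical helper: crude HTML-to-text conversion with regular expressions.
# It drops <script>/<style> blocks and all tags, then decodes a few entities.
html_to_text <- function(html) {
  html <- paste(html, collapse = "\n")
  html <- gsub("(?is)<(script|style)\\b.*?</\\1\\s*>", " ", html, perl = TRUE)
  html <- gsub("<[^>]*>", " ", html)
  entities <- c("&nbsp;" = " ", "&lt;" = "<", "&gt;" = ">",
                "&quot;" = "\"", "&#39;" = "'", "&amp;" = "&")  # &amp; last
  for (e in names(entities)) html <- gsub(e, entities[[e]], html, fixed = TRUE)
  trimws(gsub("[[:space:]]+", " ", html))
}

# Hypothetical helper: write a .txt file into `dest` for every .html/.htm
# (and, if pdftotext is installed, every .pdf) file found in `src`.
convert_dir <- function(src, dest) {
  dir.create(dest, showWarnings = FALSE, recursive = TRUE)
  html_files <- list.files(src, pattern = "\\.html?$", full.names = TRUE,
                           ignore.case = TRUE)
  for (f in html_files) {
    out <- file.path(dest, sub("\\.html?$", ".txt", basename(f), ignore.case = TRUE))
    writeLines(html_to_text(readLines(f, warn = FALSE)), out)
  }
  # PDFs are delegated to the external pdftotext tool, when available
  pdftotext <- unname(Sys.which("pdftotext"))
  if (nzchar(pdftotext)) {
    pdf_files <- list.files(src, pattern = "\\.pdf$", full.names = TRUE,
                            ignore.case = TRUE)
    for (f in pdf_files) {
      out <- file.path(dest, sub("\\.pdf$", ".txt", basename(f), ignore.case = TRUE))
      system2(pdftotext, c(shQuote(f), shQuote(out)))
    }
  }
  invisible(list.files(dest, pattern = "\\.txt$", full.names = TRUE))
}
```

The output folder can then be read like any other folder of text files, e.g. with Corpus(DirSource(dest)) as in the question. A regex stripper is only adequate for rough bag-of-words work; a proper HTML parser would be the safer choice for messy pages.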

Duncan Murdoch

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.