help loading files into R for koRpus analysis

4 messages · Gordon Ballingrud, Duncan Murdoch

#
Hello all,

I need some help loading text data (Word files) into R for analysis with
packages such as koRpus.

The problem I am facing is getting R to recognize a folder full of Word
files (about 4,000) as data on which koRpus can then perform analyses
such as Coleman-Liau indexing. If at all possible, I would prefer to make
this work with the Word files themselves. The key problem is getting R to
read the Word files in bulk (that is, all at once) so that koRpus can
analyze them.
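For reference, the Coleman-Liau index itself is a simple formula: CLI = 0.0588 * L - 0.296 * S - 15.8, where L is the number of letters per 100 words and S the number of sentences per 100 words. The base-R sketch below only illustrates that arithmetic. The names `coleman_liau` and `text_counts` are hypothetical, and the counting rules in `text_counts` are a crude heuristic, not koRpus's tokenizer; koRpus computes the index from its own tokenized text (via its readability() function), so its numbers will differ.

```r
# Standard Coleman-Liau formula: CLI = 0.0588 * L - 0.296 * S - 15.8,
# where L = letters per 100 words and S = sentences per 100 words.
coleman_liau <- function(letters, words, sentences) {
  L <- letters / words * 100
  S <- sentences / words * 100
  0.0588 * L - 0.296 * S - 15.8
}

# Crude counts from plain text (a heuristic, NOT koRpus's tokenizer):
# letters = alphabetic characters, words = whitespace-separated tokens,
# sentences = runs of ".", "!" or "?" (at least 1).
text_counts <- function(txt) {
  txt <- paste(txt, collapse = " ")
  tokens <- strsplit(trimws(txt), "\\s+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  ends <- regmatches(txt, gregexpr("[.!?]+", txt))[[1]]
  list(
    letters   = nchar(gsub("[^[:alpha:]]", "", txt)),
    words     = length(tokens),
    sentences = max(1L, length(ends))
  )
}

counts <- text_counts("The cat sat. The dog ran.")
with(counts, coleman_liau(letters, words, sentences))
```

Very short texts like the example give odd (even negative) scores; the index is meant for passages of at least a few hundred words.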

My attempts to make this work have all failed, but I know that packages
like koRpus would be of limited use if there were no way to run them on a
large collection of files all at once.
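The usual base-R pattern for "all files at once" is to list the files with list.files() and apply a single-file reader to each one. A minimal sketch, assuming the files sit directly in one folder; `read_folder` is a hypothetical helper, not part of koRpus or readtext. It also skips the `~$...` owner files that Word leaves next to open documents, since those match `*.docx` but are not real documents.

```r
# Hypothetical helper (not part of koRpus or readtext): apply a
# single-file reader to every file in `dir` whose name matches `pattern`.
read_folder <- function(dir, reader, pattern = "\\.docx$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE,
                      ignore.case = TRUE)
  # Drop Word's temporary owner files such as "~$report.docx".
  files <- files[!startsWith(basename(files), "~$")]
  if (length(files) == 0) {
    stop("no files matching '", pattern, "' found in ", dir)
  }
  out <- lapply(files, reader)   # one result per file, in sorted order
  names(out) <- basename(files)  # label each result with its file name
  out
}
```

With the readtext package installed, something like `read_folder("/Users/Gordon/Desktop/WPSCASES/word", readtext::readtext)` would read each document separately. readtext() also accepts a wildcard pattern directly, so a helper like this mainly pays off when each file needs its own processing step.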

I hope this problem will make sense to someone, and that there is a tenable
solution to it.

Thanks,

Gordon
#
You may get a helpful response, but if not, I'd suggest posting the code you
have for reading one file. Then lots of people could likely show you how to
modify it to read all 4,000 files.
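The "one file first, then all of them" approach usually looks like this in R: wrap the working single-file code in a function, then map it over every path. A minimal sketch; `analyse_one` is a hypothetical placeholder whose body (counting lines) stands in for the real reading and koRpus analysis. With 4,000 files, one unreadable document should not abort the whole run, so failures are reported and skipped.

```r
# Step 1: the code for ONE file, wrapped in a function.
# `analyse_one` is a hypothetical placeholder; its body (counting lines)
# stands in for the real per-file reading and analysis.
analyse_one <- function(path) {
  if (!file.exists(path)) stop("file not found")
  txt <- readLines(path, warn = FALSE)
  data.frame(file = basename(path), n_lines = length(txt),
             stringsAsFactors = FALSE)
}

# Step 2: run the one-file function over every path. A file that fails is
# reported and skipped rather than aborting the whole run.
analyse_all <- function(paths, fun = analyse_one) {
  results <- lapply(paths, function(p) {
    tryCatch(fun(p), error = function(e) {
      message("skipping ", basename(p), ": ", conditionMessage(e))
      NULL
    })
  })
  do.call(rbind, results)  # NULL entries (failed files) are dropped
}
```

Returning one data-frame row per file keeps the results in a single table, which is convenient for later comparison of scores across documents.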

Duncan Murdoch
On 02/11/2020 12:28 p.m., Gordon Ballingrud wrote:
#
Thanks; that's a good point. Here is what I have been working with:

library(quanteda)
library(readtext)

texts <- readtext(paste0("/Users/Gordon/Desktop/WPSCASES/", "/word/*.docx"))

And the error message:
Error in list_files(file, ignore_missing, TRUE, verbosity) :
  File '' does not exist.
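To see exactly what readtext() receives, and to check a wildcard pattern before reading anything, a small base-R check may help. This is only a diagnostic sketch; `check_glob` is a hypothetical helper, not part of readtext.

```r
# What the original call actually passes to readtext(): note the "//".
p <- paste0("/Users/Gordon/Desktop/WPSCASES/", "/word/*.docx")
print(p)  # "/Users/Gordon/Desktop/WPSCASES//word/*.docx"

# Hypothetical diagnostic: before calling readtext(), confirm that the folder
# exists and that the wildcard pattern matches at least one file.
check_glob <- function(pattern) {
  folder <- dirname(pattern)
  if (!dir.exists(folder)) {
    message("folder does not exist: ", folder)
    return(invisible(character(0)))
  }
  hits <- Sys.glob(pattern)
  message(length(hits), " file(s) match ", pattern)
  invisible(hits)
}

check_glob(p)
```

If the folder is reported missing or zero files match, the problem is the path itself rather than the Word files.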

On Mon, Nov 2, 2020 at 3:15 PM Duncan Murdoch <murdoch.duncan at gmail.com> wrote:

#
On 02/11/2020 4:46 p.m., Gordon Ballingrud wrote:
On Windows, you can't have an empty entry in a pathname, so you should 
leave off one of the slashes:

   texts <- readtext(paste0("/Users/Gordon/Desktop/WPSCASES/",
                            "word/*.docx"))

You could skip the paste0 entirely, and use

   texts <- readtext("/Users/Gordon/Desktop/WPSCASES/word/*.docx")

but I'm assuming this is just an example of a more complex situation.
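For the more complex case, where the path is assembled from several pieces, note that file.path() simply joins its arguments with "/" and does not remove a trailing slash from the first piece, so `file.path("/a/", "word")` still gives "/a//word". A minimal sketch that normalizes the base folder first; `docx_glob` is a hypothetical helper, and the readtext call is guarded so it only runs when the package is installed and the pattern matches something.

```r
# Hypothetical helper: build a "*.docx" wildcard from a base folder plus
# sub-folders, stripping trailing slashes from `base` so that no empty
# path component ("//") is created.
docx_glob <- function(base, ...) {
  base <- sub("[/\\\\]+$", "", base)  # drop trailing "/" or "\"
  file.path(base, ..., "*.docx")
}

pattern <- docx_glob("/Users/Gordon/Desktop/WPSCASES/", "word")
# pattern is "/Users/Gordon/Desktop/WPSCASES/word/*.docx"

# readtext() accepts a wildcard pattern; only call it if the package is
# installed and the pattern matches at least one file.
if (requireNamespace("readtext", quietly = TRUE) &&
    length(Sys.glob(pattern)) > 0) {
  texts <- readtext::readtext(pattern)
}
```

readtext() returns a data frame with `doc_id` and `text` columns, one row per document; the `text` column is what would then be handed to koRpus (for example through its tokenize() function with `format = "obj"`; check the koRpus documentation for the version in use, since recent versions also need a language package such as koRpus.lang.en).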

Duncan Murdoch