Hello all,

I need some help with loading text-file data into R for analysis with packages like koRpus.

The problem I am facing is getting R to recognize a folder full of Word files (about 4,000) as data on which koRpus can then perform analyses such as Coleman-Liau indexing. If at all possible, I would prefer to make this work with Word files. The key difficulty is getting R to read the Word files in bulk (that is, all at the same time) so that koRpus can do its thing with them.

My attempts to make this work have all been in vain, but I know that packages like koRpus would be of limited use if there were no way to run them on a large collection of files all at once.

I hope this problem will make sense to someone, and that there is a tenable solution to it.

Thanks,
Gordon
help loading files into R for koRpus analysis
4 messages · Gordon Ballingrud, Duncan Murdoch
You may get a helpful response, but if not, I'd suggest posting the code you have for reading one file. Then lots of people could likely show you how to modify it to read all 4000 files.
Duncan Murdoch
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Thanks; that's a good point. Here is what I have been working with:
library(quanteda)
library(readtext)
texts <- readtext(paste0("/Users/Gordon/Desktop/WPSCASES/", "/word/*.docx"))
And the error message:
Error in list_files(file, ignore_missing, TRUE, verbosity) :
File '' does not exist.
On 02/11/2020 4:46 p.m., Gordon Ballingrud wrote:
> Thanks; that's a good point. Here is what I have been working with:
> library(quanteda)
> library(readtext)
> texts <- readtext(paste0("/Users/Gordon/Desktop/WPSCASES/", "/word/*.docx"))
On Windows, you can't have an empty entry in a pathname, so you should
leave off one of the slashes:
texts <- readtext(paste0("/Users/Gordon/Desktop/WPSCASES/",
"word/*.docx"))
You could skip the paste0 entirely, and use
texts <- readtext("/Users/Gordon/Desktop/WPSCASES/word/*.docx")
but I'm assuming this is just an example of a more complex situation.
Duncan Murdoch
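
As a rough sketch of the advice above, the path can be assembled with base R's file.path() and checked with Sys.glob() before it is handed to readtext(). The temporary folder and the file names case1.docx and case2.docx are hypothetical stand-ins for the real WPSCASES folder, and the empty files are there only so the wildcard has something to match. The readtext() call is left as a comment because it needs real Word documents.

```r
# Hypothetical stand-in for the real folder: a temporary directory holding
# two empty files named like Word documents.
base_dir <- file.path(tempdir(), "WPSCASES")
word_dir <- file.path(base_dir, "word")
dir.create(word_dir, recursive = TRUE, showWarnings = FALSE)
file.create(file.path(word_dir, c("case1.docx", "case2.docx")))

# file.path() puts one separator between components. It does not remove
# slashes already present, so the components are written without leading
# or trailing slashes.
pattern <- file.path(base_dir, "word", "*.docx")

# Sys.glob() expands the wildcard. A zero-length result means the pattern
# matches nothing, which is worth knowing before calling readtext().
found <- Sys.glob(pattern)
print(basename(found))

# With real .docx files in place, the same pattern could then be passed on:
# texts <- readtext::readtext(pattern)
```

If found comes back empty for the real folder, the problem lies in the path or the pattern rather than in readtext or koRpus.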