
Read files in a folder when new data files come

5 messages · jim holtman, Carlos J. Gil Bellosta, Barry Rowlingson +1 more

#
Hello,

I am working on a project. New data files arrive as the data
collectors gather data, and the collectors then put these new files
in a folder. I need to read the new data files once they are in that
folder. So far I have done this job manually: each time, I go to the
folder, find the new data files, and use my R program to read them.
I am wondering if anyone knows how to perform this job automatically
in R.

thanks,

jlm
#
You can read the status of every file in a directory and decide
whether to process it.  One technique is to create a 'flag' file in
the directory each time you process information from it.  You could
schedule an R script that first checks the flag file's modification
date, then processes every file in the directory newer than that
date.  You would then rewrite the flag file to update its
modification date for the next round.
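A rough sketch of that flag-file idea in R (the paths, the `.csv` pattern, and the `read.csv()` reader are placeholder assumptions, not part of the original post):

```r
# Sketch of the flag-file approach; adjust paths and file pattern to taste.
data_dir  <- "C:/data/incoming"                 # placeholder directory
flag_file <- file.path(data_dir, ".last_processed")

# Create the flag on the very first run.
if (!file.exists(flag_file)) file.create(flag_file)

last_run <- file.info(flag_file)$mtime

# All data files modified since the flag was last touched.
all_files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
new_files <- all_files[file.info(all_files)$mtime > last_run]

for (f in new_files) {
  dat <- read.csv(f)   # or whatever reader your format needs
  # ... process 'dat' ...
}

# Rewriting the flag updates its modification date for the next round.
file.create(flag_file)
```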

Does this do what you want?
On Sun, Jan 24, 2010 at 3:05 PM, jlfmssm <jlfmssm at gmail.com> wrote:

  
    
#
Hello,

Could you tell us something more about your infrastructure? Windows? Linux?

On Unix/Linux you could use cron to have an R process read all the 
files in the given directory, process them one by one, and archive 
them in another place.
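For the cron route, the scheduled R script itself might look roughly like this (the paths and the crontab line are illustrative assumptions, not from the thread):

```r
# Run e.g. every 10 minutes via cron:
#   */10 * * * * Rscript /home/user/process_incoming.R
incoming <- "/data/incoming"   # placeholder paths
archive  <- "/data/archive"

files <- list.files(incoming, full.names = TRUE)
for (f in files) {
  dat <- read.csv(f)           # adjust to your actual format
  # ... process 'dat' ...
  # Move the file out of the way so it is not processed twice.
  file.rename(f, file.path(archive, basename(f)))
}
```

Because each run archives what it processed, the script needs no state between invocations: anything still in the incoming directory is, by definition, new.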

On Windows, no idea.

Alternatively, you could perhaps ask your users to use some kind of web 
interface to upload the data. This interface could then trigger an R 
process.

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com
jlfmssm wrote:
#
On Sun, Jan 24, 2010 at 8:05 PM, jlfmssm <jlfmssm at gmail.com> wrote:
Without needing operating-system-specific hackery, the easiest
way would be to use 'list.files()' and look for new files every so
many minutes or seconds (depending on how urgent it is), or to check
file.info() on your directory and test the modification time. You'd
then write that into a .R file and run it in the background using
your operating system's background job functionality (as a 'service'
on Windows, or as a background process on Unix). Use
Sys.sleep(seconds) to wait in your loop. Something like (totally
untested):

lastChange <- file.info(dumpLocation)$mtime
while (TRUE) {
  currentM <- file.info(dumpLocation)$mtime
  if (currentM != lastChange) {
    lastChange <- currentM
    doSomethingWithStuffIn(dumpLocation)
  }
  # try again in 10 minutes
  Sys.sleep(600)
}
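The list.files() alternative mentioned above would track filenames instead of the directory's modification time; a similarly untested sketch (dumpLocation and doSomethingWith are placeholders, as in the loop above):

```r
seen <- list.files(dumpLocation)          # snapshot of what is already there
repeat {
  current  <- list.files(dumpLocation)
  newFiles <- setdiff(current, seen)      # names not seen before
  if (length(newFiles) > 0) {
    seen <- current
    for (f in newFiles) {
      doSomethingWith(file.path(dumpLocation, f))  # placeholder processor
    }
  }
  Sys.sleep(600)                          # try again in 10 minutes
}
```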

 There are ways for programs to get directory content change events
when files appear in directories, but they will probably be very
operating system specific. There's also the problem of your code
firing up when a file is only half-uploaded - what do you do then?
Does your data format have an 'end of data' marker?
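If the format has no end-of-data marker, one common heuristic is to wait until the file's size stops changing between polls before reading it (a sketch only; the poll interval is an arbitrary assumption):

```r
# Poll until two consecutive size checks agree, then assume the upload is done.
wait_until_stable <- function(path, poll = 5) {
  size <- -1
  repeat {
    newSize <- file.info(path)$size
    if (!is.na(newSize) && newSize == size) return(path)
    size <- newSize
    Sys.sleep(poll)
  }
}
```

This can still misfire if the uploader stalls for longer than the poll interval, so a real end-of-data marker (or upload-then-rename on the server side) is more robust.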

 Barry
#
Thank you for your reply. Yes, this is what I want to do. I am
working on Windows.

The data files are in a folder on a data server; the data collectors
put new files there as soon as they collect new data. What I want is
for my R program to process those new data files once it finds new
data arriving in that folder on the data server.

Thanks,

jlm
On Sun, Jan 24, 2010 at 2:12 PM, jim holtman <jholtman at gmail.com> wrote: