Globbing Files in R

6 messages · Gundala Viswanath, Douglas Bates, Gabor Grothendieck +3 more

#
Dear all,

For example, I want to process a set of files.

Typically, the Perl idiom would be:

__BEGIN__
@files = glob("/mydir/*.txt");

foreach my $file (@files) {
  # process the file
}
__END__

What's the R way to do that?

- Gundala Viswanath
Jakarta - Indonesia
#
On Sun, Dec 21, 2008 at 9:35 AM, Gundala Viswanath <gundalav at gmail.com> wrote:
The tools to do this are the functions list.files, grep (or variants
on grep) and perhaps glob2rx.  See the help files for each.

One approach is
[1] "notes312.txt" "stat324.txt"

Note that grep returns a vector of indices into the character vector,
not the character vectors themselves.
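
A minimal sketch of that approach, in case it helps. The scratch directory and the data.csv file are made up for illustration; only the two .txt names come from the output above. It shows grep returning indices, which are then used to subset the vector of names.

```r
# Scratch directory with a few files (directory and data.csv are hypothetical)
d <- file.path(tempdir(), "globdemo")
dir.create(d, showWarnings = FALSE)
file.create(file.path(d, c("notes312.txt", "stat324.txt", "data.csv")))

all.files <- list.files(d)                           # every file name in d
idx <- grep(utils::glob2rx("*.txt"), all.files)      # integer indices of the matches
txt.files <- all.files[idx]                          # subset to get the names

# Equivalent shortcut that returns the names directly:
# grep(utils::glob2rx("*.txt"), all.files, value = TRUE)
```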
#
Try this:

file.names <- dir("/mydir", pattern = glob2rx("*.txt"), full.names = TRUE)
for(fn in file.names) {
  DF <- read.table(fn, ...)
  ...
}

Another possibility is:

file.names <- .. as above ...
out <- lapply(file.names, function(fn) {
   DF <- read.table(fn, ...)
   ...
})

out will be a list with one component per file, each holding the
result of applying the function to that file.
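
As a rough, self-contained illustration of the lapply form: the scratch directory, the file contents, and the choice of nrow() as the per-file "processing" are all assumptions made for the example, not part of the original suggestion.

```r
# Two small whitespace-separated files in a scratch directory (hypothetical data)
d <- file.path(tempdir(), "lapplydemo")
dir.create(d, showWarnings = FALSE)
writeLines(c("x y", "1 2"), file.path(d, "a.txt"))
writeLines(c("x y", "3 4", "5 6"), file.path(d, "b.txt"))

file.names <- dir(d, pattern = utils::glob2rx("*.txt"), full.names = TRUE)

# Stand-in processing step: count the data rows in each file
out <- lapply(file.names, function(fn) nrow(utils::read.table(fn, header = TRUE)))
names(out) <- basename(file.names)   # label each component by its file name
```

Naming the components is optional, but it makes it easier to tell later which result came from which file.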
On Sun, Dec 21, 2008 at 10:35 AM, Gundala Viswanath <gundalav at gmail.com> wrote:
#
Sys.glob is much more direct ....

Either of you might find exploring the power of ?? (e.g. ??glob)
educational.
On Sun, 21 Dec 2008, Douglas Bates wrote:

            
One of them is exactly the same idiom.
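
For comparison with the Perl snippet, a small sketch of the Sys.glob form. The scratch directory and file names are invented for the example, and the loop body is only a placeholder.

```r
# Scratch directory with made-up files
d <- file.path(tempdir(), "sysglobdemo")
dir.create(d, showWarnings = FALSE)
file.create(file.path(d, c("a.txt", "b.txt", "c.csv")))

# Wildcard expansion on the full path, much like Perl's glob()
files <- Sys.glob(file.path(d, "*.txt"))

for (file in files) {
  # process the file; here we only print its base name
  cat(basename(file), "\n")
}
```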

  
    
1 day later
#
Gundala Viswanath wrote:

            
Something like this has been suggested in R-help before:

files <- dir()
results <- lapply(files, yourprocessing)

The dir function has path and pattern arguments to select the set of 
files you want.

This works fine when there are no problems, but often I'll use a for 
loop so problem files can be dealt with differently when necessary.

Perhaps something like this:

ProcessList <- dir(pattern = glob2rx("InPerson*"))

for (i in 1:length(ProcessList))
{
   filename <- ProcessList[i]
   . . .
}
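
One way the "deal with problem files differently" part might look, as a sketch only. Using tryCatch() here is an assumption about how to handle the failures, and the directory, file names, and contents are hypothetical; the empty file is there to force read.table to fail.

```r
# Scratch directory: one readable file and one empty file that read.table rejects
d <- file.path(tempdir(), "loopdemo")
dir.create(d, showWarnings = FALSE)
writeLines(c("x y", "1 2", "3 4"), file.path(d, "InPerson1.txt"))
file.create(file.path(d, "InPerson2.txt"))

ProcessList <- dir(d, pattern = "^InPerson", full.names = TRUE)
results <- list()
failed <- character(0)

for (i in seq_along(ProcessList)) {
  filename <- ProcessList[i]
  # Turn a read error into NULL instead of stopping the whole loop
  DF <- tryCatch(utils::read.table(filename, header = TRUE),
                 error = function(e) NULL)
  if (is.null(DF)) {
    failed <- c(failed, filename)    # remember the problem file for later
    next
  }
  results[[basename(filename)]] <- DF
}
```

After the loop, failed holds the paths that could not be read, so they can be inspected or reprocessed separately.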


efg

Earl F Glynn
Overland Park, KS
#
On 12/22/2008 1:14 PM, Earl F Glynn wrote:
Remember to use seq_along() instead:  ProcessList might be length 0.
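
A quick demonstration of the difference, using an empty character vector to stand in for a dir() call that matched nothing (the counter names are just for illustration):

```r
ProcessList <- character(0)      # e.g. no files matched the pattern

# 1:length(ProcessList) is 1:0, i.e. c(1, 0), so the body runs twice
n.bad <- 0
for (i in 1:length(ProcessList)) n.bad <- n.bad + 1

# seq_along(ProcessList) is integer(0), so the body never runs
n.good <- 0
for (i in seq_along(ProcessList)) n.good <- n.good + 1
```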

Duncan Murdoch