Parallel File System support in R (e.g. GPFS)
Hi Jonathan,
We are developing some parallel file system readers and writers for R. They are intended to be used in Single Program Multiple Data (SPMD) programming mode with Rmpi. Each processor reads its own chunk of data and then hands it off to other SPMD R code for the analysis. We are close to having a parallel version of ncdf: a NetCDF package with collective reads and writes.
George
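The following is a minimal sketch of the "each processor reads its own chunk" pattern. It is written in Python rather than R, for illustration only. Each rank computes a contiguous block of rows and reads only those bytes from a row-major flat binary file. The names row_block and read_chunk are hypothetical. The loop over ranks only simulates what separate Rmpi processes would each do once, using their own rank.

```python
import os
import tempfile

def row_block(nrows, nranks, rank):
    """Return [start, stop) rows for this rank (hypothetical helper).
    The first nrows % nranks ranks each get one extra row."""
    base, extra = divmod(nrows, nranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

def read_chunk(path, ncols, itemsize, start, stop):
    """Read rows [start, stop) of a row-major flat binary image."""
    with open(path, "rb") as f:
        f.seek(start * ncols * itemsize)
        return f.read((stop - start) * ncols * itemsize)

# Toy image: 10 rows x 4 columns of 1-byte pixels, split across 3 ranks.
nrows, ncols, itemsize, nranks = 10, 4, 1, 3
data = bytes(range(nrows * ncols))
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)

# Each iteration stands in for one SPMD process reading only its block.
chunks = [read_chunk(tmp.name, ncols, itemsize, *row_block(nrows, nranks, r))
          for r in range(nranks)]
os.remove(tmp.name)
print([len(c) for c in chunks])  # [16, 12, 12]
```

With contiguous row blocks, each rank's read is a single sequential request. The analysis code on that rank then works only on its own chunk.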
On 2/17/12 10:20 AM, Jonathan Greenberg wrote:
R-sig-hpc'ers: I've started running R on a large cluster at my university, which uses the IBM GPFS parallel file system. I'm wondering whether there is any support within R for parallel writes to a single file, or whether there are any suggestions on how to implement, say, writing to a large binary file representing an image. The parallelization I'm thinking of is: given an image of x columns by y rows stored as a flat binary file, process chunks of this image on different CPUs/nodes, then write the results to a single file. The alternative is to write each chunk out separately and then "mosaic" them back together, but this would involve reading and writing the data twice, and this process is going to be I/O intensive. Thoughts? --j
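One way to avoid the mosaic step is for every rank to write its processed rows directly at their byte offset in one preallocated output file. The Python sketch below makes several assumptions: a row-major image, disjoint row blocks per rank, and a hypothetical process() function standing in for the real analysis. The loop simulates the ranks sequentially. In an actual MPI job, the equivalent is usually done through MPI-IO (MPI_File_write_at, or the collective MPI_File_write_at_all). Collective writes can let the MPI library aggregate requests on file systems such as GPFS.

```python
import os
import tempfile

def row_block(nrows, nranks, rank):
    # Hypothetical helper: contiguous rows [start, stop) for this rank;
    # the first nrows % nranks ranks each get one extra row.
    base, extra = divmod(nrows, nranks)
    start = rank * base + min(rank, extra)
    return start, start + base + (1 if rank < extra else 0)

def preallocate(path, nbytes):
    # Create the output file at its final size once, before any rank writes.
    with open(path, "wb") as f:
        f.truncate(nbytes)

def write_chunk(path, ncols, itemsize, start, chunk):
    # Write this rank's rows in place. Ranks touch disjoint byte ranges,
    # so the full image is assembled without a separate mosaic pass.
    with open(path, "r+b") as f:
        f.seek(start * ncols * itemsize)
        f.write(chunk)

def process(chunk):
    # Hypothetical per-pixel operation (invert 8-bit values).
    return bytes(255 - b for b in chunk)

nrows, ncols, itemsize, nranks = 10, 4, 1, 3
image = bytes(range(nrows * ncols))
fd, out_path = tempfile.mkstemp()
os.close(fd)
preallocate(out_path, len(image))

rowbytes = ncols * itemsize
for rank in range(nranks):  # each iteration stands in for one MPI rank
    start, stop = row_block(nrows, nranks, rank)
    chunk = image[start * rowbytes:stop * rowbytes]
    write_chunk(out_path, ncols, itemsize, start, process(chunk))

with open(out_path, "rb") as f:
    result = f.read()
os.remove(out_path)
print(result == process(image))  # True
```

Each byte of the output is written exactly once, so the data crosses the file system once on input and once on output. By contrast, writing separate chunk files and then mosaicking them reads and writes everything a second time.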
George Ostrouchov, Ph.D.
Scientific Data Group, Computer Science and Mathematics Division, Oak Ridge National Laboratory
and Remote Data Analysis and Visualization Center, National Institute for Computational Sciences, The University of Tennessee
http://www.csm.ornl.gov/~ost