
Making a series of similar, but modified .r files- suggested method(s)? Re: Running jobs on a linux cluster

Laura

It seems you are using the Sun Grid Engine.  You want to look into the
concept of an "Array Job".  Essentially an array job allows you to run
the same script many times, the only thing that differs being the value
of an environment variable.  This sounds simple, but is really pretty
powerful.

Say a normal script would loop over a hundred input files and do something like

for (file in list.files()) {
  data = read.table(file)
  fit = glm(y ~ x, data = data)
  # sep = "" avoids the space paste() inserts by default
  save(fit, file = paste(file, "-fit.rda", sep = ""))
}

With an array job you would do something like

# Sys.getenv() returns a character string, so coerce it before indexing
slotNumber = as.integer(Sys.getenv("SGE_TASK_ID"))
file = list.files()[slotNumber]
data = read.table(file)
fit = glm(y ~ x, data = data)
save(fit, file = paste(file, "-fit.rda", sep = ""))

Here you see how I use the slotNumber variable (coerced to an integer,
since Sys.getenv() returns a character string) to index into the vector
returned by list.files().

You submit your job like

qsub -t 1-100 SCRIPT

Here SGE will spawn 100 tasks, with SGE_TASK_ID taking the values 1 to 100.
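To make the qsub line concrete, here is a minimal sketch of what the
SCRIPT passed to qsub might contain.  The script and R file names
(myAnalysis.R) are made up, and SGE_TASK_ID is simulated so the sketch
runs outside the cluster; in a real array job qsub sets it for you:

```shell
#!/bin/sh
# Hypothetical SCRIPT for 'qsub -t 1-100'.  SGE exports SGE_TASK_ID per
# task; the default below is only so the sketch runs without a scheduler.
: "${SGE_TASK_ID:=7}"
echo "starting task ${SGE_TASK_ID}"
# Each task runs the same R script, which reads SGE_TASK_ID itself, e.g.:
# R CMD BATCH myAnalysis.R myAnalysis.${SGE_TASK_ID}.Rout
```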

Finally, to SGE the whole array conceptually looks like one big job, so
you just need a single qdel if you need to remove it.

In summary, I tend to always set up one big vector or list and then
just index into that list.  But there are many variants of the approach
above, and I am sure you can figure out what to do.
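As a sketch of that one-big-list pattern outside of R (the scratch file
names and the simulated task id are made up; qsub -t sets SGE_TASK_ID
in a real run):

```shell
# Build a scratch directory with a few input files, then let the task id
# pick one of them (1-based), mirroring list.files()[slotNumber] in R.
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.txt" "$dir/c.txt"
: "${SGE_TASK_ID:=2}"   # simulated here; set by the scheduler for real
file=$(ls "$dir" | sed -n "${SGE_TASK_ID}p")
echo "task ${SGE_TASK_ID} processes ${file}"
```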

Kasper
On Sat, Aug 21, 2010 at 2:03 PM, Laura S <leslaura at gmail.com> wrote: