Creating a file with reusable functions accessible throughout a computational biology cancer project

On 07/06/2011 12:41 PM, Ben Ganzfried wrote:
There is:  you should put your common functions in a package.  Packages 
are a good way to organize your own code, you don't need to publish 
them.  (You will get a warning if you put "Not for distribution" into 
the License field in the DESCRIPTION file, but it's just a warning.)  
You can also put datasets in a package; this makes sense if they are 
relatively static.  If you get new data every day you probably wouldn't.
Both of those sound very easy.  For example,

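curate <- function(characteristic, word="grade: ") {
   # strip the literal prefix (fixed=TRUE: no regex interpretation)
   tmp <- sub(word, "", characteristic, fixed=TRUE)
   # grades I and II map to "low", grade III to "high"
   tmp[tmp=="I"] <- "low"
   tmp[tmp=="II"] <- "low"
   tmp[tmp=="III"] <- "high"
   tmp
}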

Then your script would just need one line

curated$G <- curate(uncurated$characteristics_ch1.2)

I don't know where you'll find the names of all the datasets, but if you 
can get them into a vector, it's pretty easy to write a loop that calls 
curate() for each one.
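
A minimal sketch of such a loop follows. It assumes the uncurated datasets are data frames held in a named list and that each has a characteristics_ch1.2 column. The list name uncurated_sets, the study names, and the toy values are hypothetical stand-ins, not anything from your project.

```r
# curate() as defined earlier in this message
curate <- function(characteristic, word = "grade: ") {
  tmp <- sub(word, "", characteristic, fixed = TRUE)
  tmp[tmp == "I"] <- "low"
  tmp[tmp == "II"] <- "low"
  tmp[tmp == "III"] <- "high"
  tmp
}

# Hypothetical stand-ins for the real uncurated datasets
uncurated_sets <- list(
  studyA = data.frame(characteristics_ch1.2 = c("grade: I", "grade: III"),
                      stringsAsFactors = FALSE),
  studyB = data.frame(characteristics_ch1.2 = c("grade: II", "grade: III"),
                      stringsAsFactors = FALSE)
)

# Build one curated data frame per dataset, keyed by the same name
curated_sets <- list()
for (name in names(uncurated_sets)) {
  curated_sets[[name]] <- data.frame(
    G = curate(uncurated_sets[[name]]$characteristics_ch1.2),
    stringsAsFactors = FALSE
  )
}
```

If the datasets live in separate files instead, the same loop would work over a vector of file names, with a read step at the top of the loop body.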

Deciding how much goes in the package and how much is one-off code that 
stays with a particular dataset is a judgment call.  I'd guess based on 
your description that curate() belongs in the package but the rest 
doesn't, but you know a lot more about the details than I do.
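
As a rough sketch of getting started, package.skeleton() from the utils package can build the directory layout around a file of functions. The package name curationTools and the file name curate.R are made up for illustration, and the skeleton is written to a temporary directory here so nothing in your project is touched.

```r
# Write the shared function to a source file (hypothetical file name)
src <- file.path(tempdir(), "curate.R")
writeLines(c(
  'curate <- function(characteristic, word = "grade: ") {',
  '  tmp <- sub(word, "", characteristic, fixed = TRUE)',
  '  tmp[tmp == "I"] <- "low"',
  '  tmp[tmp == "II"] <- "low"',
  '  tmp[tmp == "III"] <- "high"',
  '  tmp',
  '}'
), src)

# Create the package skeleton around that file.
# "curationTools" is a hypothetical package name.
utils::package.skeleton(name = "curationTools",
                        path = tempdir(),
                        code_files = src,
                        force = TRUE)

pkg_dir <- file.path(tempdir(), "curationTools")
list.files(pkg_dir)
```

The generated DESCRIPTION and man/ files still need editing by hand before R CMD build and R CMD INSTALL will give you something you can load with library().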

Duncan Murdoch