
how to organize a lot of R source files

5 messages · Jim Lemon, Henrik Bengtsson, Hao Cen

#
Hi,

I wonder what is a better way to organize a lot of R source files. I have
written a lot of utility functions and store them in several source files
(e.g., util1.R, util2.R, ..., utilN.R). I also have a master file in which the
source command is used to load all the util.R files. When I need to use
the utility functions in a new project, I create a new R file (e.g., main.R)
in which I "source" the master file.

The problem with this approach is that any time a single utility function
is modified, I need to rerun the source command in main.R to load the
master file, which loads all the utility R files via a loop over each
file. Sometimes I have to wait 10 seconds to get them all loaded.
Sometimes I forget to run the source command. Is there a way in R to 1)
reload only the changed files (like a make utility) when I run source on
all utility files and/or, even better, 2) reload the changed utility files
when I run a command that uses one of those utility functions, without the
need for me to source those files?

Not sure if packaging solves this issue, because the library command has to
be used every time a utility function is modified and, in addition, the
package has to be rebuilt. I am not worried about sharing the source files
at the moment, as I am the only user of those utility files.

This may be a common issue many R users face. I wonder how other R users
solve this issue.

thanks

Jeff
#
Hi Jeff,
Your request makes a lot of sense. I often modify files in the packages 
I maintain, typically by loading the package, then working on a copy of 
the function, continually "sourcing" the new code until it works 
correctly, and then checking and building the package. Apart from the 
official packages I maintain, I keep a few local packages with odd 
functions that I don't think are worth uploading to an already loaded 
CRAN. This shell script can be used to automate the building of a package.

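#!/bin/sh
# Usage: Rpackage <modified .R file> <package directory> <built tarball>
# Arguments are quoted so that paths containing spaces still work.
cp "$1" "$2/R"
if R CMD check "$2"; then
  R CMD build "$2"
  R CMD INSTALL "$3"
else
  echo "Problem with R check of $2"
fi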

If I had modified the "clinsig.R" file in the clinsig package, I could 
call this script like this:

Rpackage /home/jim/R/clinsig.R /home/jim/R/clinsig clinsig_1.0-1.tar.gz

and it would rebuild the package with the new function. Because I 
usually keep the files I am modifying in /home/jim/R I could simplify 
the command line a bit. This may seem like a lot of work, but the 
alternative I worked out, getting a function to check the timestamp of 
its source file and compare it against the timestamp of the latest 
package, puts a lot of hard-coded file locations into your function file:

# each find command must stay on one line, or the shell will split it
src <- system("find /home/jim/R -name 'clinsig.R' -type f",intern=TRUE)
pkg <- system("find /home/jim/R -name 'clinsig_*' -type f",intern=TRUE)
if(max(file.info(src)$mtime) > max(file.info(pkg)$mtime))
  source("/home/jim/R/clinsig.R")
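
The timestamp comparison above can also be sketched in plain R, without shelling out to find, which keeps the file locations in arguments rather than in the body. This is only a sketch: source_is_newer() and its argument names are hypothetical, and it assumes the source file and the built tarballs live under the same directory, as in the message above.

```r
# Hypothetical helper (not part of any package): TRUE if the newest file
# matching `src_pattern` is more recent than the newest file matching
# `pkg_pattern`, searching recursively under `dir`.
source_is_newer <- function(dir, src_pattern, pkg_pattern) {
  src <- list.files(dir, pattern = src_pattern,
                    recursive = TRUE, full.names = TRUE)
  pkg <- list.files(dir, pattern = pkg_pattern,
                    recursive = TRUE, full.names = TRUE)
  if (length(src) == 0L) return(FALSE)  # nothing to source
  if (length(pkg) == 0L) return(TRUE)   # no build yet, so the source wins
  max(file.mtime(src)) > max(file.mtime(pkg))
}

# Usage with the paths from the message above:
# if (source_is_newer("/home/jim/R", "^clinsig\\.R$", "^clinsig_"))
#   source("/home/jim/R/clinsig.R")
```

The patterns are regular expressions matched against file names, so 'clinsig_*' from the find call becomes "^clinsig_" here.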

Jim
#
library("R.utils");
sourceDirectory("myRFiles/", modifiedOnly=TRUE);

See ?sourceDirectory (regardless of what the Rd help says, any '...'
argument is passed to sourceTo()).

/Henrik
On Fri, Jan 8, 2010 at 7:38 AM, Hao Cen <hcen at andrew.cmu.edu> wrote:
1 day later
#
Hi Henrik,

Thanks for your suggestion. I created a directory with 10 R files and
tried the following and measured its time

system.time(sourceDirectory("~/fun", modifiedOnly = FALSE))
system.time(sourceDirectory("~/fun", modifiedOnly = TRUE))

But the second line seems to take as much time as the first. I thought
the second line would be faster, since no modification was made.

Also, the first line reports the following warning:
In readLines(con = fh) :
  incomplete final line found on "~/fun/util1.R"
I don't see such a warning when I use source.

Maybe the two issues are related. Please advise.

thanks

Jeff
On Fri, January 8, 2010 7:56 pm, Henrik Bengtsson wrote:
#
Hi.
On Sat, Jan 9, 2010 at 6:16 PM, Hao Cen <hcen at andrew.cmu.edu> wrote:
Use modifiedOnly=TRUE the first time too, and you'll see it'll work
the 2nd time.  When you call it the first time, R considers each file
"modified" (since last time), because it has never seen the code
before (in that R session).  If you add some verbose output to your
scripts, you'll definitely see when the scripts get sourced.
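
To illustrate the behaviour described above, here is a minimal base-R sketch of "source only what changed since last time in this session". It is not the R.utils implementation; source_modified() and .last_sourced are hypothetical names, and the sketch assumes file modification times are a good enough change signal.

```r
# Hypothetical stand-in for the idea behind modifiedOnly=TRUE;
# this is not how R.utils itself is implemented.
.last_sourced <- new.env()  # per-session record: file path -> mtime at last source

source_modified <- function(dir, envir = globalenv()) {
  files <- list.files(dir, pattern = "\\.[Rr]$", full.names = TRUE)
  sourced <- character(0)
  for (f in files) {
    key <- normalizePath(f)
    mtime <- file.mtime(f)
    last <- .last_sourced[[key]]  # NULL if never sourced in this session
    # A file never seen in this session counts as modified.
    if (is.null(last) || mtime > last) {
      sys.source(f, envir = envir)
      .last_sourced[[key]] <- mtime
      sourced <- c(sourced, f)
    }
  }
  invisible(sourced)  # the files that were actually (re)sourced
}
```

Calling source_modified("~/fun") twice in a row sources every file the first time and nothing the second time, unless a file was saved in between.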

I guess you could say it should work the way you did it, but the way it
is currently designed/implemented, it does not record the last
"source" time unless you use modifiedOnly=TRUE.  The next release will
also support what you did.

The warning is unrelated.  Nothing to worry about.  I've added
readLines(con=fh, warn=FALSE) for the next release to get rid of such
warnings.
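
For reference, the warning can be reproduced in base R with a file that lacks a trailing newline. The temporary file below is made up for the illustration; readLines() and its warn argument are standard base R.

```r
# Write a one-line file with no end-of-line marker after the last line.
f <- tempfile(fileext = ".R")
cat("x <- 1", file = f)

# Default behaviour: the line is read, but a warning is raised.
# Here the warning is caught and its message kept instead of the lines.
with_warning <- tryCatch(readLines(f),
                         warning = function(w) conditionMessage(w))

# With warn = FALSE the same line is read silently.
quiet <- readLines(f, warn = FALSE)
```

The content that is read is the same either way, so the warning has no effect on what gets sourced.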

/Henrik