Getting your stuff organized in R

3 messages · Agustin Lobo, Christian Hennig, Ko-Kang Kevin Wang

#
I'm attaching a small text file
on "Getting your stuff organized in R".
(Sorry if sending an attachment is not considered
correct etiquette on r-help, but it is
only 7911 bytes of plain ASCII text and I cannot
post it on a web page at the moment).

Probably all the information in this document is scattered
across one or more
R introduction guides, but I think it is useful to have
it gathered under this title. The number of
R objects created by the user grows fast,
and the way R stores them is rather
particular (most other packages create a separate disk file
for each object). Therefore, it is important for anyone starting
with R to learn how to organize his/her R objects and avoid
dumping everything into one single, often large, .RData file.

I am sending this document to the list in the hope that people
will correct errors and suggest alternative, better methods.
Please do so directly to alobo at ija.csic.es, not the list.
After your feedback, I'll format it as pdf
or html and send it to the Contributed
Documentation section of the R-CRAN pages.

Thanks

Agus

Dr. Agustin Lobo
Instituto de Ciencias de la Tierra (CSIC)
Lluis Sole Sabaris s/n
08028 Barcelona SPAIN
tel 34 93409 5410
fax 34 93411 0012
alobo at ija.csic.es


-------------- next part --------------
Getting your stuff organized in R

Probably all this information is scattered across one or more
R introduction guides, but I think it is useful to have
it gathered under this title. If after a first contact
with R you have decided to use it, you will want to start working with
your own data as soon as possible. R does not
create a separate disk file for each object, which is the most
common situation in other packages, so you are probably a bit
confused by this. Also, the number of data and function objects
can grow really fast in your R sessions. Therefore, since the number of your
R objects grows and the way R stores them is rather
idiosyncratic, it is important for you to learn how
to organize your R objects before you start working with
your own data.



1. As you know from the R-start.pdf, R keeps everything
in memory. Therefore, it is wise to type often
> save.image()
which will save to disk everything that is listed by
> ls()
into a file named .RData, which is located in the directory
from which you started R. Remember that on Unix systems you must use ls -a
to list this file, along with any other file whose name starts with ".".

Therefore, the first "organizing" rule is simply to keep your projects in
separate directories and launch R from the appropriate directory.


2. You can save to a different file and/or another directory with:
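
A minimal sketch of this step. The directory name project1 and the file name project1.RData are hypothetical (they are not from the original text), and tempdir() is used only so that the sketch runs anywhere:

```r
# Hypothetical project directory, created under tempdir() for illustration.
proj_dir <- file.path(tempdir(), "project1")
dir.create(proj_dir, showWarnings = FALSE)

x <- 1:10                                      # some object in the workspace
img <- file.path(proj_dir, "project1.RData")   # hypothetical file name
save.image(file = img)                         # write the workspace to that file
file.exists(img)
```

Giving the file a project-related name makes it harder to overwrite it by accident when R is started from the wrong directory.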
3. It is useful to take advantage of the capabilities of ls() to select
what you want, e.g.:
[1] "lissNPC100"      "lissNPC100.ady"  "lissNPC100.stat" "lissNPC1100.ref"
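
A sketch of selecting objects by name with the pattern argument of ls(). The object names and the file name below are made up for illustration:

```r
# Three hypothetical objects; two of them share the prefix "lissA".
lissA.ref  <- 1
lissA.stat <- 2
other      <- 3

sel <- ls(pattern = "^lissA")   # regular expression matched against the names
sel

# The selection can be handed to save() through its list argument.
sel_file <- file.path(tempdir(), "lissA.rda")   # hypothetical file name
save(list = sel, file = sel_file)
```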
4. You will normally need functions that are not
in the base package and that are not made available to you after
a default R start. You normally don't want these functions
in your workspace, as they would get saved with save.image()
into .RData and mixed with your objects (which probably also
include "immature" functions). If you require functions
from a CRAN package, you just use library(), e.g.:
> library(Rstreams)
If you type ls() afterwards, you won't see the Rstreams functions. For the sake
of organization, R does not load the package into your workspace, although
the functions are available for you to use. If you type
> search()
you will get something like:
[1] ".GlobalEnv"       "package:Rstreams" "package:ctest"    "Autoloads"
[5] "package:base"

which lists your workspace (named ".GlobalEnv"), the package you just attached
(which goes, by default, to position 2), and "package:ctest", "Autoloads" and
"package:base", which were automatically attached when R started.

Now, if you type
> ls(pos=2)
you will get the listing of the Rstreams package.
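
The same sequence can be tried with any installed package. The sketch below uses the tools package, which ships with R, as a stand-in for Rstreams (an assumption made only so that the example runs without installing anything):

```r
library(tools)                             # attach the package

on_path  <- "package:tools" %in% search()  # the package is on the search path
in_ws    <- "md5sum" %in% ls()             # md5sum is not in the workspace ...
callable <- exists("md5sum")               # ... but it is available for use

head(ls("package:tools"))                  # list what the package provides
```

Referring to the package by name, as in ls("package:tools"), is a little safer than ls(pos=2), because positions shift whenever something else is attached.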

5. As you develop your project, you transform your original data and often
create new data frames and data matrices. In order to keep the original data
safe, it's a good idea to keep them in a separate file. Another reason
to separate the original data is that they might be large, while you
most often work with data that have been selected or sampled from the original
file. As R will automatically load your .RData into memory, it's more efficient
not to load any large object unless you really need it.
You can save the original data to a separate file with:
> save(data1.ori, file="data1ori.rda")
and then you can delete the object from your workspace with rm(data1.ori): the
next .RData file that you make, by using
save.image() or by quitting R and saving the workspace, will not include data1.ori.

6. If it happens that you need data1.ori afterwards, you should use
> attach("data1ori.rda")
rather than
> load("data1ori.rda")
Using attach("data1ori.rda"), your object data1.ori will be loaded into
a different environment (pos=2 by default), which implies that you'll be able
to use it but it will not be mixed up with your "everyday work" when you use
save.image() and/or quit R.

You can type
> search()
before and after attach("data1ori.rda") to see the result.
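
Points 5 and 6 can be sketched together as follows. The data frame is a made-up stand-in for the original data, and the file is written under tempdir() only so that the sketch runs anywhere:

```r
rda <- file.path(tempdir(), "data1ori.rda")

data1.ori <- data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 8.0, 9.7))  # stand-in data
save(data1.ori, file = rda)    # keep the original data in its own file
rm(data1.ori)                  # and drop it from the workspace

attach(rda)                           # contents go to position 2
found <- exists("data1.ori")          # visible through the search path
in_ws <- "data1.ori" %in% ls()        # but not part of the workspace
search()                              # shows an extra "file:..." entry

detach(pos = 2)                       # remove it again when no longer needed
```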

7. As R integrates a large number of statistical methods and graphics
with a high-level language,
your work will involve writing a number of functions of your own. As soon as
your functions attain a certain "maturity" and you consider them of general use
for your own work, you should organize them as packages (see "Creating R packages" in
R-exts.pdf).

8. Meanwhile, it's also a good idea to save your functions into a separate file,
or use that file as an intermediate step between the workspace and the library.
A good reason
to separate functions from other objects is that you might want to use a function
that you developed for another project.
Keeping functions and data objects in different files will let you attach the
functions while avoiding the data objects. Remember that you do not want to attach anything
that you do not need because it costs you memory.

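The following function will let you list only the functions present in a given environment
(your workspace by default):
lsf <- function (pos=1)
{
        # names of all objects at position pos of the search path
        a <- b <- ls(pos=pos)
        # seq_along() avoids looping over 1:0 when nothing is there
        for (i in seq_along(a)) {
                # look the object up at pos, not among the local variables
                b[i] <- mode(get(a[i], pos=pos))
        }
        # keep only the names whose mode is exactly "function"
        a[b == "function"]
}
> lsf()
 [1] "disc.qda"           "edges"              "ima.explore2"
 [4] "imagen"             "imagenrgb"          "lsf"
 [7] "mat.select"         "no.na.mat"          "no.rep.mat"
[10] "parcelas.lda"       "parcelas.liss.func" "reclas"
[13] "rescale"            "utm2lincol"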


You can use lsf() to save your functions to a file, by passing its result
to the list argument of save().
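
A sketch of saving only the functions. The function rescale and the file name myfunctions.rda are hypothetical, and lsf() is repeated here in a compact form so that the example is self-contained:

```r
# Compact variant of lsf(): names of the functions at a given position.
lsf <- function(pos = 1) {
    a <- ls(pos = pos)
    b <- vapply(a, function(n) mode(get(n, pos = pos)), character(1))
    a[b == "function"]
}

rescale <- function(x) (x - min(x)) / (max(x) - min(x))   # hypothetical function
some.data <- 1:10                                         # not a function

funs_file <- file.path(tempdir(), "myfunctions.rda")      # hypothetical file name
save(list = lsf(), file = funs_file)                      # functions only
```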
9. Actually, it's more usual to save functions in text format, which you can do with dump().
But you cannot use either load() or attach() with files created by dump(). Instead,
you must use source(),
but beware that source() will create the functions in your workspace. I've not found
any way to direct source() to another position.
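
A sketch of the dump()/source() round trip, with a hypothetical function and file name. The last part shows one possible way to keep sourced functions out of the workspace: sys.source() evaluates a file in a given environment, and attach(NULL, name=...) creates an empty environment on the search path. The name "myfunctions" is an assumption chosen for this example:

```r
rescale <- function(x) (x - min(x)) / (max(x) - min(x))   # hypothetical function
src <- file.path(tempdir(), "myfunctions.R")              # hypothetical file name

dump("rescale", file = src)    # write the function as R source text
rm(rescale)

source(src)                    # recreates rescale in the workspace
rm(rescale)

# Alternative: evaluate the file in an empty environment on the search path.
env <- attach(NULL, name = "myfunctions")
sys.source(src, envir = env)

in_workspace <- "rescale" %in% ls()   # the workspace stays clean
on_path      <- exists("rescale")     # the function is still usable
```

When the functions are no longer needed, detach("myfunctions") removes them from the search path.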


10. Sometimes you will want to add an object from your workspace to an existing R disk file.
For example, you'll want to add a new function developed in your workspace to
the functions file of your project. You just need the append option of dump() for this
purpose (append=TRUE).
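
A sketch of appending to a text file of functions. The functions f1 and f2 and the file name are made up for illustration:

```r
funs <- file.path(tempdir(), "projfunctions.R")   # hypothetical functions file
f1 <- function(x) x + 1
f2 <- function(x) x * 2

dump("f1", file = funs)                  # creates (or overwrites) the file
dump("f2", file = funs, append = TRUE)   # adds f2 and leaves f1 in place

# Check the result by evaluating the file in a scratch environment.
chk <- new.env()
sys.source(funs, envir = chk)
ls(chk)
```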
It's a bit more complicated to add a data object to an R binary file, because
there is no "append" option in save(). But you can use ls() in the following way:
> search()
[1] ".GlobalEnv"    "package:ctest" "Autoloads"     "package:base"
> attach("lissN543cod.R")
> search()
[1] ".GlobalEnv"         "file:lissN543cod.R" "package:ctest"
[4] "Autoloads"          "package:base"
> ls(pos=2)
[1] "lissN543.cod"  "lissN543E.cod" "lissN543W.cod"

Now, assuming we want to add an object "a" to lissN543cod, we would call save()
with list=c(ls(pos=2), "a") and a new file name, here lissN543cod_v2.
Note the "" around a in the list argument.

Once lissN543cod_v2 is checked, we can delete lissN543cod.
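
The whole procedure can be sketched as follows. The contents of the old file are made up, the .R extension of the new file is an assumption, and both files live under tempdir() only so that the sketch runs anywhere:

```r
old_file <- file.path(tempdir(), "lissN543cod.R")
new_file <- file.path(tempdir(), "lissN543cod_v2.R")   # extension is an assumption

# Stand-in for the existing binary file (made-up contents).
local({
    lissN543.cod  <- 1:3
    lissN543E.cod <- 4:6
    save(lissN543.cod, lissN543E.cod, file = old_file)
})

a <- "new object"                 # the object to be added

attach(old_file)                  # old contents go to position 2
save(list = c(ls(pos = 2), "a"),  # old names plus "a", all as character strings
     file = new_file)
detach(pos = 2)

# Check the new file in a scratch environment before deleting the old one.
chk <- new.env()
load(new_file, envir = chk)
ls(chk)
```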


11. In order to copy an object from the workspace to another environment,
you can use assign():
> search()
[1] ".GlobalEnv"         "file:lissN543cod.R" "package:ctest"
[4] "Autoloads"          "package:base"
> ls(pos=2)
[1] "lissN543.cod"  "lissN543E.cod" "lissN543W.cod"
> assign("a", a, pos=2)
> ls(pos=2)
[1] "a"             "lissN543.cod"  "lissN543E.cod" "lissN543W.cod"

You can delete a from the workspace, but beware that in such a case a will not be saved by save.image() or when
quitting R. You would need to save the contents of position 2 explicitly with save().
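
A sketch of this step, using an empty environment created with attach(NULL, name=...) as a stand-in for an attached file. The names "projdata" and projdata.rda are hypothetical:

```r
attach(NULL, name = "projdata")   # empty stand-in for an attached file, at position 2

a <- c(1, 2, 3)
assign("a", a, pos = 2)           # copy a from the workspace to position 2
rm(a)                             # a now lives only at position 2

out <- file.path(tempdir(), "projdata.rda")    # hypothetical file name
save(list = ls(pos = 2),                       # everything found at position 2
     envir = as.environment(2),                # looked up there, not in the workspace
     file = out)
detach("projdata")
```

Passing envir explicitly makes sure that save() takes the copies stored at position 2 even if objects with the same names still exist in the workspace.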
12. If you have several projects, you might forget what objects were in a given R binary file
created with save(). Unfortunately, I've not found any way to list the contents of such a file
unless it is attached or loaded. Also, selecting which objects to load from an R binary file does
not seem possible.
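
One possible workaround, sketched under the assumption that reading the whole file once is acceptable: load() takes an envir argument, so the file can be loaded into a scratch environment instead of the workspace, inspected there, and only the wanted objects copied out. The file and its contents below are made up:

```r
f <- file.path(tempdir(), "example.rda")   # hypothetical binary file
local({
    x <- 1:3
    y <- letters
    big <- matrix(0, 100, 100)
    save(x, y, big, file = f)
})

scratch  <- new.env()
contents <- load(f, envir = scratch)   # load() returns the names it restored
contents

wanted <- get("x", envir = scratch)    # copy out only what is needed
rm(scratch)                            # drop the rest
```

The file is still read in full, so this keeps the workspace tidy but does not save memory during the load itself.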


Hope these notes are useful. Please send your comments, corrections etc. to alobo at ija.csic.es
Note that R is a collaborative project, which also applies to documentation and guides!
#
Dear list,

Here is some feedback on "Getting your stuff organized in R".
I find the paper generally useful.
However, I would not encourage anybody to use .RData for the storage
of data, objects and results, because .RData is overwritten so often that a
little carelessness may easily cause loss of important data (R started from
the wrong directory etc.). It is better to choose clear project-related
filenames for save.image(), if there is important stuff (if not, why
save.image()?).

Best,
Christian

***********************************************************************
Christian Hennig
University of Hamburg, Faculty of Mathematics - SPST/ZMS
 (Schwerpunkt Mathematische Statistik und Stochastische Prozesse,
 Zentrum fuer Modellierung und Simulation)
Bundesstrasse 55, D-20146 Hamburg, Germany
Tel: x40/42838 4907, privat x40/631 62 79
hennig at math.uni-hamburg.de, http://www.math.uni-hamburg.de/home/hennig/
#######################################################################
I recommend www.boag.de


-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
The "Getting your stuff organized in R" document is very useful.

I wrote two sets of PowerPoint slides (and more are being written) as an
introduction to R. They were used to teach R to my fellow Software Developers'
Klub (SDK) members, who have lots of experience in programming but not in
statistics, as well as to some first-year statistics students.

The slides can be obtained from
http://www.stat.auckland.ac.nz/~kwan022/pub/R/ , under the names:
   R_Tut_00.ppt
   R_Tut_01.ppt

In R_Tut_01.ppt I mentioned a Word document which has information on how one
can use R (Windows version) more efficiently, in my opinion. This Word
file has been zipped and named RTricks.zip on the same page.

Cheers,

Ko-Kang Wang

----------------------------------------------------------------------------
--
Ko-Kang Kevin Wang
Statistical Analysis Division Leader
Software Developers' Klub (SDK)
University of Auckland
New Zealand

