R FAQ v0.60 - R-announce | R Mailing Lists

Tue, Dec 9, 1997 9:48 AM
An updated version of the R FAQ to accompany the new 0.60 release is now
available at the usual site,

	http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html

A plain text version of the FAQ is appended below.

-kh

****** snip snip snip **************************************************
  R FAQ
  Kurt Hornik
  v0.60-6, 1997/12/08

  This document contains answers to some of the most frequently asked
  questions about R.  Feedback is welcome.
  ______________________________________________________________________

  Table of Contents:

  1. Introduction

     1.1 Legalese
     1.2 Obtaining this Document
     1.3 Notation
     1.4 Feedback

  2. R Basics

     2.1 What Is R?
     2.2 What Machines Does R Run on?
     2.3 What Is the Current Version of R?
     2.4 How Can R Be Obtained?
     2.5 How Can R Be Installed?
        2.5.1 How Can R Be Installed (Unix)
        2.5.2 How Can R Be Installed (Windows)
        2.5.3 How Can R Be Installed (Macintosh)
     2.6 Are there Unix Binaries for R?
     2.7 Which Documentation Exists for R?
     2.8 Which Mailing Lists Exist for R?
     2.9 What is CRAN?

  3. R and S

     3.1 What Is S?
     3.2 What Is S-PLUS?
     3.3 What Are the Differences between R and S?
        3.3.1 Lexical Scoping
        3.3.2 Models
        3.3.3 Others

  4. R Add-On Packages

     4.1 Which Add-on Packages Exist for R?
     4.2 How Can Add-on Packages Be Installed?
     4.3 How Can Add-on Packages Be Used?
     4.4 How Can Add-on Packages Be Removed?
     4.5 How Can I Create an R Package?
     4.6 How Can I Contribute to R?

  5. R and Emacs

     5.1 Is there Emacs Support for R?
     5.2 Should I Run R from Within Emacs?

  6. R Miscellanea

     6.1 How Can I Read a Large Data Set into R?
     6.2 Why Can't R Source a `Correct' File?
     6.3 How Can I Set Components of a List to NULL?
     6.4 How Can I Save My Workspace?
     6.5 How Can I Clean Up My Workspace?
     6.6 How Can I Get `eval' and `D' to Work?
     6.7 Why Do My Matrices Lose Dimensions?
     6.8 How Does Autoloading Work?
     6.9 How Should I Set Options?

  7. Acknowledgments

  ______________________________________________________________________

  1.  Introduction

  This document contains answers to some of the most frequently asked
  questions about R.

  1.1.  Legalese

  This document is free software; you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation; either version 2, or (at your option)
  any later version.

  This document is distributed in the hope that it will be useful, but
  WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
  General Public License for more details.

  A copy of the GNU General Public License is available via WWW at
  http://www.gnu.org/copyleft/gpl.html.  You can also obtain it by
  writing to the Free Software Foundation, Inc., 675 Mass Ave,
  Cambridge, MA 02139, USA.

  1.2.  Obtaining this Document

  The latest version of this document is always available from

             http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html

  From there, you can also obtain versions converted to plain ASCII
  text, GNU info, DVI, and PostScript, as well as the SGML source used
  for creating all these formats using the SGML-Tools (formerly
  Linuxdoc-SGML) system.

  1.3.  Notation

  The notation is standard: `R>' is used for the R prompt, and `$' for
  the shell prompt (where applicable).

  1.4.  Feedback

  Feedback is of course most welcome.

  In particular, note that I do not have access to Windows or Mac
  systems.  If you have information on these systems that you think
  should be added to this document, please let me know.

  2.  R Basics

  2.1.  What Is R?

  R is a system for statistical computation and graphics.  It consists
  of a language plus a run-time environment with graphics, a debugger,
  access to certain system functions, and the ability to run programs
  stored in script files.

  The design of R has been heavily influenced by two existing languages:
  Becker, Chambers & Wilks' S (see question ``What is S?'') and
  Sussman's Scheme.  Whereas the resulting language is very similar in
  appearance to S, the underlying implementation and semantics are
  derived from Scheme.  See question ``What Are the Differences between
  R and S?'' for a discussion of the differences between R and S.

  R was initially written by Robert Gentleman and Ross Ihaka, who are
  Senior Lecturers at the Department of Statistics of the University of
  Auckland in Auckland, New Zealand.  In addition, a large group of
  individuals has contributed to R by sending code and bug reports.

  Since mid-1997 there has been a core group who can modify the R source
  code CVS archive.  The group currently consists of Peter Dalgaard,
  Robert Gentleman, Kurt Hornik, Ross Ihaka, Thomas Lumley, Martin
  Maechler, Paul Murrell, Heiner Schwarte, and Luke Tierney.

  R has a home page at http://stat.auckland.ac.nz/r/r.html.  It is free
  software distributed under a GNU-style copyleft, and an official part
  of the GNU project (``GNU S'').

  2.2.  What Machines Does R Run on?

  R is being developed for the Unix, Windows and Mac platforms.

  R will configure and build under a number of common Unix platforms,
  including dec-alpha-osf, freebsd, hpux, i386-linux (ELF), sgi-irix,
  solaris, and sunos, and, according to Jim Lindsey <jlindsey at luc.ac.be>,
  also on Mac, Amiga and Atari machines running m68k-linux.

  If you know about other platforms, please drop me a note.

  2.3.  What Is the Current Version of R?

  The current Unix version is 0.60; the previous version was 0.50.  The
  ``jump'' is due to both a major reorganization of the directory
  structure and the conversion to a new, TeX-like documentation format.
  See the file `CHANGES' in the R distribution for more information.

  With some good luck, the Windows version will soon catch up with the
  Unix version.  The version for the Mac is pre-alpha.

  2.4.  How Can R Be Obtained?

  Sources, binaries and documentation for R can be obtained via CRAN,
  the ``Comprehensive R Archive Network'' (see question ``What is
  CRAN?'').

  2.5.  How Can R Be Installed?

  2.5.1.  How Can R Be Installed (Unix)

  If binaries are available for your platform (see question ``Are there
  Unix Binaries for R?''), you can use these, following the instructions
  that come with them.

  Otherwise, you can compile and install R yourself, which can be done
  very easily under a number of common Unix platforms (see question
  ``What Machines Does R Run on?'').  The file INSTALL that comes with
  the R distribution contains instructions.

  Choose a place to install the R tree (R is not just a binary, but has
  additional data sets, help files, font metrics, etc.).  Let's call this
  place RHOME (given appropriate permissions, a natural choice would be
  `/usr/local/lib/R').  Untar the source code, and issue the following
  commands (at the shell prompt):

       $ ./configure
       $ make

  If these commands execute successfully, the R binary will be copied to
  the `$RHOME/bin' directory.  In addition, a shell script front-end
  called `R' will be created and copied to the same directory.  You can
  copy this script to a place where users can invoke it, for example to
  `/usr/local/bin'.  You could also copy the man page `R.1' to a place
  where your man page reader finds it, such as `/usr/local/man/man1'.

  Using

       $ make docs

  will build preformatted plain text help pages as well as HTML and
  LaTeX versions of the documentation (the three kinds can also be
  generated separately using `make help', `make html' and `make latex').
  Note that as of R version 0.60, you need Perl version 5 to build the
  documentation.  If this is not available on your system, you can
  obtain precompiled documentation files via CRAN.

  If everything (including the docs) built properly and you do not want
  to apply patches in the future, you can safely run `rm -rf src' to
  free disk space.

  2.5.2.  How Can R Be Installed (Windows)

  The file `rsept.zip' from the `bin/ms-windows' directory of a CRAN
  site contains a binary Windows 95 distribution of R which roughly
  corresponds to a 0.50a4 release (plus a few features from 0.60).  This
  version is quite limited in Windows-specific features, although it has
  been reported to work rather nicely.

  The file `rseptbeta.zip' contains the same version with a few bugs
  fixed and some experimental code for dynamic loading of DLL files.
  The survival4 package is included, but it currently does not work.

  These versions also work on Windows NT 4.0, both Server and
  Workstation.

  The file `rsept31.zip' contains a version compiled for Windows 3.11.
  There have been mixed reports regarding this one: some users get it
  running with a few inconsequential error messages on startup, while
  others seem to get nowhere with it.  It will definitely not run
  without a version of Win32s installed, available free of charge from
  Microsoft (ftp://ftp.microsoft.com/Softlib/MSLFILES/pw1118.exe).
  Because Windows 3.11 lacks long filenames, the HTML help files cannot
  work and are not included.

  Note that when uncompressing the zip files, the pkunzip program needs
  to be invoked with the -D flag to create subdirectories.  Also, be
  aware that some decompression programs do not preserve long file names
  properly.

  2.5.3.  How Can R Be Installed (Macintosh)

  The CRAN `bin/macintosh' directory contains `R.sea.hqx', a binhexed
  self-extracting archive, and installation instructions in
  `README.MACINTOSH'.  Note that the version in it is nowhere near the
  quality of the current Unix version.

  The Power Macintosh port is temporarily on hold.

  2.6.  Are there Unix Binaries for R?

  Packages ready for installation under the i386 versions of Debian
  GNU/Linux and Red Hat Linux, respectively, can be found at CRAN in
  `bin/i386-linux'.  There are also `tar' distributions for NEXTSTEP on
  the i386 and m68k platforms in `bin/i386-nextstep' and
  `bin/m68k-nextstep'.  No other binary distributions have been made
  publicly available so far.

  2.7.  Which Documentation Exists for R?

  Online documentation for most of the functions and variables in R
  exists, and can be printed on-screen by typing help(name) (or ?name)
  at the R prompt, where name is the topic for which help is sought.
  (In the case of unary and binary operators and control-flow special
  forms, the name may need to be quoted.)

  This documentation can also be made available as HTML, and as hardcopy
  via LaTeX; see question ``How Can R Be Installed?''.  An up-to-date
  HTML version is always available for web browsing at

                  http://www.stat.math.ethz.ch/R/manual/

  An R manual (``Notes on R:  A Programming Environment for Data
  Analysis and Graphics'') is currently being written, based on the
  ``Notes on S-PLUS'' by Bill Venables <venables at stats.adelaide.edu.au>
  and David Smith <D.M.Smith at lancaster.ac.uk>.  The current version can
  be obtained as `Rnotes.tgz' (LaTeX source) in a CRAN `doc' directory.
  Note that the ``conversion'' from S(-PLUS) to R is not complete yet.

  Last, but not least, Ross' and Robert's experience in designing and
  implementing R is described in:

  @Article{,
    author =       {Ross Ihaka and Robert Gentleman},
    title =        {R: A Language for Data Analysis and Graphics},
    journal =      {Journal of Computational and Graphical Statistics},
    year =         1996,
    volume =       5,
    number =       3,
    pages =        {299--314}
  }

  This is also the reference to cite for R in publications.

  2.8.  Which Mailing Lists Exist for R?

  Thanks to Martin Maechler <maechler at stat.math.ethz.ch>, there are
  three mailing lists devoted to R.

     r-announce
        This list is for announcements about the development of R and
        the availability of new code.

     r-devel
        This list is for discussions about the future of R and
        pre-testing of new versions.  It is meant for those who are
        actively involved in the development of R.

     r-help
        The `main' R mailing list, for announcements about the
        development of R and the availability of new code, questions and
        answers about problems and solutions using R, enhancements and
        patches to the source code and documentation of R, comparison
        and compatibility with S and S-PLUS, and for the posting of nice
        examples and benchmarks.

  Note that the r-announce list is gatewayed into r-help, so you don't
  need to subscribe to both of them.

  To send a message to everyone on the r-help mailing list, send email
  to

                         r-help at stat.math.ethz.ch

  To subscribe to (or unsubscribe from) this list, send `subscribe' (or
  `unsubscribe') in the BODY of the message (not in the subject!) to
  r-help-request at stat.math.ethz.ch.  Information about the list can be
  obtained by sending an email with `info' as its contents to
  r-help-request at stat.math.ethz.ch.

  Subscription and posting to the other lists is done analogously, with
  `r-help' replaced by `r-announce' and `r-devel', respectively.

  It is recommended that you send mail to r-help rather than only to the
  R developers (who are also subscribed to the list, of course).  This
  may save them precious time they can use for constantly improving R,
  and will typically also result in much quicker feedback for yourself.

  Of course, in the case of bug reports it would be very helpful to have
  code which reliably reproduces the problem.  Also, make sure that you
  include information on the system and version of R being used.

  Archives of the above three mailing lists are made available on the
  net on a monthly schedule at ftp://ftp.stat.math.ethz.ch/Mail-archives/
  (which is a directory of mail archive files).  Archives of the r-help
  mailing list (including the previous r-testers lists back to March
  1996) are also available in HTML format at
  http://www.ens.gu.edu.au/robertk/rhelp/about.htm.

  The developers of R can be reached for comments and reports at
  R at stat.auckland.ac.nz.

  2.9.  What is CRAN?

  The ``Comprehensive R Archive Network'' (CRAN) is a collection of
  sites which carry identical material, consisting of the R
  distribution(s), the contributed extensions, documentation for R, and
  binaries.

  The CRAN master site can be found at the URL

                http://www.ci.tuwien.ac.at/R/    (Austria)

  and is currently being mirrored daily at

       http://www.stat.unipg.it/pub/stat/statlib/R/CRAN/    (Italy)
       ftp://ftp.u-aizu.ac.jp/pub/lang/R/CRAN/              (Japan)
       ftp://ftp.stat.math.ethz.ch/R-CRAN/                  (Switzerland)
       http://lib.stat.cmu.edu/R/CRAN/                      (USA/Pennsylvania)
       ftp://ftp.biostat.washington.edu/mirrors/R/CRAN/     (USA/Washington)
       ftp://franz.stat.wisc.edu/pub/R/                     (USA/Wisconsin)

  Please use the CRAN site closest to you to reduce network load.

  The structure of the CRAN tree is as follows.

     `src/base'
        contains the official R distribution as provided by Ross Ihaka
        and Robert Gentleman.

     `src/contrib'
        contains code for extension packages.

     `doc'
        is for additional documentation and information on R.

     `bin'
        is for prebuilt R binaries (the base distribution and
        extensions), grouped according to platforms.  Currently, there
        are experimental `.deb' and `.rpm' packages for i386-linux, and
        tar files for i386-nextstep and m68k-nextstep.  I hope that
        `.tar.gz' files with contents relative to an installation tree
        (e.g. `bin', `lib/R/', and `man/man1/R.1') can be made available
        soon for all major supported Unix platforms.

  To ``submit'' something to CRAN, simply upload it to
  ftp://ftp.ci.tuwien.ac.at/incoming and send an email to
  <wwwadmin at ci.tuwien.ac.at>.  Please indicate the copyright situation
  (GPL, ...) in your submission.

  3.  R and S

  3.1.  What Is S?

  S is a very high level language and an environment for data analysis
  and graphics.  S was written by Richard A. Becker, John M. Chambers,
  and Allan R. Wilks of AT&T Bell Laboratories Statistics Research
  Department.

  The primary references for S are two books by the creators of S.

  o  Richard A. Becker, John M. Chambers and Allan R. Wilks (1988),
     ``The New S Language,'' Chapman & Hall, London.

     This book is often called the ``Blue Book''.

  o  John M. Chambers and Trevor J. Hastie (1992), ``Statistical Models
     in S,'' Chapman & Hall, London.

     This is also called the ``White Book''.

  There is a huge amount of user-contributed code for S, available at
  the S Repository at CMU.

  See the ``Frequently Asked Questions about S''
  (http://lib.stat.cmu.edu/S/faq) for further information about S.

  3.2.  What Is S-PLUS?

  S-PLUS is a value-added version of S sold by Statistical Sciences,
  Inc. (now a division of Mathsoft, Inc.)  S is a subset of S-PLUS, and
  hence anything which may be done in S may be done in S-PLUS.  In
  addition, S-PLUS has extended functionality in a wide variety of areas,
  including robust regression, modern nonparametric regression, time
  series, survival analysis, multivariate analysis, classical
  statistical tests, quality control, and graphics drivers.  Add-on
  modules add additional capabilities for wavelet analysis, spatial
  statistics, and design of experiments.

  See the MathSoft S-PLUS page (http://www.mathsoft.com/splus.html) for
  further information.

  3.3.  What Are the Differences between R and S?

  3.3.1.  Lexical Scoping

  Whereas the developers of R have tried to stick to the S language as
  defined in ``The New S Language'' (Blue Book, see question ``What is
  S?''), they have adopted the evaluation model of Scheme.

  This difference becomes manifest when free variables occur in a
  function.  Free variables are those which are neither formal
  parameters (occurring in the argument list of the function) nor local
  variables (created by assigning to them in the body of the function).
  Whereas S (like C) by default uses static scoping, R (like Scheme) has
  adopted lexical scoping.  This means the values of free variables are
  determined by a set of global variables in S, but in R by the bindings
  that were in effect at the time the function was created.

  Consider the following function:

       cube <- function(n) {
         sq <- function() n * n
         n * sq()
       }

  Under S, sq() does not ``know'' about the variable n unless it is
  defined globally:

       S> cube(2)
       Error in sq():  Object "n" not found
       Dumped
       S> n <- 3
       S> cube(2)
       [1] 18

  In R, the ``environment'' created when cube() was invoked is also
  searched, so sq() finds n there:

       R> cube(2)
       [1] 8
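
  To see that R really uses the binding from cube()'s own environment
  rather than a global n, the S session above can be replayed in R.
  This is a minimal sketch; the global assignment n <- 3 is there only
  to mirror the S transcript.

```r
# Same function as above: sq() refers to the free variable n.
cube <- function(n) {
  sq <- function() n * n   # n is looked up where sq() was created
  n * sq()
}

n <- 3              # a global n, as in the S session above
result <- cube(2)   # R uses the n bound in cube()'s environment
print(result)       # [1] 8, not 18 as in S
```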

  The following more `realistic' example illustrating the differences in
  scoping is due to Thomas Lumley <thomas at biostat.washington.edu>.  The
  function

       jackknife.lm <- function(lmobj) {
         n <- length(resid(lmobj))
         jval <- t(apply(as.matrix(1:n), 1,
                         function(i) coef(update(lmobj, subset = -i))))
         (n - 1) * (n - 1) * var(jval) / n
       }

  does something useful in R, but does not work in S.  In order to make
  it work in S, you need to explicitly pass the linear model object into
  the function nested in apply().  If you don't, and you are lucky, you
  will get ``Error: Object "lmobj" not found''.  If you are unlucky
  enough to have a linear model called lmobj in your global environment,
  you will get the wrong answer with no warning.

  The following version works in S.

       jackknife.S.lm <- function(lmobj) {
         n <- length(resid(lmobj))
         jval <- t(apply(as.matrix(1:n), 1,
                         function(i, lmobj) coef(update(lmobj, subset = -i)),
                         lmobj = lmobj))
         (n - 1) * (n - 1) * var(jval) / n
       }

  (The S version was written independently by Thomas and at least three
  of his fellow students over the past couple of years, causing
  literally hours of confusion on each occasion.)
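
  As a quick check that jackknife.lm() runs in R as described, it can be
  applied to a small regression.  The sketch repeats the definition so
  that it runs on its own, and assumes the `cars' data set shipped with
  current R; the result is the 2 x 2 jackknife estimate of the
  covariance matrix of the intercept and slope.

```r
jackknife.lm <- function(lmobj) {
  n <- length(resid(lmobj))
  # Refit the model n times, leaving out observation i each time; the
  # anonymous function finds lmobj through lexical scoping.
  jval <- t(apply(as.matrix(1:n), 1,
                  function(i) coef(update(lmobj, subset = -i))))
  # Jackknife covariance estimate of the coefficients
  (n - 1) * (n - 1) * var(jval) / n
}

fit <- lm(dist ~ speed, data = cars)   # cars: built-in example data
jk <- jackknife.lm(fit)
print(jk)
```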

  Similarly, most optimization (or zero-finding) routines need some
  arguments to be optimized over, and have other parameters that depend
  on the data but are fixed with respect to optimization.  With R
  scoping rules, this is a trivial problem: simply define the function
  to be optimized in the same environment as the required data, and
  scoping takes care of it.  With S, one solution is to add an extra
  parameter to the function and to the optimizer to pass in these
  extras, which however can only work if the optimizer supports this
  (and typically, the builtin ones do not).
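
  A minimal sketch of this point, using R's optimize() for
  one-dimensional minimization: the objective is created inside a
  function that receives the data, so the data never have to be passed
  through the optimizer.  The name make.objective and the sample values
  are hypothetical, chosen only for illustration.

```r
# make.objective() returns a closure: 'data' is a free variable of the
# inner function and is found in make.objective()'s environment.
make.objective <- function(data) {
  function(mu) sum((data - mu)^2)   # least-squares criterion in mu
}

x <- c(2.1, 3.4, 1.9, 4.2, 3.0)     # illustrative data
f <- make.objective(x)

# optimize() only ever sees a function of mu; no extra arguments needed.
opt <- optimize(f, interval = c(0, 10))
print(opt$minimum)                  # close to mean(x), i.e. 2.92
```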

  Lexical scoping allows using function closures and maintaining local
  state.  A simple example (taken from Abelson and Sussman) can be found
  in the `demos/language' subdirectory of the R distribution.  Further
  information is provided in the standard R reference ``R: A Language
  for Data Analysis and Graphics'' (see question ``Which Documentation
  Exists for R?'') and a paper on ``Lexical Scope and Statistical
  Computing'' by Robert Gentleman and Ross Ihaka which can be obtained
  from the `doc/misc' directory of a CRAN site.
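
  To illustrate closures with local state, here is a bank-account
  sketch in the spirit of the Abelson and Sussman example.  It is not
  the code from the `demos/language' directory; the names make.account,
  withdraw and deposit are illustrative.  It relies on the
  superassignment operator <<-, which modifies the binding of balance
  in the enclosing environment.

```r
make.account <- function(balance) {
  # balance lives in the environment created by this call and is shared
  # by the three closures returned below.
  withdraw <- function(amount) {
    if (amount > balance) stop("insufficient funds")
    balance <<- balance - amount   # update the enclosing binding
    balance
  }
  deposit <- function(amount) {
    balance <<- balance + amount
    balance
  }
  list(withdraw = withdraw, deposit = deposit,
       balance = function() balance)
}

acct <- make.account(100)
acct$withdraw(30)             # 70
acct$deposit(50)              # 120
other <- make.account(10)     # independent account with its own state
```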

  Lexical scoping also implies a further major difference.  Whereas S
  stores all objects as separate files in a directory somewhere (usually
  `.Data' under the current directory), R does not.  All objects in R
  are stored internally.  When R is started up it grabs a very large
  piece of memory and uses it to store the objects.  R performs its own
  memory management of this piece of memory.  Having everything in
  memory is necessary because it is not really possible to externally
  maintain all relevant ``environments'' of symbol/value pairs.  This
  difference also seems to make R much faster than S.

  The downside is that if R crashes, you will lose all the work for the
  current session.  Saving and restoring the memory ``images'' (the
  functions and data stored in R's internal memory at any time) can be a
  bit slow, especially if they are big.  In S this does not happen,
  because everything is saved in disk files, and if you crash nothing is
  likely to happen to them.  R is still in a beta stage, and may crash
  from time to time.  Hence, for important work you should consider
  saving often; see question ``How Can I Save My Workspace?'' (other
  possibilities are logging your sessions, or keeping your R commands in
  text files which can be read in using source()).  (Note that if you
  run R from within Emacs (see question ``R and Emacs''), you can save
  the contents of the interaction buffer to a file and conveniently
  manipulate it using ess-transcript-mode, as well as save source copies
  of all functions and data used.)
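
  One way to save often is to write selected objects to a file with
  save() and read them back with load().  This is a sketch assuming the
  save()/load() interface of current R, which may differ from R 0.60;
  the temporary file name is chosen only for illustration.

```r
x <- 1:10
f <- function(y) y^2
img <- tempfile(fileext = ".RData")   # illustrative location

save(x, f, file = img)   # write the named objects to disk
rm(x, f)                 # simulate losing them, e.g. after a crash
load(img)                # restore them into the workspace
print(f(x[3]))           # [1] 9
```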

  3.3.2.  Models

  There are some differences in the modeling code, such as

  o  Whereas in S, you would use lm(y ~ x^3) to regress y on x^3 and
     lm(y ~ poly(x, 3)) to perform ``cubic'' regression, in R, you have
     to insulate powers of numeric vectors (using I()), i.e., you have
     to use lm(y ~ I(x^3)) and lm(y ~ x + I(x^2) + I(x^3)),
     respectively.

  o  The glm family objects are implemented differently in R and S.  The
     same functionality is available but the components have different
     names.

  o  terms objects are stored differently.  In S a terms object is an
     expression with attributes, in R it is a formula with attributes.
     The attributes have the same names but are mostly stored
     differently.  The major difference in functionality is that a terms
     object is subscriptable in S but not in R.  If you can't imagine
     why this would matter then you don't need to know.

     Also, attr(terms(y~x), "response") gives 1 in S and TRUE in R.  In
     S the attribute indicates which column of the model frame will
     contain the response.  In R this is always column 1.

  Finally, in R y~x+0 is an alternative to y~x-1 for specifying a model
  with no intercept.  Models with no parameters at all can be specified
  by y~0.
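
  A small sketch of these formula differences in current R, on
  simulated data (the data and the seed are arbitrary): inside a
  formula, ^ denotes crossing of terms, so x^3 reduces to x unless it
  is protected with I().

```r
set.seed(1)
x <- 1:20
y <- 2 + 0.5 * x^3 + rnorm(20)      # simulated, illustrative data

fit.plain <- lm(y ~ x^3)            # ^ is a formula operator: just x
fit.cubic <- lm(y ~ I(x^3))         # regress y on the value x^3
fit.poly  <- lm(y ~ x + I(x^2) + I(x^3))
names(coef(fit.plain))              # "(Intercept)" "x"
names(coef(fit.cubic))              # "(Intercept)" "I(x^3)"

# Two equivalent ways of dropping the intercept
all.equal(coef(lm(y ~ x + 0)), coef(lm(y ~ x - 1)))   # TRUE
```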

  3.3.3.  Others

  Apart from lexical scoping and its implications, R follows the S
  language definition in the Blue Book as much as possible, and hence
  really is an ``implementation'' of S.  There are some intentional
  differences where the behavior of S is considered ``not clean''.  In
  general, the rationale is that R should help you detect programming
  errors, while at the same time being as compatible as possible with S.

  Some known differences are the following.

  o  In R, if x is a list, then x[sub] <- NULL and x[[sub]] <- NULL
     remove the specified elements from x.  The first of these is
     incompatible with S, where it is a no-op.

  o  In S, the functions named .First and .Last in the `.Data' directory
     can be used for customizing, as they are executed at the very
     beginning and end of a session, respectively.

     R looks for files called `.Rprofile' in the user's home directory
     and the current directory, and sources these.  It also loads a
     saved image from `.RData' in case there is one.  If a .First()
     function then exists, it is executed.  The .Last mechanism is not
     supported yet.

  o  In R, the .First.lib mechanism when loading add-on packages using
     library() is not yet supported.

  o  In R, dyn.load() can only load shared libraries, as created for
     example by `R SHLIB'.

  o  R presently does not support IEEE Inf and NaN.

  o  Whereas in S, abs(z) is the same as Mod(z) for complex z, in R you
     must use Mod(z), since abs() is a function of real numbers only.

  o  In R, attach() currently only works for lists and data frames (not
     for directories).  Also, you cannot attach at position 1.

  o  Categories do not exist in R, and never will, as they are now
     deprecated in S.  Use factors instead.

  o  In R, For() loops are not necessary and hence not supported.

  o  In R, assign() uses the argument envir= rather than where= as in S.

  o  The random number generators are different, and the seeds have
     different length.

  o  R uses only double precision and so can only pass numeric arguments
     to C/FORTRAN subroutines as double * or DOUBLE PRECISION,
     respectively.

  o  R does not allow indexing beyond the end of an array.  E.g., if x
     is a vector of length 5, both x[6] and x[-6] return an error
     (``subscript out of bounds'').  This is a feature, as the R
     developers feel that indexing beyond array bounds causes bugs that
     are hard to find, are in many cases only subtly wrong, and
     typically manifest themselves when least needed.

     As another example, suppose that DF is a data frame and you want to
     add a new variable VAR named x to it.  In S, you can do DF[["x"]]
     <- VAR.  In R, this is not possible; you can use DF$"x" <- VAR or
     DF <- cbind(DF, x = VAR).

  o  R currently does not allow recycling when subscripting with
     logicals.  E.g., x <- 1:5; x[c(F, T)] currently gives an error.
     This is a bug and will be fixed soon.
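
  The following sketch shows two of the points above in current R:
  assigning NULL removes list components with both [[ ]] and [ ], and a
  new data frame column can be added with DF$x <- VAR or with cbind().
  The objects x, DF, DF2 and VAR are illustrative.

```r
x <- list(a = 1, b = "two", c = 3)
x[["b"]] <- NULL     # removes component b
x["c"] <- NULL       # in R this removes c as well (a no-op in S)
names(x)             # "a"

VAR <- c(10, 20, 30)
DF <- data.frame(id = 1:3)
DF$x <- VAR                                    # add column x in place
DF2 <- cbind(data.frame(id = 1:3), x = VAR)    # or build a new data frame
names(DF2)           # "id" "x"
```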

  There are also differences which are not intentional, and result from
  missing or incorrect code in R.  The developers would appreciate
  hearing about any deficiencies you may find (in a written report fully
  documenting the difference as you see it).  Of course, it would be
  useful if you were to implement the change yourself and make sure it
  works.

  4.  R Add-On Packages

  4.1.  Which Add-on Packages Exist for R?

  The R distribution comes with the following extra packages:

     eda
        Exploratory Data Analysis.  Currently only contains functions
        for robust line fitting, and median polish and smoothing.

     mva
        Multivariate Analysis.  Currently contains code for principal
        components (prcomp), canonical correlations (cancor),
        hierarchical clustering (hclust), and metric multidimensional
        scaling (cmdscale).  More functions for clustering and scaling,
        biplots, profile and star plots, and code for ``real''
        discriminant analysis will be added soon.

  The following packages are available from the CRAN `src/contrib' area.
  Note that R 0.60 has brought a change in both organization of package
  sources and documentation format, and that some of the packages below
  may not yet have been updated accordingly.

     acepack
        ace (Alternating Conditional Expectations) and avas (Additivity
        and VAriance Stabilization for regression) for selecting
        regression transformations.

     bootstrap
        Software (bootstrap, cross-validation, jackknife), data and
        errata for the book ``An Introduction to the Bootstrap'' by B.
        Efron and R. Tibshirani, 1993, Chapman and Hall.

     class
        Functions for classification (k-nearest neighbor and LVQ).

     clus
        Functions for cluster analysis.

     ctest
        A collection of classical tests, including the Bartlett, Fisher,
        Kruskal-Wallis, Kolmogorov-Smirnov, and Wilcoxon tests.

     date
        Functions for dealing with dates.  The most useful of them
        accepts a vector of input dates in any of the forms 8/30/53,
        30Aug53, 30 August 1953, ..., August 30 53, or any mixture of
        these.

     e1071
        Miscellaneous functions used at the Department of Statistics at
        TU Wien (E1071).

     fracdiff
        Maximum likelihood estimation of the parameters of a
        fractionally differenced ARIMA(p,d,q) model (Haslett and
        Raftery, Applied Statistics, 1989).

     gee
        An implementation of the Liang/Zeger generalized estimating
        equation approach to GLMs for dependent data.

     integrate
        Code for adaptive quadrature.

     jpn
        A function to plot Japan's coast-line and prefecture boundaries.

     leaps
        A package which performs an exhaustive search for the best
        subsets of a given set of potential regressors, using a
        branch-and-bound algorithm, and also performs searches using a
        number of less time-consuming techniques.

     mlbench
        A collection of artificial and real-world machine learning
        benchmark problems, including the Boston housing data.

     nnet
        Software for feed-forward neural networks with a single hidden
        layer and for multinomial log-linear models.

     oz
        Functions for plotting Australia's coastline and state
        boundaries.

     polynom
        A collection of functions to implement a class for univariate
        polynomial manipulations.

     ratetables
        US national and state mortality data (requires survival4 and
        date).

     rational
        A few small functions to find numerical rational approximations
        using a continued fraction method.

     snns
        An R interface to the Stuttgart Neural Networks Simulator
        (SNNS).

     splines
        Regression spline functions.

     survival4
        Functions for survival analysis (requires splines).

     wavethresh
        Code for doing wavelet transforms and thresholding in 1D and 2D.

     xgobi
        Interface to the XGobi program for graphical data analysis.

  See CRAN `src/contrib/INDEX' for more information.

  Paul Gilbert <pgilbert at bank-banque-canada.ca> will make an R version
  of his package DSE (Dynamic Systems Estimation) shortly after the 0.60
  release.  The package provides state-space models and the Kalman
  filter, VARMA and cointegration models, and numerical differentiation.
  Also, it can do various rational expectation models via an interface
  to run Troll (a commercially available product) from R.  According to
  Paul, the PADI interface from the Bank of Canada also works with minor
  changes.  PADI can be used to access Fame time series databases and
  potentially other databases, even remotely over the Internet.  For
  further information see http://www.bank-banque-canada.ca/pgilbert.

  Harald Fekjaer <hfe at math.uio.no> has written addreg, a package for
  additive hazards regression, which can be obtained from
  http://www.med.uio.no/imb/stat/addreg/.

  More code has been posted to the r-help mailing list, and can be
  obtained from the mailing list archive.

  4.2.  How Can Add-on Packages Be Installed?

  (Unix only.)  The add-on packages on CRAN come as gzipped tar files.
  ``Unpack'' the package (in a directory that you may write to).  If you
  have GNU tar, you can use tar zxf name, otherwise you can do something
  like gunzip -c name | tar xf -.  Let pkg be the name of the directory
  thus created.  To install the package to the default R directory tree
  (the `library' subdirectory of `RHOME'), type

       $ R INSTALL pkg

  at the shell prompt.  To install to another tree (e.g., your private
  one), use

       $ R INSTALL pkg lib

  where lib gives the path to the library tree to install to.

  You can use several library trees of add-on packages.  The easiest way
  to tell R to use these is via the environment variable RLIBS which
  should be a colon-separated list of directories at which R library
  trees are rooted.  You do not have to specify the default tree in
  RLIBS.  E.g., to use a private tree in `$HOME/lib/R' and a public
  site-wide tree in `/usr/local/lib/R/site', put

       RLIBS="$HOME/lib/R:/usr/local/lib/R/site"; export RLIBS

  into your (Bourne) shell profile.
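  R splits RLIBS itself when it starts up.  The hypothetical helper
  split_lib_path() below is only a present-day R sketch of how a
  colon-separated value breaks into separate library trees; it is not
  the code R uses internally.

```r
# Hypothetical helper (not part of R): split a colon-separated,
# RLIBS-style string into its library tree paths, dropping empty entries.
split_lib_path <- function(x = Sys.getenv("RLIBS")) {
  parts <- strsplit(x, ":", fixed = TRUE)[[1]]
  parts[nzchar(parts)]
}

split_lib_path("/home/me/lib/R:/usr/local/lib/R/site")
# returns c("/home/me/lib/R", "/usr/local/lib/R/site")
```

  An unset variable gives an empty result, matching the rule that the
  default tree need not be listed.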

  4.3.  How Can Add-on Packages Be Used?

  To find out which additional packages are available on your system,
  type

       library()

  at the R prompt.

  This produces something like

       Packages in `/home/me/lib/R':

       mystuff      My own R functions, nicely packaged and not documented

       Packages in `/usr/local/lib/R/library':

       acepack      ace() and avas() for selecting regression transformations
       bootstrap    Functions for the book "An Introduction to the Bootstrap"
       ctest        Classical Tests
       date         Functions for handling dates
       eda          Exploratory Data Analysis
       fracdiff     Fractionally differenced ARIMA(p,d,q) models
       gee          Generalized Estimating Equation models
       mva          Classical Multivariate Analysis
       splines      Regression spline functions
       survival4    Survival analysis (needs `splines')

  You can ``load'' the installed package name by

       library(name)

  You can then find out which functions it provides by typing one of

       help(package = name)
       library(help = name)

  You can unload the loaded package name by

       detach("package:name")
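  As a rough illustration in present-day R, the following attaches a
  package, lists its objects through its search-path entry, and detaches
  it again.  The splines package is used only because it ships with
  current R; that choice is an assumption about your installation, not
  about 0.60.

```r
# Attach a package, inspect it, then detach it (present-day R sketch).
library(splines)
"package:splines" %in% search()     # TRUE while attached
funs <- ls("package:splines")       # names of the objects it provides
head(funs)
detach("package:splines")
"package:splines" %in% search()     # FALSE again
```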

  4.4.  How Can Add-on Packages Be Removed?

  To remove the package pkg from the default library or the library lib,
  do

       $ R REMOVE pkg

  or

       $ R REMOVE pkg lib

  respectively.

  4.5.  How Can I Create an R Package?

  A package consists of a subdirectory containing the files `TITLE' and
  `INDEX', the subdirectories `R' and `man', and, optionally, the
  subdirectories `src', `src-c', and `data'.

  The `TITLE' file contains a line giving the name of the package and a
  brief description.  `INDEX' contains a line for each sufficiently
  interesting object in the package, giving its name and a description
  (functions such as print methods not usually called explicitly might
  not be included).

  The `R' subdirectory contains R code files with names beginning with
  lowercase letters.  One of these should use library.dynam() to load
  any necessary compiled code.  The `man' subdirectory should contain R
  documentation files for the objects in the package.

  The source and a Makefile for the compiled code go in `src', and a pure C
  version of the source should be in `src-c'.  In the common case when
  all the source is in C it may be convenient to make one of these
  directories a symbolic link to the other.  The `Makefile' will be
  passed various machine-dependent compile and link flags, examples of
  which can be seen in the `eda' package.

  Finally, the `data' subdirectory is for additional data files the
  package makes available for loading using data().  Note that (at least
  currently) all such files are in fact R code files, and must have the
  extension `.R'.

  See the documentation for library() for more information.
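  The layout described above can be sketched in present-day R by
  creating the directories and files in a scratch location.  The package
  name "mypkg", the function square(), the data object mydata, and the
  TITLE/INDEX wording are all hypothetical placeholders; `man' is left
  empty and `src'/`src-c' are omitted because this example has no
  compiled code.

```r
# Sketch: build the 0.60-style package skeleton described above.
# All names and contents are hypothetical placeholders.
make_skeleton <- function(root = tempdir(), pkg = "mypkg") {
  top <- file.path(root, pkg)
  for (d in c("R", "man", "data"))
    dir.create(file.path(top, d), recursive = TRUE, showWarnings = FALSE)
  # TITLE: one line with the package name and a brief description
  writeLines(paste(pkg, "   Hypothetical example package"),
             file.path(top, "TITLE"))
  # INDEX: one line per interesting object, giving name and description
  writeLines("square   Square a numeric vector", file.path(top, "INDEX"))
  # R code file; its name begins with a lowercase letter
  writeLines("square <- function(x) x^2", file.path(top, "R", "square.R"))
  # data files are R code and carry the extension .R
  writeLines("mydata <- c(1, 2, 3)", file.path(top, "data", "mydata.R"))
  top
}

pkgdir <- make_skeleton(tempfile("skel"))
list.files(pkgdir, recursive = TRUE)
```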

  The web page http://www.biostat.washington.edu/~thomas/Rlib.html
  maintained by Thomas Lumley provides information on porting S packages
  to R.

  4.6.  How Can I Contribute to R?

  R is currently still in alpha (or pre-alpha) state, so simply using it
  and communicating problems is certainly of great value.

  One place where functionality is still missing is the modeling
  software as described in ``Statistical Models in S'' (see question
  ``What is S?'').  The functions

        add1 kappa alias labels drop1 proj

  are missing; many of these are interpreted functions, so if anyone is
  bored and wants to have a go at implementing them, it would be
  appreciated.  In addition, only linear and generalized linear models
  are currently available; aov, gam, loess, tree, and the nonlinear
  modelling code are not there yet.

  See also the `PROJECTS' file in the top level R source directory.

  Many of the packages available at the Statlib S Repository might be
  worth porting to R.

  If you are interested in working on any of these projects, please
  notify Kurt Hornik.

  5.  R and Emacs

  5.1.  Is there Emacs Support for R?

  There is an Emacs-Lisp interface for interactive statistical
  programming and data analysis called ESS (``Emacs Speaks
  Statistics'').  Languages supported include: S dialects (S 3/4, S-PLUS
  3.x, and R), LispStat dialects (XLispStat, ViSta), and SAS.  Stata and
  SPSS dialect (SPSS, Fiasco) support is being examined for possible
  future implementation (a preliminary Stata mode is distributed).

  ESS grew out of the desire for bug fixes and extensions to S-mode-4.8
  (which was a GNU Emacs interface to S/S-PLUS version 3 only).  In
  particular, XEmacs support as well as extensions to incorporate R were
  desired.  In addition, with new modes being developed for R, Stata,
  and SAS, it was felt that a unifying framework would eliminate
  differences in the user interface and allow faster development of
  production tools and statistical analysis.  ESS 5.0 is built on the
  basic framework from S-mode; however, that framework has been cleaned
  up, streamlined, brought closer to conformance with the conventions for
  GNU Emacs packages, and redesigned for modularity and reuse.

  R support contains code for editing R source code (syntactic
  indentation and highlighting of source code, partial evaluations of
  code, loading and error-checking of code, and source code revision
  maintenance) and documentation (including sending examples to a
  running R process and previewing), interacting with an inferior R
  process from within Emacs (command-line editing, searchable command
  history, command-line completion of R object and file names, quick
  access to object and search lists, transcript recording, and an
  interface to the help system), and transcript manipulation (in
  particular for re-evaluating commands from transcript files).

  The latest versions of ESS are always available by WWW from

                   http://franz.stat.wisc.edu/pub/ESS/

  or ftp://franz.stat.wisc.edu/pub/ESS/, or via CRAN.  The HTML version
  of the documentation can be found at
  http://www.stat.math.ethz.ch/ESS/.

  ESS comes with detailed installation instructions.

  5.2.  Should I Run R from Within Emacs?

  Yes, definitely.  Inferior R mode provides a readline/history
  mechanism, object name completion, and syntax-based highlighting of
  the interaction buffer using Font Lock mode, as well as a very
  convenient interface to the R help system.

  Of course, it also integrates nicely with the mechanisms for editing R
  source using Emacs.  One can write code in one Emacs buffer and send
  all or part of it to R for execution; this is helpful for both data
  analysis and programming.  One can also integrate seamlessly with a
  revision control system, both to maintain a log of changes to one's
  programs and data and to retrieve past versions of the code.

  In addition, it allows you to keep a record of your session, which can
  also be used for error recovery through the use of the transcript
  mode.

  6.  R Miscellania

  6.1.  How Can I Read a Large Data Set into R?

  R (currently) uses a static memory model.  This means that when it
  starts up, it asks the operating system to reserve a fixed amount of
  memory for it.  The size of this chunk cannot be changed subsequently.
  Hence, the memory reserved at startup may turn out to be too small.

  In these cases, you should restart R with more memory available, using
  the command line options -n and -v.  To understand these options, one
  needs to know that R maintains separate areas for fixed and variable
  sized objects.  The first of these is allocated as an array of ``cons
  cells'' (Lisp programmers will know what they are, others may think of
  them as the building blocks of the language itself, parse trees,
  etc.), and the second is a ``heap''.  The -n option can be
  used to specify the number of cons cells (each occupying 16 bytes)
  which R is to use (the default is 200000), and the -v option to
  specify the size of the vector heap in megabytes (the default is 2).
  Only integers are allowed for both options.

  E.g., to read in a table of 5000 observations on 40 numeric variables,
  starting R with R -v 6 should suffice.
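  A back-of-the-envelope check of that figure, written in R.  It assumes
  8-byte doubles and 1 MB = 2^20 bytes, and it ignores the temporary
  copies made while reading; these are assumptions of the sketch, not
  statements about R's internals.

```r
# Rough memory arithmetic for the 5000 x 40 example (a sketch).
bytes_per_double <- 8
n_obs  <- 5000
n_vars <- 40
data_mb <- n_obs * n_vars * bytes_per_double / 2^20  # about 1.53 MB
cons_mb <- 200000 * 16 / 2^20   # default cons-cell area, about 3.05 MB
c(data = data_mb, default_heap = 2, cons_cells = cons_mb)
```

  The raw numbers alone already take most of the default 2 MB heap, so
  presumably the extra room requested with -v 6 is there for the
  intermediate copies made while the file is read.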

  Note that the information on where to find vectors and strings on the
  heap is stored using cons cells.  Thus, it may also be necessary to
  allocate more space for cons cells in order to perform computations
  with very ``large'' variable-size objects.

  You can find out the current memory consumption (the proportion of
  heap and cons cells used) by typing gc() at the R prompt.  This may
  help you in finding out whether to increase -v or -n.  Note that
  following gcinfo(TRUE), automatic garbage collection always prints
  memory use statistics.

  When using read.table(), the memory requirements are in fact higher
  than anticipated, because the file is first read in as one long string
  which is then split again.  If you run out of memory when reading in a
  large table, use scan() instead where possible.
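  A minimal present-day R sketch of reading a whitespace-separated
  numeric table with scan().  The temporary file and its three-column
  layout are hypothetical; giving `what' as a list with one template per
  column makes scan() return one vector per variable.  How much memory
  this saves depends on the R version.

```r
# Read a three-column numeric table with scan() instead of read.table().
f <- tempfile(fileext = ".dat")
writeLines(c("1 2 3", "4 5 6", "7 8 9"), f)
cols <- scan(f, what = list(a = 0, b = 0, c = 0), quiet = TRUE)
dat <- as.data.frame(cols)   # columns a, b, c
dat
```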

  6.2.  Why Can't R Source a `Correct' File?

  R sometimes has problems parsing a file which does not end in a
  newline.  This can happen for example when Emacs is used for editing
  the file and next-line-add-newlines is set to nil.  To avoid the
  problem, either set require-final-newline to a non-nil value in one of
  your Emacs startup files, or make sure R-mode (see question ``Is there
  Emacs Support for R?'') is used for editing R source files (which
  locally ensures this setting).

  Earlier R versions had a similar problem when reading in data files,
  but this should have been taken care of now.
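  If a file cannot easily be edited in R-mode, a final newline can also
  be added from R before sourcing it.  The helper ensure_final_newline()
  below is hypothetical and written for present-day R; it reads the
  whole file to find the last byte, which is fine for source files but
  wasteful for very large ones.

```r
# Hypothetical helper: append "\n" to a file unless its last byte
# already is a newline (byte value 10).
ensure_final_newline <- function(path) {
  size <- file.info(path)$size
  if (size > 0) {
    last <- readBin(path, what = "raw", n = size)[size]
    if (as.integer(last) != 10L)
      cat("\n", file = path, append = TRUE)
  }
  invisible(path)
}

src <- tempfile(fileext = ".R")
cat("x <- 1", file = src)      # no trailing newline
ensure_final_newline(src)
source(src)
```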

  6.3.  How Can I Set Components of a List to NULL?

  You can use

       x[i] <- list(NULL)

  to set component i of the list x to NULL, and similarly for named
  components.  Do not set x[i] or x[[i]] to NULL, because this will
  remove the corresponding component from the list.

  For dropping the row names of a matrix x, it may be easier to use
  rownames(x) <- NULL, similarly for column names.
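  The difference can be checked directly.  This is a small sketch in
  present-day R; the list and matrix are made up for the example.

```r
x <- list(a = 1, b = 2, c = 3)
x["b"] <- list(NULL)   # component b is kept; its value is now NULL
length(x)              # still 3

y <- list(a = 1, b = 2, c = 3)
y[["b"]] <- NULL       # component b is removed
names(y)               # "a" "c"

m <- matrix(1:4, 2, dimnames = list(c("r1", "r2"), c("c1", "c2")))
rownames(m) <- NULL    # drop the row names, keep the column names
```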

  6.4.  How Can I Save My Workspace?

  The expression

       save(list = ls(), file = ".RData")

  saves the objects in the currently active environment (typically the
  user's .GlobalEnv) to the file `.RData' in the R startup directory.
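  The same call works for any environment.  The present-day R sketch
  below uses a scratch environment and a temporary file in place of
  .GlobalEnv and `.RData', so running it leaves the real workspace
  untouched; the object names are made up.

```r
work <- new.env()
assign("x", 1:3, envir = work)
assign("label", "demo", envir = work)
f <- tempfile(fileext = ".RData")
save(list = ls(envir = work), envir = work, file = f)

restored <- new.env()
loaded <- load(f, envir = restored)   # names of the restored objects
loaded
```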

  6.5.  How Can I Clean Up My Workspace?

  To remove all objects in the currently active environment (typically
  the user's .GlobalEnv), you can do

       rm(list = ls())
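  One caveat, at least in present-day R: ls() omits names beginning with
  a dot, so rm(list = ls()) leaves such objects in place, while
  ls(all.names = TRUE) includes them.  A sketch using a scratch
  environment (the object names are made up):

```r
scratch <- new.env()
assign("a", 1, envir = scratch)
assign(".hidden", 2, envir = scratch)

rm(list = ls(envir = scratch), envir = scratch)
left <- ls(envir = scratch, all.names = TRUE)   # ".hidden" survives
left

rm(list = ls(envir = scratch, all.names = TRUE), envir = scratch)
```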

  6.6.  How Can I Get `eval' and `D' to Work?

  Strange things will happen if you use eval(print(x), envir = e) or
  D(x^2, "x").  The first one will either tell you that "x" is not
  found, or print the value of the wrong x.  The other one will likely
  return zero if x exists, and an error otherwise.

  This is because in both cases, the first argument is evaluated in the
  calling environment first.  The result (which should be an object of
  mode `expression' or `call') is then evaluated or differentiated.
  What you (most likely) really want is obtained by ``quoting'' the
  first argument, that is, by surrounding it with expression().  For
  example,

         R> D(expression(x^2),"x")
         2 * x

  Although this behavior may initially seem rather strange, it is
  perfectly logical.  The ``intuitive'' behavior could easily be
  implemented, but problems would arise whenever the expression is
  contained in a variable, passed as a parameter, or is the result of a
  function call.  Consider for instance the semantics in cases like

         D2 <- function(e, n) D(D(e, n), n)

  or

         g <- function(y) eval(substitute(y), sys.frame(sys.parent(n = 2)))
         g(a * b)

  See the help pages for more examples.
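  A short present-day R sketch of the quoted forms.  The values in the
  comments follow from the assignments shown; quote() is included as an
  alternative to expression() for building a single call.

```r
x <- 3
d1 <- D(expression(x^2), "x")   # a call, 2 * x, not a number
eval(d1)                        # 6, since x is 3 here
d2 <- D(quote(x^2), "x")        # quote() also protects the argument

D2 <- function(e, n) D(D(e, n), n)   # second derivative, as above
d3 <- D2(quote(x^3), "x")
eval(d3, list(x = 2))           # 12, i.e. 6 * x at x = 2

env <- new.env()
assign("x", 10, envir = env)
eval(expression(print(x)), envir = env)   # prints 10, the x in env
```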

  6.7.  Why Do My Matrices Lose Dimensions?

  When a matrix with a single row or column is created by a subscripting
  operation, e.g., row <- mat[2, ], it is by default turned into a
  vector.  In a similar way if an array with dimension, say, 2x3x1x4 is
  created by subscripting it will be coerced into a 2x3x4 array, losing
  the unnecessary dimension.  After much discussion this has been
  determined to be a feature.

  To prevent this happening, add the option `drop = FALSE' to the
  subscripting. For example,

         rowmatrix <- mat[2, , drop = FALSE]  # creates a row matrix
         colmatrix <- mat[, 2, drop = FALSE]  # creates a column matrix
         a <- b[1, 1, 1, drop = FALSE]        # creates a 1x1x1 array

  The `drop = FALSE' option should be used defensively when
  programming.  For example, the statement

         somerows <- mat[index, ]

  will return a vector rather than a matrix if index happens to have
  length 1, causing errors later in the code.  It should probably be
  rewritten as

         somerows <- mat[index, , drop = FALSE]
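  The effect on dimensions can be checked with dim().  This is a sketch
  in present-day R; the matrix and array contents are arbitrary.

```r
mat <- matrix(1:6, nrow = 2)          # a 2 x 3 matrix
dim(mat[2, ])                         # NULL: the row became a vector
dim(mat[2, , drop = FALSE])           # 1 3

b <- array(1:48, dim = c(2, 3, 2, 4))
dim(b[, , 1, ])                       # 2 3 4: third dimension dropped
dim(b[, , 1, , drop = FALSE])         # 2 3 1 4

index <- 2
somerows <- mat[index, , drop = FALSE]  # still a matrix for length-1 index
```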

  6.8.  How Does Autoloading Work?

  R has a special environment called `.AutoloadEnv'.  Using
  autoload(name, pkg), where name and pkg are strings giving the names
  of an object and the package containing it, stores some information in
  this environment.  When R tries to evaluate name, it loads the
  corresponding package pkg and reevaluates name in the new package's
  environment.

  Using this mechanism makes R behave as if the package were loaded,
  but the package does not occupy memory until it is actually needed.

  See the help page for autoload() for a very nice example.
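  A minimal sketch in present-day R, where autoload() takes the object
  name and the package name as strings.  It assumes the splines package
  (shipped with current R) is not yet attached; interpSpline() and the
  women data set are used only because both come with R.

```r
autoload("interpSpline", "splines")
isp <- interpSpline(women$height, women$weight)  # first use attaches splines
"package:splines" %in% search()                   # TRUE now
```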

  6.9.  How Should I Set Options?

  The function options() allows setting and examining a variety of
  global ``options'' which affect the way in which R computes and
  displays its results.  The variable .Options holds the current values
  of these options, but should never directly be assigned to unless you
  want to drive yourself crazy---simply pretend that it is a ``read-
  only'' variable.

  For example, given

       test1 <- function(x = pi, dig = 3) {
         oo <- options(digits = dig); on.exit(options(oo));
         cat(.Options$digits, x, "\n")
       }
       test2 <- function(x = pi, dig = 3) {
         .Options$digits <- dig
         cat(.Options$digits, x, "\n")
       }

  we obtain:

       R> test1()
       3 3.14
       R> test2()
       3 3.141593

  What is really used is the global value of .Options, and using
  options(OPT = VAL) correctly updates it.  Local copies of .Options,
  either in .GlobalEnv or in a function environment (frame), are just
  silently disregarded.

  7.  Acknowledgments

  Of course, many many thanks to Robert and Ross for the R system, and
  to the package writers and porters for adding to it.

  Special thanks go to Peter Dalgaard, Paul Gilbert, Fritz Leisch, Jim
  Lindsey, Thomas Lumley, Martin Maechler, Anthony Rossini, and Andreas
  Weingessel for their comments which helped me improve this FAQ.

  More to come soon ...
