What's data() for?
On Fri, 14 May 2010, Duncan Murdoch wrote:
On 14/05/2010 5:35 AM, (Ted Harding) wrote:
On 13-May-10 23:43:58, yjmha69 wrote:
Hi there,
library(faraway)
pima
  pregnant glucose diastolic triceps insulin  bmi diabetes age test
1        6     148        72      35       0 33.6    0.627  50    1
2        1      85        66      29       0 26.6    0.351  31    0
data(pima)
pima
  pregnant glucose diastolic triceps insulin  bmi diabetes age test
1        6     148        72      35       0 33.6    0.627  50    1
2        1      85        66      29       0 26.6    0.351  31    0
As you can see, I can already use pima without running data(pima); after running data(pima), it looks the same. So what's the reason to use data(pima)?
Thanks
YJM
The difference is that data(pima) will load the dataset pima (which can be found in the package "faraway") without the use of library(faraway). It won't load anything else from faraway.
That won't work. Unless you attach faraway, R won't know what "pima" refers to, and will just give an error.
But
data("pima", package="faraway")
will. And if you do that you can rm(pima); gc() and completely remove
the object from the session, something you cannot do with lazy-loading
of data.
That is, I think, the main attraction of not using lazy-loading for
datasets that will be used for only a small part of a session.
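
A minimal sketch of the explicit load and removal described above, assuming the faraway package is installed but not attached. The scratch environment `env` and the install check via system.file() are illustrative choices, not part of the original messages; loading into the global environment, as in the thread, works the same way.

```r
# Sketch only: assumes faraway is installed; it is never attached here.
env <- new.env()  # hypothetical scratch environment to hold the copy

# system.file() returns "" when the package is not installed.
if (nzchar(system.file(package = "faraway"))) {
  # Load just the dataset, without library(faraway).
  data("pima", package = "faraway", envir = env)
  print(head(env$pima, 2))

  # Remove the copy and ask R to reclaim the memory.
  rm("pima", envir = env)
  invisible(gc())
}

# After removal (or if faraway is absent) no copy remains in env.
still_there <- exists("pima", envir = env, inherits = FALSE)
print(still_there)
```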
Between data(pima) and plain pima there isn't really much difference in this case, but in other cases there might be. Before lazy loading of data was introduced, it always made a difference: the pima object wasn't loaded into memory until requested by data(pima). With lazy loading, a stub for the object is always in memory, and the main part of the object is loaded only on first use. Many packages (including faraway) use lazy loading of data, so data() is to some extent unnecessary. But there are some circumstances under which lazy loading won't work, so a few packages don't use it, and I believe it is not the default.

Duncan Murdoch
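
The lazy-loading behaviour can be seen with a package that ships with R. This is a sketch that uses mtcars from the datasets package as a stand-in for pima, on the assumption that datasets is attached, as it is in a default session; the environment `scratch` is a hypothetical name.

```r
# 'datasets' is attached by default and lazy-loads its data, so mtcars
# is visible on the search path without any call to data().
before <- find("mtcars")

# data() makes an explicit copy, here in a scratch environment.
scratch <- new.env()
data("mtcars", package = "datasets", envir = scratch)
copied <- exists("mtcars", envir = scratch, inherits = FALSE)

# Removing the copy does not touch the lazy-loaded object, which
# stays reachable for as long as the package is attached.
rm("mtcars", envir = scratch)
after <- exists("mtcars")

print(list(before = before, copied = copied, after = after))
```

Only the explicit copy can be removed; the lazy-loaded version stays reachable, which is the limitation mentioned earlier in the thread.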
When you use library(faraway) you will load everything in the package faraway, including of course the dataset pima (which is why you see no difference, since that dataset is the same whichever way you load it). So with data() you put less load on your system, and also avoid possible conflicts between what you already have in your environment and what would be brought in when you do library(faraway).

Ted.

E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 14-May-10 Time: 10:35:15

Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595