
options(keep.source = TRUE) -- also for "library(.)" ?

7 messages · Peter Dalgaard, Kurt Hornik, Martin Maechler

#
help(options)

contains

    keep.source: When `TRUE', the default, the source code for functions
	      loaded by is stored in their `"source"' attribute, allowing
	      comments to be kept in the right places.  
	      This does not apply to functions loaded by `library'.
	      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

and R behaves as documented, i.e., currently
all functions in package:base and all "interactively defined" functions
keep their source (__including__ comments) with them, whereas all the other
functions do not.

As I increasingly create small packages of my own and ask others to use
them, users of the packages (and I myself) suffer more and more from
function definitions with lost comments.

Can we [those of us who know how sys.source() works...]
think of changing this?  As it was possible for the base package, it must
be doable for the others as well....

Martin
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
Martin Maechler <maechler@stat.math.ethz.ch> writes:
Martin, surely you could have figured out to remove these two lines
from sys.source:

    oop <- options(keep.source = FALSE)
    on.exit(options(oop))

The real question is whether we want to have a different mechanism for
controlling whether keep.source is set or not. Originally it was FALSE
for the base library to save space, and accordingly the same setting was
used for other libraries since some of them are rather large, but
later it got flipped to TRUE for base, and then there is little point
in setting it FALSE for packages. Question is whether anyone would
want the old behaviour back to get more space for analyses?
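
The two lines quoted above follow the usual save-and-restore idiom for options. A minimal sketch of that idiom, assuming a hypothetical helper `with_keep_source()` that is not part of R and is not the actual sys.source() code:

```r
# Hypothetical helper (not part of R): evaluate `expr` with the
# keep.source option temporarily set to `value`.
with_keep_source <- function(value, expr) {
  oop <- options(keep.source = value)  # options() returns the old settings
  on.exit(options(oop))                # restore them when the function exits
  force(expr)                          # `expr` is lazy, so it is evaluated here
}

before <- getOption("keep.source")
inside <- with_keep_source(FALSE, getOption("keep.source"))
after  <- getOption("keep.source")

inside                    # the value seen while the helper was running
identical(before, after)  # the caller's setting is left untouched
```

Removing the two lines from a function like this simply lets the caller's current setting apply.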
#
PD> Martin Maechler <maechler@stat.math.ethz.ch> writes:
    >> Can we [those of us who know how sys.source() works...]  think of
    >> changing this?  As it was possible for the base package, it must be
    >> doable for the others as well....

    PD> Martin, surely you could have figured out to remove these two lines
    PD> from sys.source:

    PD>     oop <- options(keep.source = FALSE)
    PD>      on.exit(options(oop))

[blush... *BLUSH* ...
 I didn't look at sys.source(); 
 just knew that parts of it used to look rather "magical" to me ..
]

Of course we now could even make 
   keep.source = getOption("keep.source") 
an argument to library(), being propagated to sys.source(..).
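
A rough sketch of what such a propagated argument could look like, assuming a hypothetical wrapper `load_code()` around sys.source() (which accepts a `keep.source` argument in current R). It also assumes a current R version, where kept source is stored in a "srcref" attribute rather than the "source" attribute discussed in this thread; the check below looks at both.

```r
# A small source file with a comment inside the function body.
src_file <- tempfile(fileext = ".R")
writeLines(c("f <- function(x) {",
             "  # add one",
             "  x + 1",
             "}"), src_file)

# Hypothetical wrapper: the default follows the global option,
# but callers can override it per call.
load_code <- function(file, keep.source = getOption("keep.source")) {
  env <- new.env()
  sys.source(file, envir = env, keep.source = keep.source)
  env
}

# Hypothetical check: was any source text kept for this function?
has_source <- function(f) {
  !is.null(attr(f, "srcref")) || !is.null(attr(f, "source"))
}

with_src    <- load_code(src_file, keep.source = TRUE)
without_src <- load_code(src_file, keep.source = FALSE)

has_source(with_src$f)     # source (and the comment) retained
has_source(without_src$f)  # only the parsed code remains
```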

    PD> The real question is whether we want to have a different mechanism
    PD> for controlling whether keep.source is set or not. 
right.

    PD> Originally it was FALSE for the base library to save space, and
    PD> accordingly the same setting was used for other libraries since some
    PD> of them are rather large, but later it got flipped to TRUE for
    PD> base,
(yes, I'm still wondering...)
    PD> and then there is little point in setting it FALSE for packages. 
    PD> Question is whether anyone would want the old behaviour
    PD> back to get more space for analyses?

would be nice if it *was* configurable for base as well;
possibly both via cmd line option
	 (something like --keepsource / --no-keepsource )
and a setting in Rprofile..

From grepping through the source code, I don't see how it was turned off
for base...

Martin

#
(and I haven't seen more feedback..)
PD> Martin Maechler <maechler@stat.math.ethz.ch> writes:

	>>> Can we [those of us who know how sys.source() works...]  think of
	>>> changing this?  As it was possible for the base package, it must be
	>>> doable for the others as well....

     PD> Martin, surely you could have figured out to remove these two lines
     PD> from sys.source:

     PD> oop <- options(keep.source = FALSE)
     PD> on.exit(options(oop))

    MM> [blush... *BLUSH* ...
    MM> I didn't look at sys.source(); 
    MM> just knew that parts of it used to look rather "magical" to me ..
    MM> ]

    MM> Of course we now could even make 
    MM> keep.source = getOption("keep.source") 
    MM> an argument to library(), being propagated to sys.source(..).

I'm considering committing the necessary changes and adding the following to
NEWS [for "R-devel"]

    o	library(), require(), and sys.source() have a new argument
	` keep.source = getOption("keep.source") '.
	Hence, by default, functions from all packages (not just base)
	`keep their source'.

Is this okay for everyone ?


     PD> The real question is whether we want to have a different mechanism
     PD> for controlling whether keep.source is set or not. 

    MM> right.

      PD> Originally it was FALSE for the base library to save space, and
      PD> accordingly the same setting was used for other libraries since some
      PD> of them are rather large, but later it got flipped to TRUE for
      PD> base,
    MM> (yes, I'm still wondering...)
      PD> and then there is little point in setting it FALSE for packages. 
      PD> Question is whether anyone would want the old behaviour
      PD> back to get more space for analyses?

    MM> would be nice if it *was* configurable for base as well;
    MM> possibly both via cmd line option
    MM> (something like --keepsource / --no-keepsource )
    MM> and a setting in Rprofile..

    MM> From grepping through the source code, I don't see how it was turned off
    MM> for base...

anyone [R-core] ?

Martin Maechler <maechler@stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO D10	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><
#
I'm replying to myself once more :

[and this gets more and more involved, please "d" if you're not interested ..]
MM> (and I haven't seen more feedback..)
........

    MM> Of course we now could even make 
    MM>	   keep.source = getOption("keep.source") 
    MM> an argument to library(), being propagated to sys.source(..).

 MM>   I'm considering committing the necessary changes and adding the following to
 MM>    NEWS [for "R-devel"]
 MM> 
 MM>    o	library(), require(), and sys.source() have a new argument
 MM> 	` keep.source = getOption("keep.source") '.
 MM> 
 MM>   Hence, by default, functions from all packages (not just base)
 MM>   `keep their source'.
 MM> 
 MM>   Is this okay for everyone ?

Now, I still haven't committed the new code, but I have been using it
myself and made a "big picture statistic" *using* the new code, and gc()
for many packages (actually I've done this for all CRAN packages and more)
to find  how much memory is "spilled" by   keep.source = TRUE.

Here are the results :

I show the difference in memory usage {Vcells & Ncells, see ?gc & ?Memory}
for interesting packages, only using R builtin and CRAN (non-Devel) packages:

   Package     Bytes used additionally     Ncells   Vcells
               with keep.source = TRUE

   nlme            2305'364                 19023   107659   (actually nlme + nls)
   survival5       1066'776                  8867    49792
   MASS             631'628                  5186    29507
   mclust           493'512                  4349    22936
   boot             456'944                  3833    21314
   ctest            309'288                  2406    14502
   ts               297'368                  2311    13944
   cluster          244'120                  2270    11298
   nls              236'668                  1871    11085
   wavethresh       218'624                  1878    10180
   mda              215'944                  1878    10046
   rpart            203'892                  1654     9533
   chron            194'640                  1735     9038
   tseries          183'360                  1505     8566
   locfit           176'416                  1632     8168
   tree             166'844                  1248     7843
   modreg           116'752                   989     5442
   nnet              98'124                   838     4571
   splines           85'112                   769     3948
   mva               79'280                   710     3680
   lqs               34'116                   292     1589
   eda               10'860                   105      501
   zmatrix            7'196                    82      327
   Devore5                0                     0        0   [took this to "test"]

I.e., for nlme one needs an extra 2.3 MBytes of memory just for
"keep.source = TRUE".
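
A rough sketch of this kind of measurement, not the script used for the table above. It assumes `parse(text = ...)` on some made-up code as a stand-in for loading a package, and the helpers `cells_used()` and `measure()` are hypothetical. The numbers depend on the R version and on garbage-collector state, so none are shown here.

```r
# Made-up code to parse: 200 copies of a small commented function.
code <- rep(c("f <- function(x) {",
              "  # a comment that only survives with keep.source = TRUE",
              "  x + 1",
              "}"), 200)

# Cells currently in use, as reported by gc(): a named vector
# with entries "Ncells" and "Vcells" (see ?gc and ?Memory).
cells_used <- function() gc()[, "used"]

# Hypothetical helper: growth in used cells while the parsed
# expressions are held in memory.
measure <- function(keep) {
  before <- cells_used()
  exprs  <- parse(text = code, keep.source = keep)
  after  <- cells_used()   # `exprs` is still alive at this point
  after - before
}

delta_keep <- measure(TRUE)
delta_drop <- measure(FALSE)

# Difference attributable to keeping the source, in cells.
delta_keep - delta_drop
```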

I further investigated a bit how much the "keep.source" of base ``costs''
memory wise.
Note that I still don't know how to turn it off easily for base (Peter ?).

However, I just counted how much "source" is in base :
  > length(ob <- ls(pos= match("package:base",search()), all.nam = TRUE))
  [1] 1193
  > length(fns <- ob[sapply(ob, function(n)is.function(get(n)))])
  [1] 1169
  > stem(len.src <- sapply(fns, function(n)sum(nchar(attr(get(n),"source")))))

   The decimal point is 3 digit(s) to the right of the |
  
    0 | 00000000000000000000000000000000000000000000000000000000000000000000+980
    1 | 00000000000112222233333333334444555555666667777778888899
    2 | 00000012333334444555556666777788899
    3 | 12234477
    4 | 15669
    5 | 34
    6 | 
    7 | 35
    8 | 
    9 | 
   10 | 2
		(guess *which* is the  outlier  ;-)

  > sum(len.src)
  [1] 359964

i.e., only ~360'000 characters.
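
The counting above relies on the "source" attribute. As an assumption about current R versions, kept source now lives in a "srcref" attribute instead, so the same count can be sketched as follows; `src_chars()` and the `demo` environment are hypothetical names used only for illustration.

```r
# Hypothetical helper: characters of kept source per function in `env`.
src_chars <- function(env) {
  nms <- ls(env, all.names = TRUE)
  is_fn <- vapply(nms, function(n) is.function(get(n, envir = env)),
                  logical(1))
  fns <- nms[is_fn]
  vapply(fns, function(n) {
    ref <- attr(get(n, envir = env), "srcref")
    # as.character() on a srcref gives the original source lines
    if (is.null(ref)) 0 else as.numeric(sum(nchar(as.character(ref))))
  }, numeric(1))
}

# Small demonstration environment: one function and one non-function.
demo <- new.env()
eval(parse(text = c("g <- function(x) {",
                    "  # a comment",
                    "  x + 1",
                    "}",
                    "y <- 42"),
           keep.source = TRUE),
     envir = demo)

src_chars(demo)        # one entry, for `g`; `y` is skipped
sum(src_chars(demo))   # total characters of kept source
```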

Now compare this with  survival5 which was scoring pretty high above :

  > library(survival5, keep.source = TRUE)
  > length(ob <- ls(pos= match("package:survival5",search()), all.nam = TRUE))
  [1] 117
  > length(fns <- ob[sapply(ob, function(n)is.function(get(n)))])
  [1] 116
  > stem(len.src <- sapply(fns, function(n)sum(nchar(attr(get(n),"source")))))

    The decimal point is 3 digit(s) to the right of the |

     0 | 00000000011111111111111112222222233344444455555789
     1 | 0001122334445567777899
     2 | 0001134555555789
     3 | 02233445567
     4 | 12368
     5 | 2478
     6 | 14799
     7 | 
     8 | 0
     9 | 
    10 | 
    11 | 
    12 | 
    13 | 
    14 | 
    15 | 4
    16 | 
    17 | 3

  > sum(len.src)
  [1] 235633

i.e.  about 2/3 of "base".

(but then base has "source" attributes for many more objects)
A very crude extrapolation suggests that turning off "keep.source" for
"base" would save about 1.5 MBytes of RAM {I'd guess even more..}
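
One way to reproduce this back-of-the-envelope estimate from the figures quoted above. It assumes, crudely, that memory cost scales linearly with the number of source characters; the variable names are made up for illustration.

```r
# Figures taken from the measurements above.
survival_bytes <- 1066776   # extra bytes for survival5 with keep.source = TRUE
survival_chars <- 235633    # characters of source in survival5
base_chars     <- 359964    # characters of source in base

# Share of survival5 source relative to base ("about 2/3").
ratio <- survival_chars / base_chars

# Assumed linear scaling: bytes per source character, applied to base.
bytes_per_char <- survival_bytes / survival_chars
base_estimate  <- bytes_per_char * base_chars

round(ratio, 2)
round(base_estimate / 1e6, 1)   # estimate in millions of bytes
```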

After all this testing, I think what we really want is
"keep.source = FALSE" (including for "base" !)
WHEN working with large data, working on smallish machines,
    or for all "batch" processing.

Hence I'd propose

1.  
  options(keep.source = interactive())

  in the default profile

2. {as proposed earlier today -- see below}

  provide a command line option to turn it on or off.
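
A small sketch of what proposal 1 would mean in practice, assuming a current R version where parse() takes its `keep.source` default from the option and kept source appears as a "srcref" attribute. The option is restored at the end so the sketch does not change the session.

```r
# Proposal 1: keep source only in interactive sessions.
old <- options(keep.source = interactive())

# parse() defaults to keep.source = getOption("keep.source").
f <- eval(parse(text = "function(x) x + 1  # trailing comment"))

# TRUE in an interactive session, FALSE in batch use (e.g. Rscript).
kept <- !is.null(attr(f, "srcref"))
kept

options(old)   # restore the previous setting
```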
------------

    PD> The real question is whether we want to have a different mechanism
    PD> for controlling whether keep.source is set or not. 

    MM> right.

    PD> Originally it was FALSE for the base library to save space, and
    PD> accordingly the same setting was used for other libraries since some
    PD> of them are rather large, but later it got flipped to TRUE for
    PD> base,
    MM> (yes, I'm still wondering...)
    PD> and then there is little point in setting it FALSE for packages. 
    PD> Question is whether anyone would want the old behaviour
    PD> back to get more space for analyses?

    MM> would be nice if it *was* configurable for base as well;
    MM> possibly both via cmd line option
    MM> (something like --keepsource / --no-keepsource )
    MM> and a setting in Rprofile..

  MM> From grepping through the source code, I don't see how it was turned off
  MM> for base...

 anyone [R-core] ?

#
I am not sure whether we really want a command line option.  I'd say we
should keep things as simple as possible.

-k

#
These are the NEWS entries

    o	library(), require(), and sys.source() have a new argument
	` keep.source = getOption("keep.source") '.
	Hence, by default, functions from all packages (not just base)
	`keep their source'

    o	The "keep.source" option now defaults to interactive() instead of TRUE.


Note that this (the 2nd one!) sped up "make check" a bit (about 1-2 %).

What is *not* yet working is
     "keep.source = FALSE" having an effect on the functions in base.

Martin

PS:
 to get this latest "R-devel snapshot", either
 use "rsync" (used to be updated hourly), or anonymous "cvs" (updated daily
 to cvs.r-project.org),
 or FTP in about 8 hours from  ftp://stat.ethz.ch/Software/R/R-devel.tar.gz
 or get it from the CRAN mirrors a few more hours later ...