help(options)
contains
keep.source: When `TRUE', the default, the source code for functions
loaded by is stored in their `"source"' attribute, allowing
comments to be kept in the right places.
This does not apply to functions loaded by `library'.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
and R behaves as documented, i.e., currently
all functions in package:base and all "interactively defined" functions
keep their source (__including__ comments) with them, whereas all the other
functions do not.
As I tend to create small packages of my own more and more, and ask others to use
them, users of the packages (and myself) are suffering increasingly from
function definitions with lost comments.
Can we [those of us who know how sys.source() works...]
think of changing this? As it was possible for the base package, it must
be doable for the others as well....
Martin
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
options(keep.source = TRUE) -- also for "library(.)" ?
7 messages · Peter Dalgaard, Kurt Hornik, Martin Maechler
Martin Maechler <maechler@stat.math.ethz.ch> writes:
Can we [those of us who know how sys.source() works...] think of changing this? As it was possible for the base package, it must be doable for the others as well....
Martin, surely you could have figured out to remove these two lines
from sys.source:
oop <- options(keep.source = FALSE)
on.exit(options(oop))
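The two lines quoted above follow a common save-and-restore idiom for options. A minimal sketch of that idiom follows; the helper with_keep_source_off() is hypothetical, is not part of R, and is not the actual body of sys.source:

```r
# Hypothetical helper (an assumption for illustration, not R's sys.source):
# evaluate an expression with keep.source temporarily switched off.
with_keep_source_off <- function(expr) {
  oop <- options(keep.source = FALSE)  # options() returns the previous values
  on.exit(options(oop))                # restore them however the function exits
  force(expr)                          # the lazy argument is evaluated only now
}

old    <- getOption("keep.source")
inside <- with_keep_source_off(getOption("keep.source"))
after  <- getOption("keep.source")
```

Here inside reflects the temporary setting, while after should match old again because on.exit() restored it.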
The real question is whether we want to have a different mechanism for
controlling whether keep.source is set or not. Originally it was FALSE
for the base library to save space, and accordingly the same setting was
used for other libraries since some of them are rather large, but
later it got flipped to TRUE for base, and then there is little point
in setting it FALSE for packages. Question is whether anyone would
want the old behaviour back to get more space for analyses?
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)             FAX: (+45) 35327907
"PD" == Peter Dalgaard BSA <p.dalgaard@biostat.ku.dk> writes:
PD> Martin Maechler <maechler@stat.math.ethz.ch> writes:
>> Can we [those of us who know how sys.source() works...] think of
>> changing this? As it was possible for the base package, it must be
>> doable for the others as well....
PD> Martin, surely you could have figured out to remove these two lines
PD> from sys.source:
PD> oop <- options(keep.source = FALSE)
PD> on.exit(options(oop))
[blush... *BLUSH* ...
I didn't look at sys.source();
just knew that parts of it used to look rather "magical" to me ..
]
Of course we now could even make
keep.source = getOption("keep.source")
an argument to library(), being propagated to sys.source(..).
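A sketch of how such an argument looks from the caller's side. It assumes an R version in which sys.source() accepts keep.source, and in which kept source is exposed through the "srcref" attribute rather than the "source" attribute discussed in this thread. The temporary file and the function f are made up for illustration:

```r
# Write a tiny source file containing a commented function (illustrative only).
src <- tempfile(fileext = ".R")
writeLines(c("f <- function(x) {",
             "  # double the input",
             "  2 * x",
             "}"), src)

# Source the same file twice, once keeping the source and once dropping it.
with_src    <- new.env()
without_src <- new.env()
sys.source(src, envir = with_src,    keep.source = TRUE)
sys.source(src, envir = without_src, keep.source = FALSE)

# The kept source (including the comment) hangs off the "srcref" attribute.
kept    <- as.character(attr(with_src$f, "srcref"))
has_src <- !is.null(attr(with_src$f, "srcref"))
no_src  <- is.null(attr(without_src$f, "srcref"))
```

Both copies of f compute the same result; only the one sourced with keep.source = TRUE still carries its comment.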
PD> The real question is whether we want to have a different mechanism
PD> for controlling whether keep.source is set or not.
right.
PD> Originally it was FALSE for the base library to save space, and
PD> accordingly the same setting was used for other libraries since some
PD> of them are rather large, but later it got flipped to TRUE for
PD> base,
(yes, I'm still wondering...)
PD> and then there is little point in setting it FALSE for packages.
PD> Question is whether anyone would want the old behaviour
PD> back to get more space for analyses?
would be nice if it *was* configurable for base as well;
possibly both via cmd line option
(something like --keepsource / --no-keepsource )
and a setting in Rprofile..
From grepping through the source code, I don't see how it was turned off
for base...
Martin
"MM" == Martin Maechler <maechler@stat.math.ethz.ch> writes:
(and I haven't seen more feedback..)
"PD" == Peter Dalgaard BSA <p.dalgaard@biostat.ku.dk> writes:
PD> Martin Maechler <maechler@stat.math.ethz.ch> writes:
>>> Can we [those of us who know how sys.source() works...] think of
>>> changing this? As it was possible for the base package, it must be
>>> doable for the others as well....
PD> Martin, surely you could have figured out to remove these two lines
PD> from sys.source:
PD> oop <- options(keep.source = FALSE)
PD> on.exit(options(oop))
MM> [blush... *BLUSH* ...
MM> I didn't look at sys.source();
MM> just knew that parts of it used to look rather "magical" to me ..
MM> ]
MM> Of course we now could even make
MM> keep.source = getOption("keep.source")
MM> an argument to library(), being propagated to sys.source(..).
I'm considering committing the necessary changes and adding the following to
NEWS [for "R-devel"]
o library(), require(), and sys.source() have a new argument
` keep.source = getOption("keep.source") '.
Hence, by default, functions from all packages (not just base)
`keep their source'.
Is this okay for everyone ?
PD> The real question is whether we want to have a different mechanism
PD> for controlling whether keep.source is set or not.
MM> right.
PD> Originally it was FALSE for the base library to save space, and
PD> accordingly the same setting was used for other libraries since some
PD> of them are rather large, but later it got flipped to TRUE for
PD> base,
MM> (yes, I'm still wondering...)
PD> and then there is little point in setting it FALSE for packages.
PD> Question is whether anyone would want the old behaviour
PD> back to get more space for analyses?
MM> would be nice if it *was* configurable for base as well;
MM> possibly both via cmd line option
MM> (something like --keepsource / --no-keepsource )
MM> and a setting in Rprofile..
MM> From grepping through the source code, I don't see how it was turned off
MM> for base...
anyone [R-core] ?
Martin Maechler <maechler@stat.math.ethz.ch> http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum LEO D10 Leonhardstr. 27
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-3408 fax: ...-1228 <><
I'm replying to myself once more:
[and this gets more and more involved, please "d" if you're not interested ..]
"MM" == Martin Maechler <maechler@stat.math.ethz.ch> writes:
MM> (and I haven't seen more feedback..)
"PD" == Peter Dalgaard BSA <p.dalgaard@biostat.ku.dk> writes:
........
MM> Of course we now could even make
MM> keep.source = getOption("keep.source")
MM> an argument to library(), being propagated to sys.source(..).
MM> I'm considering committing the necessary changes and adding the following to
MM> NEWS [for "R-devel"]
MM>
MM> o library(), require(), and sys.source() have a new argument
MM> ` keep.source = getOption("keep.source") '.
MM>
MM> Hence, by default, functions from all packages (not just base)
MM> `keep their source'.
MM>
MM> Is this okay for everyone ?
Now, I still haven't committed the new code, but I have been using it
myself and made a "big picture statistic" *using* the new code, and gc()
for many packages (actually I've done this for all CRAN packages and more)
to find how much memory is "spilled" by keep.source = TRUE.
Here are the results :
I show the difference in memory usage {Vcells & Ncells, see ?gc & ?Memory}
for interesting packages, only using R builtin and CRAN (non-Devel) packages:
Package      Bytes used additionally   Ncells   Vcells
             with keep.source = TRUE
nlme                       2'305'364    19023   107659  (actually nlme + nls)
survival5                  1'066'776     8867    49792
MASS                         631'628     5186    29507
mclust                       493'512     4349    22936
boot                         456'944     3833    21314
ctest                        309'288     2406    14502
ts                           297'368     2311    13944
cluster                      244'120     2270    11298
nls                          236'668     1871    11085
wavethresh                   218'624     1878    10180
mda                          215'944     1878    10046
rpart                        203'892     1654     9533
chron                        194'640     1735     9038
tseries                      183'360     1505     8566
locfit                       176'416     1632     8168
tree                         166'844     1248     7843
modreg                       116'752      989     5442
nnet                          98'124      838     4571
splines                       85'112      769     3948
mva                           79'280      710     3680
lqs                           34'116      292     1589
eda                           10'860      105      501
zmatrix                        7'196       82      327
Devore5                            0        0        0  [took this to "test"
I.e., for nlme (together with nls) one needs an extra 2.3 MBytes of memory just for
"keep.source = TRUE".
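For a rough per-function view of the same effect, the sketch below parses one made-up function with and without kept source and compares sizes. It is not the gc()-based procedure used for the table: it assumes utils::object.size() is available and that the R version in use stores source as "srcref" attributes. Since object.size() does not count environments, the difference probably understates the real cost.

```r
# A made-up function definition, used only for this comparison.
code <- c("g <- function(x, y) {",
          "  # add the two arguments",
          "  x + y",
          "}")

e_keep <- new.env()
e_drop <- new.env()
eval(parse(text = code, keep.source = TRUE),  e_keep)  # source references kept
eval(parse(text = code, keep.source = FALSE), e_drop)  # source references dropped

# Extra bytes attributable to the kept source, as far as object.size() sees them.
extra <- as.numeric(utils::object.size(e_keep$g)) -
         as.numeric(utils::object.size(e_drop$g))
```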
I further investigated a bit how much the "keep.source" of base ``costs''
memory wise.
Note that I still don't know how to turn it off easily for base (Peter ?).
However, I just counted how much "source" is in base :
> length(ob <- ls(pos= match("package:base",search()), all.nam = TRUE))
[1] 1193
> length(fns <- ob[sapply(ob, function(n)is.function(get(n)))])
[1] 1169
> stem(len.src <- sapply(fns, function(n)sum(nchar(attr(get(n),"source")))))
The decimal point is 3 digit(s) to the right of the |
0 | 00000000000000000000000000000000000000000000000000000000000000000000+980
1 | 00000000000112222233333333334444555555666667777778888899
2 | 00000012333334444555556666777788899
3 | 12234477
4 | 15669
5 | 34
6 |
7 | 35
8 |
9 |
10 | 2
(guess *which* is the outlier ;-)
> sum(len.src)
[1] 359964
i.e., only ~360'000 characters.
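The count above relies on the "source" attribute. On R versions that record source references ("srcref") instead, a roughly equivalent tally might look like the following sketch. The environment env and the objects a, b and k are invented for illustration, so the resulting numbers are not comparable with those above.

```r
# Define two small functions and one non-function in a scratch environment,
# explicitly keeping their source.
env <- new.env()
eval(parse(text = c("a <- function(x) x + 1",
                    "b <- function(x) {",
                    "  # square the input",
                    "  x^2",
                    "}",
                    "k <- 42"),
           keep.source = TRUE), env)

# Same steps as in the session above: list objects, keep the functions,
# then count the characters of source stored with each one.
ob  <- ls(env, all.names = TRUE)
fns <- ob[sapply(ob, function(n) is.function(get(n, envir = env)))]
len.src <- sapply(fns, function(n)
  sum(nchar(as.character(attr(get(n, envir = env), "srcref")))))
total <- sum(len.src)
```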
Now compare this with survival5 which was scoring pretty high above :
> library(survival5, keep.source = TRUE)
> length(ob <- ls(pos= match("package:survival5",search()), all.nam = TRUE))
[1] 117
> length(fns <- ob[sapply(ob, function(n)is.function(get(n)))])
[1] 116
> stem(len.src <- sapply(fns, function(n)sum(nchar(attr(get(n),"source")))))
The decimal point is 3 digit(s) to the right of the |
0 | 00000000011111111111111112222222233344444455555789
1 | 0001122334445567777899
2 | 0001134555555789
3 | 02233445567
4 | 12368
5 | 2478
6 | 14799
7 |
8 | 0
9 |
10 |
11 |
12 |
13 |
14 |
15 | 4
16 |
17 | 3
> sum(len.src)
[1] 235633
i.e. about 2/3 of "base".
(but then base has "source" attributes for many more objects)
Very crude extrapolation would mean that turning off the "keep.source" for
"base" would save about 1.5 MBytes of RAM {I'd guess even more..}
After all this testing, I think what we really want is
"keep.source = FALSE" (including for "base" !)
WHEN working with large data, working on smallish machines,
or for all "batch" processing.
Hence I'd propose
1.
options(keep.source = interactive())
in the default profile
2. {as proposed earlier today -- see below}
provide a command line option to turn it on or off.
------------
PD> The real question is whether we want to have a different mechanism
PD> for controlling whether keep.source is set or not.
MM> right.
PD> Originally it was FALSE for the base library to save space, and
PD> accordingly the same setting was used for other libraries since some
PD> of them are rather large, but later it got flipped to TRUE for
PD> base,
MM> (yes, I'm still wondering...)
PD> and then there is little point in setting it FALSE for packages.
PD> Question is whether anyone would want the old behaviour
PD> back to get more space for analyses?
would be nice if it *was* configurable for base as well;
possibly both via cmd line option
(something like --keepsource / --no-keepsource )
and a setting in Rprofile..
MM> From grepping through the source code, I don't see how it was turned off
MM> for base...
anyone [R-core] ?
Martin Maechler writes:
I'm replying to myself once more: [and this gets more and more involved, please "d" if you're not interested ..]
...
After all this testing, I think what we really want is "keep.source = FALSE" (including for "base" !) WHEN working with large data, working on smallish machines, or for all "batch" processing.
Hence I'd propose
1. options(keep.source = interactive())
in the default profile
2. {as proposed earlier today -- see below}
provide a command line option to turn it on or off.
I am not sure whether we really want a command line option. I'd say we should keep things as simple as possible.
-k
These are the NEWS entries
o library(), require(), and sys.source() have a new argument
` keep.source = getOption("keep.source") '.
Hence, by default, functions from all packages (not just base)
`keep their source'
o The "keep.source" option now defaults to interactive() instead of TRUE.
Note that this (the 2nd one!) speeded up "make check" a bit (about 1-2 %).
What is *not* yet working is
"keep.source = FALSE" having an effect on the functions in base.
Martin
PS:
to get this latest "R-devel snapshot", either
use "rsync" (used to be updated hourly), or anonymous "cvs" (updated daily)
to cvs.r-project.org,
or FTP in about 8 hours from ftp://stat.ethz.ch/Software/R/R-devel.tar.gz,
or get it from the CRAN mirrors a few more hours later ...