Compressing data for package builds
5 messages · steven mosher, Simon Urbanek, Uwe Ligges
On Aug 16, 2012, at 5:08 PM, steven mosher wrote:
Hi,
I have two .rda files that I need to include in a package. I've placed
them both in a data directory;
after save() they are around 150Kb each.
When I try to check the package I get the following warning:
Warning: large data file(s) saved inefficiently:
               size  ASCII  compress
zagoskin.rda  137Kb  FALSE  none
Note: significantly better compression could be obtained
by using R CMD build --resave-data
              old_size  new_size  compress
modpoll.rda      124Kb      78Kb  xz
zagoskin.rda     137Kb       6Kb  bzip2
Both of these files, modpoll.rda and zagoskin.rda, have already been
compressed from megabytes down to Kb.
Also, the instruction "R CMD build --resave-data" doesn't do anything
that I can see, so I must be using it wrong.
R CMD build is how you should preferably be creating your package tarball, so you simply add the --resave-data argument to your existing R CMD build call, which creates the tarball from your source directory.

So can you elaborate on "doesn't do anything that I can see"? In what sense? No output? No compression?

Cheers,
Simon
Is there a piece of the puzzle I am missing, or are there instructions better than these? I tried LazyDataCompression and my data.rdb is 90Kb.

"Package *tools* has a couple of functions to help with data images: checkRdaFiles reports on the way the image was saved, and resaveRdaFiles will re-save with a different type of compression, including choosing the best type for that particular image.

Some packages using 'LazyData' will benefit from using a form of compression other than gzip in the installed lazy-loading database. This can be selected by the --data-compress option to R CMD INSTALL or by using the 'LazyDataCompression' field in the DESCRIPTION file. Useful values are bzip2, xz and the default, gzip. The only way to discover which is best is to try them all and look at the size of the pkgname/data/Rdata.rdb file."
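For readers following along, the two *tools* functions named in the quoted manual passage can be called from the command line roughly as below. This is only a sketch: the path mattools/data is an assumption based on the package name used in this thread, and note that resaveRdaFiles rewrites the files in the source directory in place.

```shell
#!/bin/sh
# Assumed layout (hypothetical): package source in ./mattools, data in ./mattools/data.
PKG_DATA="mattools/data"

if command -v Rscript >/dev/null 2>&1 && [ -d "$PKG_DATA" ]; then
    # Report size, ASCII flag and compression type for every .rda file in the directory.
    Rscript -e 'print(tools::checkRdaFiles(commandArgs(TRUE)[1]))' "$PKG_DATA"
    # Re-save each file in place; "auto" tries gzip, bzip2 and xz and keeps the smallest.
    Rscript -e 'tools::resaveRdaFiles(commandArgs(TRUE)[1], compress = "auto")' "$PKG_DATA"
    # Report again to see which compression was chosen for each file.
    Rscript -e 'print(tools::checkRdaFiles(commandArgs(TRUE)[1]))' "$PKG_DATA"
else
    echo "Rscript or $PKG_DATA not found; nothing to do" >&2
fi
```

Running this once on the source directory should make the check warning go away even without --resave-data, since the files themselves are then stored with the better compression.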
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
On 17.08.2012 07:24, steven mosher wrote:
"R CMD build is how you preferably should be creating your package
tarball, so you simply add the --resave-data argument to your already existing
R CMD build call which creates the tarball from your source directory. So
can you elaborate on "doesn't do anything I can see"? In what sense? No
output? No compression?"
My tarball builds with
> R CMD build mattools
where mattools is the name of the package, and I get a warning on R CMD
check.
Things I tried:
R CMD build --resave-data
R CMD build mattools --resave-data
R CMD build --resave-data mattools
The first does nothing; the second and the third both fail on unknown
options. So I found the help for R CMD.
Now that I have figured out how to display help for R CMD build, I see that
--resave-data must include a specification of the type of compression,
--resave-data="best" for example.
I ran that and got the same warning, indicating that the rda file had not
been compressed.
* checking data for non-ASCII characters ... OK
* checking data for ASCII and uncompressed saves ... WARNING
Warning: large data file(s) saved inefficiently:
               size  ASCII  compress
zagoskin.rda  137Kb  FALSE  none
Note: significantly better compression could be obtained
by using R CMD build --resave-data
              old_size  new_size  compress
modpoll.rda      124Kb      78Kb  xz
zagoskin.rda     137Kb       6Kb  bzip2
I am building under Windows, so I wonder if I am missing a system file required
to do the compression.
Are you checking the tarball (as recommended) or the source dir? The compressed versions are in the tarball. The source dir is not changed.

Uwe Ligges
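To illustrate the point about checking the tarball rather than the source directory, the sequence could look roughly like this. It is a sketch only, assuming the package source sits in ./mattools (the name used in this thread) and that the built tarball follows the usual name_version.tar.gz pattern.

```shell
#!/bin/sh
PKG="mattools"   # package source directory, as named in this thread

if command -v R >/dev/null 2>&1 && [ -d "$PKG" ]; then
    # Build the tarball. The data files are re-saved inside the tarball only;
    # the files in the source directory stay as they were.
    R CMD build --resave-data=best "$PKG"
    # Check the tarball that was just built, not the source directory.
    for tarball in "$PKG"_*.tar.gz; do
        R CMD check "$tarball"
    done
else
    echo "R or ./$PKG not found; skipping build and check" >&2
fi
```

Running R CMD check on the source directory instead would still see the original, less compressed .rda files, which would explain a warning that persists after building with --resave-data.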
On Thu, Aug 16, 2012 at 5:48 PM, Simon Urbanek <simon.urbanek at r-project.org> wrote:
[...]