CRAN package sizes

6 messages · Brian Ripley, Yihui Xie, Kevin Coombes +1 more

#
Robin Hankin's post reminded me to post about the following recent 
addition to 'Writing R Extensions', in the section on 'Submitting a 
package to CRAN'

   Ensure that the package sources are not unnecessarily large. ...
   As a general rule, doc directories should not exceed 5Mb, and
   where data directories need to be 10Mb or more, consideration should
   be given to a separate package containing just the data. (Similarly
   for external data directories, large jar files and other libraries
   that need to be installed.)

With 2800 packages on CRAN, overall size is becoming a concern and 
currently to install all of CRAN takes 4Gb.  As the attached (I hope) 
graph shows, the 20 packages over 20Mb take a quarter, and those over 
5Mb take half.  (And this is after we have removed 100Mb from the 
largest installed package by re-compression, and archived the second 
largest, so Robin's package is currently the largest.)  Some of the 
largest packages are data/jar packages, but there are 55 packages with 
'doc' directories over 5Mb.  To put that in perspective, PDFs of whole 
books with lots of figures (MASS, Paul's R Graphics) are well under 
5Mb.
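
For anyone wanting to check a package against the quoted thresholds before submitting, a minimal sketch in POSIX shell follows. The function names are made up for illustration, and the paths inst/doc, data and inst/extdata are my assumption about where the 'doc', 'data' and 'external data' directories live in a source tree. du reports disk usage, so the numbers are approximate.

```shell
#!/bin/sh
# Sketch only: flag directories larger than the guideline sizes
# (5 MB for doc, 10 MB for data and external data).

# dir_size_kb DIR: print the size of DIR in kilobytes (0 if DIR is missing).
dir_size_kb() {
    if [ -d "$1" ]; then
        du -sk "$1" | awk '{print $1}'
    else
        echo 0
    fi
}

# check_dir DIR LIMIT_MB: print one line if DIR is larger than LIMIT_MB.
check_dir() {
    size_kb=$(dir_size_kb "$1")
    limit_kb=$(( $2 * 1024 ))
    if [ "$size_kb" -gt "$limit_kb" ]; then
        echo "LARGE: $1 is ${size_kb} KB (limit ${2} MB)"
    fi
}

# check_pkg PKGDIR: apply the guideline to the assumed source directories.
check_pkg() {
    check_dir "$1/inst/doc" 5
    check_dir "$1/data" 10
    check_dir "$1/inst/extdata" 10
}
```

Usage would be something like `check_pkg path/to/mypkg`, where the path is a placeholder; no output means nothing exceeded the limits.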

R CMD check in R-devel reports on large packages, and you should expect 
submitted package sizes to be questioned more often in future.

There are lots of different reasons why doc directories are large, but 
the major ones are

- installing files that are unneeded, such as Rplots.pdf and .eps
   figures.
- using PDF figures of images where PNG would be more appropriate.
- including less than relevant material (such as how to install R,
   with screenshots!)

There are several ways to reduce the sizes of PDFs with no loss in 
quality, e.g. Adobe Acrobat Standard/Pro.
#
Regarding the reasons that make the doc directory large, I wonder if
we can make some changes in R:

1. Use a null graphics device as the default device rather than pdf()
when running Sweave -- this can avoid the useless Rplots.pdf:

options(device = function(...) {
    .Call("R_GD_nullDevice", PACKAGE = "grDevices")
})

This can save some time in building the vignette(s) as well. (see
http://yihui.name/en/?p=673)

However, this undocumented null device may not work for certain
graphics. Here is an example where it fails for ggplot2:
http://stackoverflow.com/questions/4692974/ggplot2-code-that-works-interactively-rkward-crashes-under-lyx-pgfsweave-hint/4707745#4707745

Is it possible for someone to look into the null device (Dr Murrell?)
to make it stable enough?

2. Compress the PDF graphics and vignettes using third-party tools,
among which I recommend qpdf (it's free).

qpdf --stream-data=compress input.pdf output.pdf

This can reduce the size of PDF files a lot without quality loss. I'm
using this tool in the animation package to reduce the size of PDF
animations.

3. Sorry to bring this issue up again, but I don't understand why
Sweave could not implement the png() device along with pdf() and
postscript(). I'm willing to provide a patch if needed.

Thanks!

Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Phone: 515-294-2465 Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA



On Sun, Feb 13, 2011 at 6:30 AM, Prof Brian Ripley
<ripley at stats.ox.ac.uk> wrote:
#
I think it would be even more useful if we could get Sweave to easily 
produce PNG figures instead of just PDF/EPS.  In the current state of 
things, making PNG versions is more cumbersome than making PDF versions, 
so I'm not surprised that most people don't go to that trouble most of 
the time.

I also know (from searching the archives when I wanted to try this 
myself) that a couple of people have, in the past, modified Sweave so it 
can generate PNG automatically.  However, the changes have never 
migrated into the released version.  Perhaps the space constraints at 
CRAN can convince Friedrich Leisch that the change would be a good idea....

     Kevin
On 2/13/2011 3:02 PM, Yihui Xie wrote:
#
On Sun, 13 Feb 2011, Yihui Xie wrote:

            
'we' cannot: only core developers can.  However, end users can 
contribute in many other ways: see below.

I don't see a bug report on that, and a patch would help expedite 
this.

*Can*, but I did say

   'There are several ways to reduce the sizes of PDFs with no loss in
    quality, e.g. Adobe Acrobat Standard/Pro.'

and qpdf is often ineffective (or worse), e.g. on package mokken.  The 
problem is that many of the large packages need images re-saved in 
some other format (or preferably re-generated in some other format).

I've added a --compact-vignettes option to R CMD build (in R-devel). 
At present it uses qpdf, but I will look out for better/additional 
options.  (I use Acrobat 9 Pro on my Mac and that has always beaten 
qpdf, often by a large margin.  But qpdf is perhaps the most readily 
available of these tools.)
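
Since qpdf can be ineffective (or worse) on some files, one cautious way to use it by hand is to keep its output only when it is actually smaller. The following is a sketch under that assumption, not what R CMD build does internally. The function names are invented for illustration, and the only qpdf option used is the one quoted earlier in this thread.

```shell
#!/bin/sh
# Sketch only: recompress a PDF in place with qpdf, but keep the original
# whenever the qpdf output is not smaller.

# file_size FILE: print the size of FILE in bytes.
file_size() {
    wc -c < "$1" | tr -d ' '
}

# compact_pdf FILE: try qpdf on FILE and replace it only if that saves space.
# Any non-zero qpdf status (including warnings) is treated as a failure.
compact_pdf() {
    cp_in=$1
    cp_out="$cp_in.tmp.$$"
    if ! qpdf --stream-data=compress "$cp_in" "$cp_out"; then
        rm -f "$cp_out"
        echo "failed: $cp_in"
        return 1
    fi
    if [ "$(file_size "$cp_out")" -lt "$(file_size "$cp_in")" ]; then
        mv "$cp_out" "$cp_in"
        echo "compacted: $cp_in"
    else
        rm -f "$cp_out"
        echo "kept original: $cp_in"
    fi
}
```

It could be run as `compact_pdf inst/doc/myvignette.pdf` (a placeholder path). Swapping in another tool means changing only the one line that calls qpdf.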

Does it need changes to R?  I believe that it just needs a 
different driver, something which could be provided in a package.

This has been raised several times (including recently) with the 
Sweave maintainer, so maybe it will happen eventually.  But a package 
would retrofit it to earlier versions of R.
#
I have also started doing my homework with regard to package size, which 
mainly means cleaning up leftovers from vignette generation and compressing the pdfs.

For most of my vignettes, ghostscript (lossy) compression works very well.
I use the /screen settings and -dDownsampleColorImages=false:

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE \
   -dQUIET -dBATCH -dDownsampleColorImages=false -dAutoRotatePages=/None

(DownsampleColorImages=false is important, as I found that otherwise some .png 
images become completely useless. The pngs are saved with a carefully determined 
size, and they are pngs because the pdfs were too large, so I know that the bitmap 
images are already "size-optimized".)

I wrote an inst/doc/makefile to do this and also to clean up a few more "leftovers 
from the vignette".

BTW: while compressing the final .pdf achieves better total compression, it 
already helps a lot to compress the .pdf figures which can be done at the end of 
the .Rnw.
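
The ghostscript call quoted above leaves out the input and output files. A sketch with them filled in might look like the following. The wrapper name and the file names are placeholders of mine, and -sOutputFile is the usual ghostscript way of naming the output. Because the /screen settings are lossy, it writes to a separate file so the result can be inspected before replacing anything.

```shell
#!/bin/sh
# Sketch only: wrap the ghostscript options from this thread in a function
# that takes an input PDF and writes a recompressed copy.

# gs_compress IN OUT: write a recompressed copy of IN to OUT.
gs_compress() {
    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen \
       -dNOPAUSE -dQUIET -dBATCH \
       -dDownsampleColorImages=false -dAutoRotatePages=/None \
       -sOutputFile="$2" "$1"
}
```

For instance `gs_compress fig1.pdf fig1-small.pdf` for a single figure, or the same call on the final vignette pdf.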

qpdf didn't help for my vignettes.


One question remains, though. I have two vignettes, where I cannot put the 
original data into the package (the very first thing in the vignette is the link 
to a zip file on r-forge that contains everything needed to reproduce the 
vignette, though. I think this is accessible enough for FOSS).
I'd like to have these documents accessible via the usual vignette () mechanism 
(this question has come up before, but I found only that the 00Index.dcf does 
not work any longer).
My second thought was to set up the Makefile so that instead of building the pdf 
a message is printed and the available pdf is used.
This does not work, however: buildVignettes (which I guess does the work*) first 
Sweaves the .Rnw file and then replaces the texi2dvi () call by make.
Is this intended behaviour? If so, how do I make my vignette accessible 
[obviously the "dummy .Rnw that includes the pdf"-technique doesn't look quite 
appropriate as it leads to unnecessarily large package size]?

*I did not realise this from the Makefile discussion in the extensions manual 
(nor does the help page of buildVignettes mention anything about this). Also, 
I'd appreciate very much if the extension manual would mention buildVignettes - 
it took me quite a while to find out what code is used and why my Makefile 
didn't lead to the desired results.

Thanks a lot for any ideas,

Claudia
#
Please excuse the noise about dummy .Rnw.
On 02/15/2011 03:44 PM, Claudia Beleites wrote:
A coffee break was helpful: of course I just need a dummy .Rnw that is processed to 
a .tex but (via Makefile) _not_ to pdf...
Sorry, Claudia