bzip2'ed data under data/

5 messages · Sébastien Bihorel, Brian Ripley

#
Hi,

R CMD check PACKAGE_VERSION_tar.gz gives warning:

Files not of a type allowed in a 'data' directory:
  'tser1.csv.bz2' 'tser2.csv.bz2'
Please use e.g. 'inst/extdata' for non-R data files

which I didn't expect, based on section 1.1.5 (Data in packages) of the
Writing R Extensions manual:

Tables (`.tab', `.txt', or `.csv' files) can be compressed by
`gzip', `bzip2' or `xz', optionally with additional extension `.gz',
`.bz2' or `.xz'.  However, such files can only be used with R 2.10.0 or
later, and so the package should have an appropriate `Depends' entry in
its DESCRIPTION file.

In this case, I have a Depends: R (>= 2.13.0), and the package was built
with R version 2.15.0 beta (2012-03-16 r58769), Platform:
x86_64-pc-linux-gnu (64-bit), so I don't understand the warning.

Cheers,
1 day later
#
On 19/03/2012 20:25, Sebastian P. Luque wrote:
Well, the extension is allowed 'optionally' to be .csv.bz2, but that 
does not make it good practice and I would suggest not using it.

But 'check' flagged it because of a typo in the code 'check' uses to 
specify the types of data() files. That has been corrected since your 
build of R, so I would expect current R-devel or R-pre-release not to 
give the NOTE.  I am not sure whether that has any ramifications for 
users of the package with older versions of R, but we know that calling 
the compressed file foo.csv would work.
4 days later
#
On Wed, 21 Mar 2012 18:35:15 +0000,
Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:

            
Does this mean we can still compress the files, but leave the file name
with suffix *.csv (i.e. not adding the compression-specific suffix)?
The 2 files I'm including in the package are a little over 1 MB, and
bzip2 gets them down to < 150 KB.
Up to R 2.14.2, R CMD check was reporting a Warning, rather than a Note,
but indeed the latest R-devel and R-pre-release don't.  Thanks.

Cheers,
#
On 26/03/2012 16:34, Sebastian P. Luque wrote:
Yes, that is what the help file says.
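
A minimal shell sketch of that approach, for illustration only: compress each table with bzip2, then rename it back so the file keeps its plain .csv name. The scratch directory and the sample file contents are hypothetical stand-ins for a real package's data/ directory, and the bzip2 command-line tool is assumed to be installed.

```shell
#!/bin/sh
# Sketch: compress CSV tables with bzip2 but keep the plain .csv file name.
set -eu

# Hypothetical layout: a scratch package directory with a data/ subdirectory.
pkg=$(mktemp -d)
mkdir -p "$pkg/data"
printf 'time,value\n1,0.5\n2,0.7\n' > "$pkg/data/tser1.csv"

for f in "$pkg"/data/*.csv; do
    bzip2 -9 "$f"        # writes "$f.bz2" and removes the original "$f"
    mv "$f.bz2" "$f"     # drop the .bz2 suffix; the content stays compressed
done

# A bzip2 stream starts with the magic bytes "BZh".
magic=$(head -c 3 "$pkg/data/tser1.csv")
echo "$magic"
```

The renamed file is still an ordinary bzip2 stream, so `bzip2 -dc data/tser1.csv` recovers the original table for inspection.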

  
    
#
On Mon, 26 Mar 2012 16:44:58 +0100,
Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
[...]
If I do that, however, R CMD build builds the package adding the .gz
suffix to these files (i.e. I end up with data/*.csv.gz), presumably
because --resave-data is the default, which uses gzip, provided no
BuildResaveData field is present in DESCRIPTION.  Adding
"BuildResaveData: no" to DESCRIPTION solves this.  For package
maintenance it might be easier to leave the uncompressed data/*.csv and
add "BuildResaveData: bzip2" to DESCRIPTION, but then R CMD build
generates data/*.csv.xz with R-devel (2012-03-22 r58801), so something
seems wrong.  Should we simply stick to doing the compression manually
and adding "BuildResaveData: no" to DESCRIPTION?

Thanks,
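
For reference, the DESCRIPTION setup described in this message might look roughly like the fragment below. Only the two fields mentioned in the thread are shown; every other field a real DESCRIPTION file needs is omitted.

```
Depends: R (>= 2.13.0)
BuildResaveData: no
```

With this in place, the data files would be compressed by hand before running R CMD build, as discussed above.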