Best way to handle dependency on non-CRAN package / large data package?

6 messages · Dirk Eddelbuettel, arilamstein at gmail.com

#
I have just written a package called choroplethrZip
<https://github.com/arilamstein/choroplethrZip> which contains a shapefile
and metadata on US Zip codes. It is currently hosted on github, has a
tagged version number (v1.0.0) and passes R CMD check as verified by
Travis. My plan is to use this in the next version of my package choroplethr
<https://github.com/arilamstein/choroplethr>.

This is exactly what I have done in the past with other map/data packages
(notably choroplethrMaps <https://github.com/arilamstein/choroplethrMaps>
 and choroplethrAdmin1 <https://github.com/arilamstein/choroplethrAdmin1>),
and is the architecture that CRAN requested: large data in a separate
package, listing it in the 'Suggests', and putting code like this where
appropriate:

if (!requireNamespace("choroplethrAdmin1", quietly = TRUE)) {
  stop("Package choroplethrAdmin1 is needed for this function to work. Please install it.",
       call. = FALSE)
}

The problem I now face is that choroplethrZip is too large to be hosted on
CRAN (~75MB), and I am unclear on the best way to manage this dependency.
Presumably I could just change the above message to say

Please install choroplethrZip by typing:
    library(devtools)
    install_github('arilamstein/choroplethrZip@v1.0.0')

But I don't know if this is the best way to do this, or if there is
anything else to consider. I have never had to manage package dependencies
outside of CRAN, and have always thought of CRAN as being a "closed
ecosystem", where there were not any dependencies outside of CRAN.

Can anyone provide guidance on this?

Thanks.

Ari
#
On 12 March 2015 at 08:41, arilamstein at gmail.com wrote:
| But I don't know if this is the best way to do this, or if there is
| anything else to consider. I have never had to manage package dependencies
| outside of CRAN, and have always thought of CRAN as being a "closed
| ecosystem", where there were not any dependencies outside of CRAN.
| 
| Can anyone provide guidance on this?

drat can help with this problem. Have a look at 

     http://dirk.eddelbuettel.com/code/drat.html

as well as my blog and the GitHub repo of drat.

In a nutshell, it creates repositories you can access via update.packages()
and install.packages() as if they were CRAN or BioC.  It also uses GitHub to
automagically provide a repository server via the webserver "embedded" in
each GitHub repo (and turned on as soon as you use the gh-pages branch).
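
As a rough sketch of the user side described above: once a drat repository is
published, it can be passed to the standard functions like any other
repository. The helper names below are hypothetical, and the URL layout
(https://<account>.github.io/drat) is an assumption about how GitHub Pages
serves the gh-pages branch of a repo named 'drat'; it is not taken from this
thread.

```r
# Hypothetical helper: build the GitHub Pages URL that would serve a
# user's 'drat' repository (assumes the <account>.github.io/drat layout).
drat_repo_url <- function(account) {
  sprintf("https://%s.github.io/drat", account)
}

# Hypothetical helper: install a package, searching both the drat
# repository and CRAN. Needs network access, and assumes the drat
# repository has actually been published.
install_from_drat <- function(pkg, account,
                              cran = "https://cloud.r-project.org") {
  repos <- c(drat = drat_repo_url(account), CRAN = cran)
  install.packages(pkg, repos = repos)
}

# Example calls (not run here):
# install_from_drat("choroplethrZip", "arilamstein")
# update.packages(repos = c(drat_repo_url("arilamstein"),
#                           "https://cloud.r-project.org"))
```

Because only install.packages() and update.packages() are involved, users do
not strictly need drat installed just to consume such a repository.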

Some package authors have turned to using drat to distribute packages (often
in addition to CRAN; you can also do it instead of CRAN, given a constraint
like the one here).  One such package author and I are working on another
short blog post detailing just this.  If you want, I can send you an
'informal preview' as yet another source of documentation.

Dirk
#
Thanks Dirk. I'm looking at it now.

At first glance your documentation brings up a good limitation of simply
telling users to type "devtools::install_github()". Namely, what happens
when the census bureau updates their shapefiles, and I subsequently decide
to update the package? Or if I discover an error in the package and decide
to update it? The choroplethr package could have a dependency, and it's not
clear how to make that dependency explicit to the user.
#
On 12 March 2015 at 09:40, arilamstein at gmail.com wrote:
| Thanks Dirk. I'm looking at it now.
| 
| At first glance your documentation brings up a good limitation of simply
| telling users to type "devtools::install_github()". Namely, what happens when
| the census bureau updates their shapefiles, and I subsequently decide to update
| the package? Or if I discover an error in the package and decide to update it?
| The choroplethr package could have a dependency, and it's not clear how to make
| that dependency explicit to the user.

100% agree. 

In writing drat, and talking to R users about it, I surprisingly often find
many (advanced) R users who seem to not use update.packages() at all.

R itself has your problem solved by providing repositories. And drat makes
creating and filling repositories (the author side) very easy. It also aids
the user side, as installation as well as regular updates fall back onto
standard R functions: install.packages(), update.packages().  And this does
not require any additional or manual steps on the part of the users (once
drat:::add(...)  has been added to their startup files).

So for this example, you could add a versioned Depends: in the
shapefile-using package and update the drat repository with an updated
shapefile package.  Users of drat and update.packages() would get updates
automagically. 
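
A minimal sketch of how the using package could make a version requirement
explicit at run time, extending the requireNamespace() check quoted earlier
in this thread. The function name, the minimum version, and the repository
URL are all hypothetical placeholders, not something drat or choroplethr
provides.

```r
# Hypothetical helper: stop with an actionable message unless `pkg` is
# installed at version `min_version` or later.
check_data_package <- function(pkg, min_version, repo_url) {
  ok <- requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= min_version
  if (!ok) {
    stop(sprintf(
      "Package %s (>= %s) is needed for this function to work. Please run: install.packages('%s', repos = '%s')",
      pkg, min_version, pkg, repo_url
    ), call. = FALSE)
  }
  invisible(TRUE)
}

# Example call (the version and URL are assumptions for illustration):
# check_data_package("choroplethrZip", "1.0.0",
#                    "https://arilamstein.github.io/drat")
```

The versioned entry in DESCRIPTION states the requirement declaratively; a
check like this only improves the error message a user sees when the
requirement is not met.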

Dirk
#
Hi Dirk,

I'm interested in pursuing this but I haven't been able to figure out how to
make it work.  Here's what I have so far:

install.packages("drat")
library(drat)
addRepo("arilamstein")

I (obviously) have a copy of the choroplethrZip github repo locally. I
typed:

git checkout gh-pages
git push

I gather that this is what I needed to do to make the repo web-accessible,
although I found this step a bit confusing because it looks like github has
a pages feature for both accounts and projects.

And now the instructions say to type something like:

insertPackage("~/Desktop/choroplethrZip_1.0.0.tar.gz")
Error in insertPackage("~/Desktop/choroplethrZip_1.0.0.tar.gz") :
  Directory ~/git/drat not found
FALSE

The file I provided is what was created by calling devtools::build() on my
R package that I want to distribute.

Can you explain this error message to me and tell me what I'm doing wrong?
I've read several of the documents on drat but am unclear on whether I need
to modify my existing repository in any way in order to make it work with drat.

Thanks.

Ari
#
Ari,
On 12 March 2015 at 14:29, arilamstein at gmail.com wrote:
| I'm interested in pursuing this but I haven't been able to figure out how to
| make it work. Here's what I have so far:
| 
| install.packages("drat")
| library(drat)
| addRepo("arilamstein")
| 
| I (obviously) have a copy of the choroplethrZip github repo locally. I typed:
| 
| git checkout gh-pages
| git push

Drat makes one simple assumption (in the one / default argument case): that
the repo is called  'drat'  within the gh account of the given user. I.e.,
using addRepo("arilamstein")  requires that
https://github.com/arilamstein/drat/  exists and has a gh-pages branch. Drat
would not know about  choroplethrZip.

Which is why the docs say 'easiest to just fork the drat repo'. That gives you
arilamstein/drat and gh-pages in one swoop.

In the expanded form you can give any (http or file) URL; that is what we use
at work for files shared via the local network.
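
A hedged sketch of the author side under the setup just described. It assumes
a fork of the drat repo has been cloned to ~/git/drat (the directory named in
the error message above) with its gh-pages branch checked out. The wrapper
function is hypothetical; drat::insertPackage() and its repodir argument are
drat's own.

```r
# Hypothetical wrapper: add a built source tarball to a local clone of
# the 'drat' repository, failing early with a clearer message.
publish_to_drat <- function(tarball, repodir = "~/git/drat") {
  if (!requireNamespace("drat", quietly = TRUE)) {
    stop("Package drat is needed for this function to work. Please install it.",
         call. = FALSE)
  }
  repodir <- path.expand(repodir)
  if (!dir.exists(repodir)) {
    stop("Directory ", repodir,
         " not found; clone your fork of the drat repo there first.",
         call. = FALSE)
  }
  # Copies the tarball into the repository layout and updates its index.
  drat::insertPackage(tarball, repodir = repodir)
}

# Example call (not run here; the path is the one from this thread):
# publish_to_drat("~/Desktop/choroplethrZip_1.0.0.tar.gz")
```

After the package has been inserted, the change still has to be committed and
pushed on the gh-pages branch of the clone before users can see it.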

Please try this, and if you need more follow-up we may want to move off-list
now.  You have a pretty good use case so I want to help you with this.

Dirk