[Rcpp-devel] Exporting rcpp-based function into parLapply workers in an R package

12 messages · R. Michael Weylandt, Jeff Newmiller, Naeem Khoshnevis +2 more

#
Dear Rcpp developers:

Thanks for developing and maintaining the Rcpp package.
I wrote a function in Rcpp. It is available throughout the package and
works as expected; however, it is not available for parLapply workers. A
temporary fix is just using Rcpp::cppFunction inside the function that
parLapply workers call and copy the entire function over there. However,
this does not seem right for bigger and more complicated functions.
I would be grateful if you could let me know whether there is a better
long-term solution. Here is the package and three functions that you might
want to take a look at.

Original cpp function:
https://github.com/fasrc/CausalGPS/blob/master/src/compute_closest_wgps_helper.cpp

Wrapper function that calls this function + temporary fix:
https://github.com/fasrc/CausalGPS/blob/master/R/compute_closest_wgps.R

The function that uses parLapply (please see lines 63-89) to run the C++
code:
https://github.com/fasrc/CausalGPS/blob/master/R/compute_closest_wgps.R

Best regards,
Naeem
#
Hi Naeem,

My (very quick) guess is that this isn't an Rcpp problem per se, but a
parLapply problem. You need to explicitly load your package on each
worker so that functions from it are available.

See, e.g., the brief discussion here:
https://stackoverflow.com/questions/18357788/parallel-parlapply-setup#18358875

The "parallel" packages do not exactly replicate your environment on
each worker node (to avoid expensive set-up / communication costs) so
you need to do a bit more set-up.
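
To illustrate the point above, here is a minimal sketch, assuming a local
two-worker PSOCK cluster. The base package "tools" and the helper name
ext_of are stand-ins chosen for illustration; they are not from the
CausalGPS code. Attaching a package on the master does not attach it on
the workers, so an unqualified call fails there until the package is
loaded on each worker.

```r
library(parallel)

cl <- makeCluster(2)  # two local workers, each a fresh R session

# The master attaches 'tools'; the workers do not inherit this.
library(tools)
ext_of <- function(path) file_ext(path)  # unqualified call, needs 'tools' attached

# In a default setup this first attempt is expected to fail on the workers,
# because 'tools' is not attached there; the error message is captured.
before <- tryCatch(
  parLapply(cl, c("a.cpp", "b.R"), ext_of),
  error = function(e) conditionMessage(e)
)

# Attach the package on every worker, then retry the same call.
invisible(clusterEvalQ(cl, library(tools)))
after <- parLapply(cl, c("a.cpp", "b.R"), ext_of)

stopCluster(cl)
print(before)
print(after)
```

With your own package, library(tools) would be replaced by a call that
loads that package, provided it is installed where the workers run.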

Best,
Michael

On Fri, May 14, 2021 at 11:49 AM Naeem Khoshnevis
<khoshnevis.naeem at gmail.com> wrote:
#
Hi Michael,


Thank you so much for your response. That is correct. One method for
exporting required variables/functions is using the clusterExport function,
which does not work for Rcpp-based functions. Another option is using
clusterEvalQ (as mentioned in the shared post); however, I am not sure if
CRAN likes to see the library(package name) inside the codebase. What are
your thoughts?

Best regards,
Naeem

On Fri, May 14, 2021 at 11:57 AM Michael Weylandt <
michael.weylandt at gmail.com> wrote:

            
#
clusterExport works just fine if you put your Rcpp code into your own package and make that package available in your worker environment. Given the need for compilation in possibly a variety of computing environments for parallel processing, this is definitely recommended.
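
For reference, a minimal sketch of clusterExport with ordinary R objects, assuming a local two-worker cluster. The names factor_k and scale_by are hypothetical. This covers plain R objects only; as far as I understand, a function compiled on the fly with Rcpp::cppFunction refers to native code loaded in the master process, which is why packaging the C++ code is the suggestion here.

```r
library(parallel)

cl <- makeCluster(2)

# Hypothetical plain-R objects defined in the calling environment.
factor_k <- 10
scale_by <- function(x) x * factor_k

# Copy both objects, by name, into each worker's global environment.
# 'envir' tells clusterExport where to look the names up on the master.
clusterExport(cl, c("scale_by", "factor_k"), envir = environment())

res <- parSapply(cl, 1:3, function(i) scale_by(i))
stopCluster(cl)
print(res)
```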
On May 14, 2021 10:35:25 AM PDT, Naeem Khoshnevis <khoshnevis.naeem at gmail.com> wrote:

  
    
#
Thank you so much, Jeff. The part that I do not understand is the "and make
that package available in your worker environment" part. Could you please
let me know how I can make the package available for each worker.

On Fri, May 14, 2021 at 1:44 PM Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
wrote:
#
You "just" run clusterEvalQ(cl, library(PKG)), which evaluates
library(PKG) on each worker of the cluster cl. Like Jeff says, this is
much easier for you as a package developer and, I'd argue, easier for
your users as well, since they just have to make sure the package can
be installed once, rather than having compilers ready and behaving for
each use.

The CRAN organization / mirror on GitHub (github.com/cran) is very
useful for this sort of thing.

Searching for "library(" and "clusterEvalQ"
(https://github.com/search?q=org%3Acran+library%28+clusterEvalQ&type=code)
in that organization yields the following result (chosen at random):

https://github.com/cran/textmineR/blob/889b400b2ccdc4eac7b9fee5dd7678bd71f0b290/R/other_utilities.R#L51

where you can see how the textmineR package loads itself on each worker.

To your earlier question, I *think* CRAN is OK with calling
"clusterEvalQ(cl, library(PKG))" within one of your functions (as
evidenced by the search of CRAN above), but I've never done it myself,
so can't confirm.
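
If a library() call inside package code is a concern, one alternative is
to call the function with a namespace qualifier on the worker. This is a
sketch under assumptions, not something confirmed in this thread:
tools::file_ext stands in for a function exported by your package, and
ext_worker and pkg_parallel_ext are hypothetical names. It assumes the
package is installed in a library the workers can see.

```r
library(parallel)

# Worker-side function, defined at top level so it does not carry a large
# enclosing environment along when it is sent to the workers.
# 'tools::file_ext' stands in for 'yourpkg::your_rcpp_function'.
ext_worker <- function(p) tools::file_ext(p)

# Hypothetical package-level function that owns the cluster's lifetime.
pkg_parallel_ext <- function(paths, n_workers = 2) {
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl), add = TRUE)  # release the workers even on error

  # The '::' operator loads the namespace on the worker on first use,
  # so no library() call is needed there.
  parLapply(cl, paths, ext_worker)
}

exts <- pkg_parallel_ext(c("helper.cpp", "wrapper.R"))
print(exts)
```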

Michael

On Fri, May 14, 2021 at 1:57 PM Naeem Khoshnevis
<khoshnevis.naeem at gmail.com> wrote:
#
Thank you so much, Michael and Jeff. I really appreciate your help. These
are invaluable suggestions and recommendations.  Thanks, Dirk, for bringing
this email list to my attention.

Best regards,
Naeem
#
This is great, James. Thank you so much for sharing.

Best regards,
Naeem

On Fri, May 14, 2021 at 3:09 PM Balamuta, James Joseph <
balamut2 at illinois.edu> wrote:

            
#
On 14 May 2021 at 14:07, Michael Weylandt wrote:
| The CRAN organization / mirror on GitHub (github.com/cran) is very
| useful for this sort of thing.
| 
| Searching for "library(" and "clusterEvalQ"
| (https://github.com/search?q=org%3Acran+library%28+clusterEvalQ&type=code)
| in that organization yields the following result (chosen at random):

Yes! I actually do this type of search all the time myself (and am old
enough to bemoan the disappearance of the Google code search tool that
preceded it ages ago).

Dirk
#
For code searches, consider using the {searcher} package: https://github.com/r-assist/searcher 

In particular, the search_github() function handles the query formatting. As an example, try:

searcher::search_github("clusterEvalQ ")

This opens a web browser with:

https://github.com/search?q=clusterEvalQ%20%20language:r%20type:issue&type=Issues
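
A query URL like the one used earlier in this thread can also be assembled in base R, without extra packages. This is only a sketch: the variable names are illustrative, and it percent-encodes spaces as %20 rather than "+", which should be equivalent for this purpose.

```r
# Build a GitHub code-search URL restricted to the CRAN mirror organization.
query <- "org:cran library( clusterEvalQ"
search_url <- paste0(
  "https://github.com/search?q=",
  utils::URLencode(query, reserved = TRUE),  # percent-encode ':', '(' and spaces
  "&type=code"
)
print(search_url)
# utils::browseURL(search_url)  # uncomment to open it in a browser
```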

Lastly, there is an R-specific Google search engine available at: https://rseek.org/ It's not quite google code search, but it's useful! Plus, there is a {searcher} function for that as well, e.g. searcher::search_rseek().
(Thanks Alex Rossell Hayes for that contribution.)

Best,

JJB

On 5/14/21, 3:49 PM, "Rcpp-devel on behalf of Dirk Eddelbuettel" <rcpp-devel-bounces at lists.r-forge.r-project.org on behalf of edd at debian.org> wrote:
#
This is getting off topic, but as James saw fit to advertise his package
(as he should, it is clearly helpful to some, himself included), here are my
$0.02 on why it is not for me:
On 14 May 2021 at 20:54, Balamuta, James Joseph wrote:
| For code searches, consider using the {searcher} package: https://github.com/r-assist/searcher 
| 
| In particular, the search_github() function handles the query formatting. As an example, try:

I keep the R prompt(s) (in Emacs, generally) for data work, and do more work
like this on the shell, where this is less useful (though I sometimes wrap R
commands in littler scripts). Here I particularly dislike
| 
| searcher::search_github("clusterEvalQ ")
| 
| This opens a web browser with:
| 
| https://github.com/search?q=clusterEvalQ%20%20language:r%20type:issue&type=Issues

the shell-to-browser pivot. I have some "permanent tabs" dedicated to GH, I
prefer to search therein.  Also, why default to issues when the query didn't
have it? Anyway ...
 
| Lastly, there is an R-specific Google search engine available at: https://rseek.org/ It's not quite google code search, but it's useful! Plus, there is a {searcher} function for that as well, e.g. searcher::search_rseek().
| (Thanks Alex Rossell Hayes for that contribution.)

Yes, that's been around for a while, but codesearch.google.com was still
better and is missed.  One of the other code aggregators (which AFAIK is
also gone now) had a different search engine too.

This whole thread is highly off-topic and likely answered by some SO answers,
as has been pointed out, as well as possibly some discussions in the r-sig-hpc
list (which is mostly dormant these days).

Dirk