Skip to content

Big speedup in install.packages() by re-using connections

2 messages · Jeroen Ooms, Ivan Krylov

#
I'd like to raise this again now that 4.4 is out.

Below is a more complete patch which includes a function to properly
cleanup libcurl when R quits. Implementing this is a little tricky
because libcurl is a separate "module" in R, perhaps there is a better
way, but this works:

  view: https://github.com/r-devel/r-svn/pull/166/files
  patch: https://github.com/r-devel/r-svn/pull/166.diff

The old patch is still there as well, which is meant a minimal proof
of concept to test the performance gains for reusing the connection:

  view: https://github.com/r-devel/r-svn/pull/155/files
  patch: https://github.com/r-devel/r-svn/pull/155.diff

Performance gains are greatest on high-bandwidth servers when
downloading many files from the same server (e.g. packages from a cran
mirror). In such cases, currently over 90% of the total time is spent
on initiating and tearing town a separate SSL connection for each file
download.

Thoughts?
On Sat, Mar 2, 2024 at 3:07?PM Jeroen Ooms <jeroenooms at gmail.com> wrote:
#
On Thu, 25 Apr 2024 14:45:04 +0200
Jeroen Ooms <jeroenooms at gmail.com> wrote:

            
How verboten would it be to create an empty external pointer object,
add it to the preserved list, and set an on-exit finalizer to clean up
the curl multi-handle? As far as I can tell, the internet module is not
supposed to be unloaded, so this would not introduce an opportunity to
jump to an unmapped address. This makes it possible to avoid adding a
CurlCleanup() function to the internet module:

Index: src/modules/internet/libcurl.c
===================================================================
--- src/modules/internet/libcurl.c	(revision 86484)
+++ src/modules/internet/libcurl.c	(working copy)
@@ -55,6 +55,47 @@
 
 static int current_timeout = 0;
 
+// The multi-handle is shared between downloads for reusing connections
+static CURLM *shared_mhnd = NULL;
+static SEXP mhnd_sentinel = NULL;
+
+static void cleanup_mhnd(SEXP ignored)
+{
+    if(shared_mhnd){
+        curl_multi_cleanup(shared_mhnd);
+        shared_mhnd = NULL;
+    }
+    curl_global_cleanup();
+}
+static void rollback_mhnd_sentinel(void* sentinel) {
+    // Failed to allocate memory while registering a finalizer,
+    // therefore must release the object
+    R_ReleaseObject((SEXP)sentinel);
+}
+static CURLM *get_mhnd(void)
+{
+    if (!mhnd_sentinel) {
+      SEXP sentinel = PROTECT(R_MakeExternalPtr(NULL, R_NilValue, R_NilValue));
+      R_PreserveObject(sentinel);
+      UNPROTECT(1);
+      // Avoid leaking the sentinel before setting the finalizer
+      RCNTXT cntxt;
+      begincontext(&cntxt, CTXT_CCODE, R_NilValue, R_BaseEnv, R_BaseEnv,
+                   R_NilValue, R_NilValue);
+      cntxt.cend = &rollback_mhnd_sentinel;
+      cntxt.cenddata = sentinel;
+      R_RegisterCFinalizerEx(sentinel, cleanup_mhnd, TRUE);
+      // Succeeded, no need to clean up if endcontext() fails allocation
+      mhnd_sentinel = sentinel;
+      cntxt.cend = NULL;
+      endcontext(&cntxt);
+    }
+    if(!shared_mhnd) {
+      shared_mhnd = curl_multi_init();
+    }
+    return shared_mhnd;
+}
+
 # if LIBCURL_VERSION_MAJOR < 7 || (LIBCURL_VERSION_MAJOR == 7 && LIBCURL_VERSION_MINOR < 28)
 
 // curl/curl.h includes <sys/select.h> and headers it requires.
@@ -565,8 +606,6 @@
 	if (c->hnd && c->hnd[i])
 	    curl_easy_cleanup(c->hnd[i]);
     }
-    if (c->mhnd)
-	curl_multi_cleanup(c->mhnd);
     if (c->headers)
 	curl_slist_free_all(c->headers);
 
@@ -668,7 +707,7 @@
 	c.headers = headers = tmp;
     }
 
-    CURLM *mhnd = curl_multi_init();
+    CURLM *mhnd = get_mhnd();
     if (!mhnd)
 	error(_("could not create curl handle"));
     c.mhnd = mhnd;