
segfault issue with parallel::mclapply and download.file() on Mac OS X

This code actually happens to work for me on macOS, but I think in
general you cannot rely on performing HTTP requests in fork clusters,
i.e. with mclapply().
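
For reference, the pattern that tends to bite looks something like
this (a minimal sketch; the URLs are hypothetical):

    library(parallel)

    urls <- c("https://example.com/a.csv", "https://example.com/b.csv")

    # Each worker is a fork of the main R process without exec; on
    # macOS the HTTP call inside the fork may error, crash, or happen
    # to work.
    mclapply(urls, function(u) {
      download.file(u, destfile = file.path(tempdir(), basename(u)))
    })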

Fork clusters create worker processes by forking the R process and
then _not_ executing another R binary. (This is often convenient,
because the new processes inherit the memory image of the parent
process.)

Fork without exec is not supported on macOS, so basically any call
into a system library might crash (i.e. not just HTTP-related calls).
For HTTP calls I have seen errors, crashes, and sometimes it just
works; it depends on the combination of libcurl version, macOS
version, and probably luck.

It usually (always?) works on Linux, but I would not rely on that, either.

So, yes, this is a known issue.

Creating new processes just to perform HTTP requests in parallel is
very often bad practice, actually. Whenever you can, use I/O
multiplexing instead: the main R process is not doing anything
anyway, just waiting for the data to come in. So you don't need more
processes, you need parallel I/O. Take a look at the
curl::multi_add() etc. functions.
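
Something along these lines, for example (an untested sketch with
hypothetical URLs; curl_fetch_multi() is a convenience wrapper around
multi_add(), and the callbacks here just stash each response):

    library(curl)

    urls <- c("https://example.com/a.csv", "https://example.com/b.csv")
    results <- new.env()

    # Queue all transfers on the default pool; nothing is downloaded
    # yet at this point.
    invisible(lapply(urls, function(u) {
      curl_fetch_multi(
        u,
        done = function(res) assign(u, res, envir = results),
        fail = function(msg) message(u, " failed: ", msg)
      )
    }))

    # A single R process drives all queued transfers concurrently.
    multi_run()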

Btw. download.file() can actually download files in parallel if the
libcurl method is used: just give it a character vector of URLs (and
a matching vector of destination files). This API is very
restricted, though, so I suggest looking at the curl package.
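
E.g. (a sketch with made-up URLs; with method = "libcurl" the
destfile argument must be a character vector of the same length as
the URLs):

    urls  <- c("https://example.com/a.csv", "https://example.com/b.csv")
    dests <- file.path(tempdir(), basename(urls))

    # With vector arguments and method = "libcurl" the files are
    # downloaded simultaneously, in the main R process.
    download.file(urls, destfile = dests, method = "libcurl")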

Gabor

On Thu, Sep 20, 2018 at 8:44 AM Seth Russell
<seth.russell at gmail.com> wrote: