Building R under Linux - library dependencies

7 messages · Paweł Piątkowski, elijah wright, Dirk Eddelbuettel

#
Hello and apologies if this doesn't belong here.

I'm trying to build a "portable" version of R - "portable" means that it could be easily moved to another location or machine simply by copying it. However, I encountered a problem when running it elsewhere: it seems that the versions of the dynamic libraries used by R are fixed at build time; when that instance of R is run on a system with a different version of a certain library (e.g. libicuuc.so.52 instead of libicuuc.so.48), it can't find the library and quits.
Is there a way to overcome this problem? Precompiled versions of R can be installed on various system configurations, so I guess that there should be a way to compile it in a version-agnostic manner.

Best regards,
Pawel
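
One way to see which libraries a given build will fail on is to run `ldd` against the binary and look for entries marked `not found`. The Python sketch below only parses text in that format; the sample string is made up for illustration and is not output from the build discussed here, and `find_missing_libs` is a hypothetical helper name.

```python
def find_missing_libs(ldd_output):
    """Return library names that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Resolved and unresolved entries both use "name => target".
        name, arrow, target = line.strip().partition(" => ")
        if arrow and target.strip() == "not found":
            missing.append(name)
    return missing


# Illustrative input only (hypothetical paths and addresses).
SAMPLE = (
    "\tlibicuuc.so.48 => not found\n"
    "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000000000)\n"
)
print(find_missing_libs(SAMPLE))
```

In practice the input would come from running `ldd` on the R binary on the target machine rather than from a hard-coded string.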
#
On 7 September 2016 at 17:27, Paweł Piątkowski wrote:
| Hello and apologies if this doesn't belong here.
| 
| I'm trying to build a "portable" version of R - "portable" means that it could be easily moved to another location or machine simply by copying it. However, I encountered a problem when running it elsewhere: it seems that versions of dynamic libraries used by R are fixed and set at the build time; when that instance of R is run on a system with a different version of certain library (e.g. libicuuc.so.52 instead of libicuuc.so.48), it can't find it and quits.
| 
| > bin/exec/R: error while loading shared libraries: libicuuc.so.48: cannot open shared object file: No such file or directory
| 
| Is there a way to overcome this problem? Precompiled versions of R can be installed on various system configurations, so I guess that there should be a way to compile it in a version-agnostic manner.

Yes, for example by

  -- using a Docker container which is portable across OSs (!!) and versions

  -- relying on package management which is what every Linux distro does

Otherwise you are trying to reinvent a systems-level wheel in application
space. I suspect that won't end well.

Dirk


PS For the latter point, our .deb based R package currently shows this:

Package: r-base-core
Source: r-base
Priority: optional
Section: gnu-r
Installed-Size: 33845
Maintainer: Dirk Eddelbuettel <edd at debian.org>
Architecture: amd64
Version: 3.3.1-1.xenial.0
Recommends: r-recommended, r-base-dev, r-doc-html
Replaces: r-base (<= 1.4.1-1), r-base-latex (<= 2.9.2-4), r-cran-rcompgen (<= 0.1-17-1), r-gnome (<= 2.3.1), r-recommended (<< 1.9.0)
Suggests: ess, r-doc-info | r-doc-pdf, r-mathlib, r-base-html
Provides: r-api-3, r-base-latex, r-cran-rcompgen, r-gnome
Depends: zip, unzip, libpaper-utils, xdg-utils, libblas3 | libblas.so.3, libbz2-1.0, libc6 (>= 2.23), libcairo2 (>= 1.6.0), libcurl3 (>= 7.28.0), libglib2.0-0 (>= 2.12.0), libgomp1 (>= 4.9), libjpeg8 (>= 8c), liblapack3 | liblapack.so.3, liblzma5 (>= 5.1.1alpha+20120614), libpango-1.0-0 (>= 1.14.0), libpangocairo-1.0-0 (>= 1.14.0), libpcre3, libpng12-0 (>= 1.2.13-4), libreadline6 (>= 6.0), libtcl8.6 (>= 8.6.0), libtiff5 (>= 4.0.3), libtk8.6 (>= 8.6.0), libx11-6, libxt6, zlib1g (>= 1:1.1.4), ucf (>= 3.0), ca-certificates
Conflicts: r-base-latex, r-cran-rcompgen, r-gnome
Filename: pool/main/r/r-base/r-base-core_3.3.1-1.xenial.0_amd64.deb
Size: 20939808
MD5sum: a983ccafe969cc4d8a631036478ac1c2
SHA1: 1fd9991b2577bf18074cd0a7e8017a98d4efef13
SHA256: 9a0cc3d5edf6b628d854a075731f30b460ea9ff465327693eb2d92b59ac01901
Description-en: GNU R core of statistical computation and graphics system
[...]

Note the detailed and fine-grained breakdown of library dependencies.
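
As a rough illustration of how such a Depends field can be read mechanically, here is a minimal Python sketch that splits it into alternatives and optional version constraints. The function name `parse_depends` and the tuple layout are assumptions made for this example, not part of any Debian tool, and the sketch does not attempt Debian's version comparison rules.

```python
import re

# One dependency: a package name, optionally followed by "(<op> <version>)".
_DEP_RE = re.compile(r"^([^\s(]+)(?:\s*\((<<|<=|=|>=|>>)\s*([^)]+?)\s*\))?$")


def parse_depends(field):
    """Parse a Depends-style string into a list of alternative groups.

    Each group is a list of (package, operator, version) tuples; operator
    and version are None when no constraint is given.
    """
    groups = []
    for clause in field.split(","):
        alternatives = []
        for alt in clause.split("|"):
            match = _DEP_RE.match(alt.strip())
            if match is None:
                raise ValueError("cannot parse dependency: %r" % alt)
            alternatives.append(match.groups())
        groups.append(alternatives)
    return groups
```

For example, `parse_depends("libc6 (>= 2.23), libblas3 | libblas.so.3")` yields one group with a `>=` constraint on libc6 and a second group with two unconstrained alternatives.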
#
Docker R containers are north of 250 MB. I have checked experimentally that you can trim R down to 16 MB (!) and you'll still be able to execute it (though with warnings). That *is* quite a difference, especially when deploying small applications.
Sure, package dependencies would be great as well - at least you'd be sure that users of, say, Debian-based distros will be able to run this portable R, as long as they've installed the required libraries. But notice that in your example package versions equal *or greater* than listed are required - so if someone has upgraded their system, they still will be able to run that R. With a version built from source you need *exactly* the same version as on the machine where R was compiled. Hence my question: how come the precompiled distribution of R has "less strict" library requirements than manually compiled versions?

Best,
Pawel
#
On Wed, Sep 7, 2016 at 1:50 PM, Paweł Piątkowski <cosi1 at tlen.pl> wrote:

            
... I would guesstimate the libraries required to run R with any useful set
of libraries is quite a bit bigger than the cited 16M .......
Package managers don't usually cite 'less than' versions for packages -
because how do you assert a version that won't work when it hasn't been
released yet?

You could go on a tear and build statically linked versions of
R-with-everything-you-need, and maybe avoid the library madness... but this
is sort of a fool's errand and a huge consumer of time.  OS vendors and
compiler developers have stopped doing things that way for reasons.... it's
much simpler to reduce duplication and make everything work - while
allowing for patching out security issues - when you are *just slightly*
more flexible.

ABI compatibility and library versioning are, I think, fairly well
understood....

Doing this stuff with a container is very much the easiest route, if you
actually want it to be completely portable.  You're certainly welcome to
start with an Alpine Linux base and add R on top and then start paring...
but I start to not understand the point, somewhere in there....  it's a lot
of time spent on something that doesn't seem that beneficial when you've
got (even fairly reasonably modern) hardware that can deal with a tiny bit
of extra bloat.  SD cards and USB sticks are pretty cheap everywhere, now,
aren't they?

I could say, maybe, putting time into this as some kind of retrocomputing
project... but probably not otherwise.

best,

--e
#
Maybe. The minimal usable subset is about 37 MB, add a few custom libraries, code of your application etc... But it's *still* much less than 250 MB.

I meant that manually built versions of R (at least those compiled by me) are fixed at a certain version of dynamic libraries - the same as installed on the machine R was compiled on. You can't run this compiled R on an upgraded configuration.

Why link the libraries statically? Most Linux distributions make symlinks to dynamically linked libraries - so you have for example libicuuc.so that links to libicuuc.so.XX (where XX is the version number). Why not rely on these generic names?

Potential users who would have to download 250 megabytes beg to differ ;-)

Best,
-p-
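
On the symlink question above: in the usual ELF toolchain behaviour, the linker records the library's SONAME (for example libicuuc.so.48) in the executable even when the build links through the unversioned libicuuc.so symlink, and the loader later looks for exactly that name. A minimal Python sketch of that lookup rule follows; `split_soname`, `missing_sonames` and `other_versions` are hypothetical helper names used for illustration only.

```python
def split_soname(soname):
    """Split 'libicuuc.so.48' into ('libicuuc', '48').

    The version part is '' for an unversioned name such as 'libicuuc.so'.
    """
    base, sep, version = soname.partition(".so")
    if not sep:
        raise ValueError("not a shared-object name: %r" % soname)
    return base, version.lstrip(".")


def missing_sonames(needed, available):
    """Return the needed sonames that have no exact match in `available`.

    The recorded name is matched exactly, so a different major version
    of the same library does not satisfy the request.
    """
    present = set(available)
    return [name for name in needed if name not in present]


def other_versions(soname, available):
    """List available builds of the same library under a different soname."""
    base, _ = split_soname(soname)
    return [n for n in available if n != soname and split_soname(n)[0] == base]
```

With `needed = ["libicuuc.so.48"]` and `available = ["libicuuc.so.52"]`, the first helper reports libicuuc.so.48 as missing while the last one shows that only version 52 is installed, which matches the failure described at the start of the thread.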
#
On 7 September 2016 at 20:50, Paweł Piątkowski wrote:
| > | Is there a way to overcome this problem? Precompiled versions of R can be installed on various system configurations, so I guess that there should be a way to compile it in a version-agnostic manner.
| > 
| > Yes, for example by
| > 
| >   -- using a Docker container which is portable across OSs (!!) and versions
| 
| Docker R containers are north of 250 MB. I have checked experimentally that you can trim R down to 16 MB (!) and you'll still be able to execute it (though with warnings). That *is* quite a difference, especially when deploying small applications.

You are not enumerating your trade-offs very well. There are natural
conflicts. What is it you really want?

- Being able to pre-build and distribute?  We have done that since the late
1990s with .deb packages.

- Being able to install with minimal size?  Have you queried your users?  I
note that among the Docker containers for R (in the "Rocker" project Carl and
I run) the _larger_ ones containing RStudio plus optionally "lots from
hadley" plus optionally lots of rOpenSci tend to be _more_ popular (for ease
of installation of the aggregate).

And while I share the overall sentiment a little bit, you have to realize that
it is 2016 with the corresponding bandwidth and storage:

  edd@max:~$ du -csh /usr/local/lib/R/site-library/
  1.5G    /usr/local/lib/R/site-library/
  1.5G    total
  edd@max:~$

And that is _outside_ of R itself, or the (numerous) other shared libraries.

| >   -- relying on package management which is what every Linux distro does
| > 
| > (...)
| > 
| > PS For the latter point, our .deb based R package currently shows this:
| > 
| > (...)
| > 
| > Depends: zip, unzip, libpaper-utils, xdg-utils, libblas3 | libblas.so.3, libbz2-1.0, libc6 (>= 2.23), libcairo2 (>= 1.6.0), libcurl3 (>= 7.28.0), libglib2.0-0 (>= 2.12.0), libgomp1 (>= 4.9), libjpeg8 (>= 8c), liblapack3 | liblapack.so.3, liblzma5 (>= 5.1.1alpha+20120614), libpango-1.0-0 (>= 1.14.0), libpangocairo-1.0-0 (>= 1.14.0), libpcre3, libpng12-0 (>= 1.2.13-4), libreadline6 (>= 6.0), libtcl8.6 (>= 8.6.0), libtiff5 (>= 4.0.3), libtk8.6 (>= 8.6.0), libx11-6, libxt6, zlib1g (>= 1:1.1.4), ucf (>= 3.0), ca-certificates
| 
| Sure, package dependencies would be great as well - at least you'd be sure that users of, say, Debian-based distros will be able to run this portable R, as long as they've installed the required libraries. But notice that in your example package versions equal *or greater* than listed are required - so if someone has upgraded their system, they still will be able to run that R. With a version built from source you need *exactly* the same version as on the machine where R was compiled. Hence my question: how come the precompiled distribution of R has "less strict" library requirements than manually compiled versions?

This is not the list for internals of how Linux packaging works, but if you
took the question to debian-user or debian-devel you would likely get a pretty
qualified answer.  That dependency resolution system has been refined for
well over 20 years, so don't expect one-sentence answers.

Good luck,  Dirk
#
OK, to be honest, it was rather a proof-of-concept than a specific idea. Other interpreted and VM-based languages have robust app deployment systems with smaller footprint, so I thought that it would be nice to have something similar in R.
Maybe you are right and neither R developers, nor users actually need it.

Thanks for the discussion,
-p-