50% performance of custom R build compared to PPA R for a command

5 messages · Scott Kostyshak, Dirk Eddelbuettel

#
Hi,

I have R installed from the Ubuntu PPA and a local build of R (more
details below). I will refer to these as "R" and "R-devel",
respectively. I've reproduced the following on Ubuntu 13.10 and 14.04.
Below is an example (which requires the bootstrap package) that takes
10 seconds for me to run with R-devel and 5 seconds with R:

library(bootstrap)
str(tooth)
theta <- function(ind) {
    easy <- lm(strength ~ E1+E2, data=tooth, subset=ind)
    diffi<- lm(strength ~ D1+D2, data=tooth, subset=ind)
    (sum(resid(easy)^2) - sum(resid(diffi)^2))/13   }
tooth.boot <- bootstrap(1:13, 2000, theta)

I'm wondering if this is due to different compiler flags. For R, when
installing the bootstrap package, I see
gcc -std=gnu99 -shared -Wl,-Bsymbolic-functions -Wl,-z,relro -o
bootstrap.so boott.o -lgfortran -lm -lquadmath -L/usr/lib/R/lib -lR

For R-devel I see:
ccache gcc -shared -L/usr/local/lib -o bootstrap.so boott.o -lgfortran
-lm -lquadmath -L/usr/local/lib/R-devel/lib/R/lib -lR

My install script for the local build is based on Dirk's script [1].
In particular, my configure command is:

R_PAPERSIZE=letter R_BATCHSAVE="--no-save --no-restore" \
R_BROWSER=xdg-open PAGER=/usr/bin/pager PERL=/usr/bin/perl \
R_UNZIPCMD=/usr/bin/unzip R_ZIPCMD=/usr/bin/zip \
R_PRINTCMD=/usr/bin/lpr LIBnn=lib AWK=/usr/bin/awk CC="gcc" \
CFLAGS="-ggdb -pipe -std=gnu99 -Wall -pedantic" CXX="g++" \
CXXFLAGS="-ggdb -pipe -Wall -pedantic" FC="gfortran" F77="gfortran" \
MAKE="make -j$NJOBS" "${repoDir}/configure" \
--prefix=/usr/local/lib/R-devel --enable-R-shlib --with-blas \
--with-lapack --with-readline --without-recommended-packages > \
../build-logs/configure 2>&1

I'm using R-devel updated to today's revision but I compiled a version
from a year ago and had the same performance so that is why I suspect
my installation script accounts for the differences.

Any advice would be appreciated and please let me know if any other
information would be helpful.

Best,

Scott

[1] http://www.personal.psu.edu/mar36/blogs/the_ubuntu_r_blog/2012/08/installing-the-development-version-of-r-on-ubuntu-alongside-the-current-version-of-r.html

--
Scott Kostyshak
Economics PhD Candidate
Princeton University
#
Scott,

My first quick hunches are a) 50% is too much for compiler switches, b) your
example shows R code, and c) are you sure you are using the same BLAS?

What happens when you profile?

Dirk
#
On Thu, Apr 24, 2014 at 4:32 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
Thanks for the quick reply Dirk and for the suggestions.

As for BLAS, yes I believe I'm using the same BLAS. The output of the
following two commands is the same (except for the memory addresses of
course):
$ ldd /usr/local/lib/R-devel/lib/R/bin/exec/R
$ ldd /usr/lib/R/bin/exec/R

And executing
$ lsof -p <PID> | grep 'blas\|lapack'
also returns the same output for both Rs:
R       13017 scott  mem    REG    8,1  9142768 2097161
/usr/lib/atlas-base/atlas/liblapack.so.3.0
R       13017 scott  mem    REG    8,1  3776592 2097162
/usr/lib/atlas-base/atlas/libblas.so.3.0
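
The ldd comparison above can be made mechanical by stripping the load
addresses before diffing. A minimal sketch, assuming ldd's usual
"name => path (0xADDR)" line format; the helper name strip_addrs and the
temporary file names are made up for illustration:

```shell
# Remove the trailing "(0x...)" load address and leading whitespace from
# ldd output read on stdin, so two listings can be compared line by line.
strip_addrs() {
    sed -e 's/[[:space:]]*(0x[0-9a-fA-F]*)[[:space:]]*$//' \
        -e 's/^[[:space:]]*//'
}

# Compare the shared libraries of the two R binaries named in this thread,
# but only if both actually exist on this machine.
r_ppa=/usr/lib/R/bin/exec/R
r_devel=/usr/local/lib/R-devel/lib/R/bin/exec/R
if [ -x "$r_ppa" ] && [ -x "$r_devel" ]; then
    ldd "$r_ppa"   | strip_addrs | sort > /tmp/ldd-r.txt
    ldd "$r_devel" | strip_addrs | sort > /tmp/ldd-r-devel.txt
    if diff /tmp/ldd-r.txt /tmp/ldd-r-devel.txt; then
        echo "same shared libraries"
    else
        echo "listings differ (see diff above)"
    fi
fi
```

Lines that point into each installation's own directory (libR.so, for
example) would be expected to differ; the interesting lines are the BLAS and
LAPACK ones.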

I profiled and it seems that all of the R functions are slow (I can
post the output if anyone is interested). I rebuilt with -O3 in CFLAGS
and this improved things a lot. Time went down from 10 seconds to 5.7
or so. I reprofiled and again the R functions of R-devel seem just a
tad slower across the board (I can send output if interested).
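
One likely reason -O3 helped so much: the CFLAGS in the configure command
earlier in this thread contain no -O option, and gcc does not optimize (-O0)
unless one is given. A small sketch for spotting this in a flags string; the
function name opt_level is made up for illustration, and the
"R CMD config CFLAGS" query only runs if R is on the PATH:

```shell
# Report the optimization level implied by a CFLAGS string.
# gcc defaults to -O0 when no -O option is present; if several are
# given, the last one is the one that takes effect.
opt_level() {
    level="-O0 (gcc default, no -O option given)"
    for flag in $1; do
        case "$flag" in
            -O*) level="$flag" ;;
        esac
    done
    printf '%s\n' "$level"
}

# The flags from the original configure command, then with -O3 added.
opt_level "-ggdb -pipe -std=gnu99 -Wall -pedantic"
opt_level "-O3 -ggdb -pipe -std=gnu99 -Wall -pedantic"

# If R is installed, ask it which CFLAGS were recorded at configure time.
if command -v R >/dev/null 2>&1; then
    opt_level "$(R CMD config CFLAGS)"
fi
```

Running the same query against each build (R and R-devel) would show whether
the two were compiled at the same optimization level.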

Below are some timings comparing the optimized R-devel to R.

$ time R-devel CMD BATCH mwe.R

real 0m5.755s
user 0m5.678s
sys 0m0.079s

$ time R CMD BATCH mwe.R

real 0m5.453s
user 0m5.371s
sys 0m0.054s

Rerunning the above commands multiple times gives about the same output.
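
The repeated runs can be scripted so the spread between runs is visible next
to the difference between builds. A minimal sketch, assuming GNU date (for
%N) as shipped on Ubuntu; time_runs is a made-up helper name, not an R or
system tool:

```shell
# Run a command N times and print the wall-clock seconds of each run.
# Usage: time_runs N command [args...]
time_runs() {
    n=$1
    shift
    i=1
    while [ "$i" -le "$n" ]; do
        start=$(date +%s.%N)
        "$@" >/dev/null 2>&1
        end=$(date +%s.%N)
        awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
        i=$((i + 1))
    done
}

# Stand-in command so the sketch runs anywhere; for this thread one would use
#   time_runs 5 R CMD BATCH mwe.R
#   time_runs 5 R-devel CMD BATCH mwe.R
time_runs 3 sleep 0.1
```

If the run-to-run spread is much smaller than 0.3 seconds, the remaining gap
is unlikely to be timing noise.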

There's still a .3 second difference and I'm curious to know why. Any ideas?

Scott


--
Scott Kostyshak
Economics PhD Candidate
Princeton University
#
On 25 April 2014 at 11:38, Scott Kostyshak wrote:
| On Thu, Apr 24, 2014 at 4:32 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
| >
| > Scott,
| >
| > My first quick hunches are a) 50% is too much for compiler switches, b) your
| > example shows R code, and c) are you sure you are using the same BLAS?
| 
| Thanks for the quick reply Dirk and for the suggestions.
| 
| As for BLAS, yes I believe I'm using the same BLAS. The output of the
| following two commands is the same (except for the memory addresses of
| course):
| $ ldd /usr/local/lib/R-devel/lib/R/bin/exec/R
| $ ldd /usr/lib/R/bin/exec/R
| 
| And executing
| $ lsof -p <PID> | grep 'blas\|lapack'
| also returns the same output for both Rs:
| R       13017 scott  mem    REG    8,1  9142768 2097161
| /usr/lib/atlas-base/atlas/liblapack.so.3.0
| R       13017 scott  mem    REG    8,1  3776592 2097162
| /usr/lib/atlas-base/atlas/libblas.so.3.0
| 
| I profiled and it seems that all of the R functions are slow (I can
| post the output if anyone is interested). I rebuilt with -O3 in CFLAGS
| and this improved things a lot. Time went down from 10 seconds to 5.7

That is surprisingly large. In my mail yesterday I basically bet against it.

| or so. I reprofiled and again the R functions of R-devel seem just a
| tad slower across the board (I can send output if interested).
| 
| Below are some timings comparing the optimized R-devel to R.
| 
| $ time R-devel CMD BATCH mwe.R
| 
| real 0m5.755s
| user 0m5.678s
| sys 0m0.079s
| 
| $ time R CMD BATCH mwe.R
| 
| real 0m5.453s
| user 0m5.371s
| sys 0m0.054s
| 
| Rerunning the above commands multiple times gives about the same output.
| 
| There's still a .3 second difference and I'm curious to know why. Any ideas?

Different code base?

If you want _identical_ outcomes you need identical _input_: code, compiler,
settings, hardware, ...

Dirk
#
On Fri, Apr 25, 2014 at 11:59 AM, Dirk Eddelbuettel <edd at debian.org> wrote:
Not looking for identical, just looking to squeeze out something to
learn from about other possibilities for differences, e.g. libraries
that I'm not linking against at compile time, or differences with byte
compiling R. But it doesn't seem like there are any obvious candidates
so I'll stop here for now.

Thanks for the help, Dirk.

Scott


--
Scott Kostyshak
Economics PhD Candidate
Princeton University