
[Bioc-devel] C++ code performance issues

4 messages · Peter Glaus, Martin Morgan

#
Hi,
I am working on the BitSeq package, which has both a command-line C++ version 
and a Bioconductor version in which R calls the same C++ code with the .C 
function. While testing the development version of the package on R 3.0.0 I 
noticed that the "R version" runs much slower: 2-3 times slower than the 
pure C++ implementation.
Interestingly, the stable release of the "R version" seems to be as fast 
as the C++ version. (The underlying code has changed slightly, but there 
shouldn't be much difference.)
Is there any reason for such behavior? Has anyone encountered a similar 
issue? Is there a way to make the C++ code called from R faster?

More details:
I compiled the C++ code with the same g++ flags (... -O3 -pipe -fpic -g... ) 
and removed OpenMP support from both.
The functions take exactly the same input (input is read from a file) 
and produce exactly the same output (using the same seed). A specific 
computation that took the C++ version 12 minutes took the R(C++) version 
47 minutes. There is no IO during that part of the code and there was 
just one R_CheckUserInterrupt() call during this time (I changed the 
code so that there would not be many of these calls).
There are just a few differences in the last stable release, and that seems 
to run even faster than the current C++ version (10 minutes). (The stable 
release uses -O2 while compiling the C++ code.)
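
A minimal sketch of how the timed section could be measured identically in both builds, so that the numbers above come from the same clock and the same code path. The names time_seconds and dummy_work are hypothetical and are not part of BitSeq; dummy_work only stands in for the real computation.

```cpp
#include <chrono>

// Runs `work` once and returns the elapsed wall-clock time in seconds.
// steady_clock is used because it is monotonic and unaffected by
// system clock adjustments.
template <typename F>
double time_seconds(F work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Hypothetical stand-in for the expensive computation (not BitSeq code):
// sums the first n terms of the harmonic series.
double dummy_work(long n) {
    double acc = 0.0;
    for (long i = 1; i <= n; ++i) {
        acc += 1.0 / static_cast<double>(i);
    }
    return acc;
}
```

A call such as time_seconds([] { dummy_work(50000000L); }) returns the elapsed seconds; the command-line build could print it with std::printf and the R build with Rprintf, which makes the two timings directly comparable.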

Thanks,
Peter.
#
On 03/21/2013 11:30 AM, Peter Glaus wrote:
Can you narrow this down to something more reproducible, e.g., a particular call 
that causes problems, including the platform(s) on which you are seeing issues?

Maybe you're running out of memory (because R is holding memory that the command 
line does not access)?

Probably you spend most of your time 'in C' or 'in R', rather than moving 
between them?

You could try, on linux / mac, a cheap C-level guesstimate of where time is 
spent by running under gdb

   R -d gdb
   (gdb) run

and then periodically breaking with Ctrl-C and looking where you are

   (gdb) backtrace
   ## stack trace
   (gdb) continue

and comparing the same under the commandline

   > gdb ./bitseq

or doing some more serious profiling as outlined in section 3.4 of 'Writing R 
Extensions'; probably you would start by getting a short reproducible example.

Martin

  
    
#
Hi Martin,
thanks for the tips. I did a bit more investigation and it turned out 
that the development version of R is not compiling with optimization 
flags while installing the packages.
I am not sure whether this was also the case initially, but I know for 
sure that it was using -O3 when running CMD check; maybe I just got 
confused and never noticed that it's not using it during installation.

Is it safe to assume that optimization flags will be used in the stable 
release version, or is it better to specify them in the package's Makevars?

Peter.
On 21/03/13 19:36, Martin Morgan wrote:
#
On 03/22/2013 08:11 AM, Peter Glaus wrote:
A 2- or 3x speed-up due to compiler flags would be surprisingly (to me) large; 
maybe more likely a few percent...

By default R uses the same compiler flags for package installation as were used 
to build R itself; perhaps your development version of R has been compiled with 
CXXFLAGS="-O0". I believe that the Bioc builders use 'default' values for these, 
and that these remain unchanged in the R distribution at -O2; this could be 
platform (Linux / Mac / Windows) or compiler-specific, though. Probably the 
intention is that R would be compiled to use -O2 'out of the box'. The flags 
used by the builders can be checked at

   http://bioconductor.org/checkResults/devel/bioc-LATEST/

by clicking on the different machine names, george2, moscato2, petty
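
One way to confirm what a given installation actually did is to have the compiled code report it. The sketch below assumes GCC or Clang, which define the __OPTIMIZE__ macro when an optimization level is enabled and leave it undefined at -O0; other compilers may not define it at all. The function name build_optimization is hypothetical and not part of BitSeq or of R's API.

```cpp
#include <string>

// Reports how this translation unit was compiled.
// Assumption: the compiler is GCC or Clang, which define __OPTIMIZE__
// when optimization is enabled. On compilers that never define the
// macro this function always reports "unoptimized".
std::string build_optimization() {
#if defined(__OPTIMIZE__)
    return "optimized";
#else
    return "unoptimized";
#endif
}
```

Printing this string once from both the command-line build and the package build shows whether the two were compiled alike. Independently of the code, running R CMD config CXXFLAGS prints the C++ flags that a particular R installation uses when installing packages.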

Martin