request: allow inline functions in R - R-devel

Li Long

Fri, May 14, 2004 4:22 PM

Hi, R core developers,

I work at the Swiss Institute of Bioinformatics.  We have two Intel
Itanium2 clusters for bioinformaticians to crunch their data.  One
piece of software they use heavily is R with Bioconductor.  I have already
ported the R code and R packages to this platform, and am working on
optimizing their performance.  I'm using the Intel C/C++ compiler on this
platform running Linux.  One of my findings is that turning some functions
in R into "inline" functions boosts performance significantly.

While R follows the strict C89 standard right now, there are quite a few good
reasons to relax the rule somewhat.  From my experience in software
development in industry, I understand very well both the portability issue
and the backward compatibility issue.  I also see the hidden cost of holding
back for too long and not fully achieving the potential of new technology.
I recommend that we allow "inline" functions in R's C code.

The following explains why I recommend the above.

In modern processor microarchitecture, pipelining is a major approach to
achieving higher clock speeds.  Super-pipelining involves pipelining the
microarchitecture to finer granularities.  With far more instructions
in flight in a super-pipelined microarchitecture, handling events that
disrupt the pipelining, such as cache misses, interrupts and branch
mispredictions, can be costly.

A case in point is the Intel Itanium architecture, EPIC (explicitly
parallel instruction computing).  EPIC enables the programmer or compiler to
indicate the inherent parallelism of programs *explicitly* in the
instruction sequence.  The main features to improve performance are:
application registers, predication, branching and register rotation.  The
implication is that the cost of disrupting the pipeline is magnified greatly
on this architecture.

In R, there are quite a few simple functions that are called extremely
often, such as "R_IsNaNorNA", "R_finite", etc.  They are used in heavy
loops quite a lot.  They disrupt the pipelining, and negatively affect the
performance of the software.  For instance, on IA64, a call to the system's
"isnan" costs 4 cycles, while a wrapper like "R_IsNaNorNA" could cost
several times more.

One way to reduce this kind of disruption in C/C++ is to "inline" a
function, i.e., to integrate it into the code of its callers, eliminating
the function-call overhead.
The benefit of inlining comes especially with very short functions.
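
As a rough illustration of the pattern described above, here is a minimal sketch of a tiny predicate called once per element of a loop.  The names is_nan_value and sum_skipping_nan are hypothetical and are not taken from the R sources; the predicate is only a stand-in for a wrapper such as R_IsNaNorNA, and it assumes IEEE 754 arithmetic without aggressive floating-point optimization flags.

```c
#include <stddef.h>

/* Hypothetical stand-in for a small wrapper such as R_IsNaNorNA.
   Under IEEE 754 arithmetic, x != x is true only when x is a NaN. */
static inline int is_nan_value(double x)
{
    return x != x;
}

/* Sum the entries of v that are not NaN.  The predicate runs once per
   element, so any call overhead is paid n times unless it is inlined. */
double sum_skipping_nan(const double *v, size_t n)
{
    double total = 0.0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (!is_nan_value(v[i]))
            total += v[i];
    }
    return total;
}
```

Whether the call is actually integrated into the loop is still the compiler's decision; the keyword is only a request.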

On Unix and Linux, inline functions can be found in the standard .h files
under /usr/include or /usr/local/include.

C++ has supported "inline" functions from the beginning, while the "inline"
keyword was introduced to C in the C99 standard in 1999.  A feature that has
been in a standard for so many years is considered very mature in the
computer industry.  Many C++ compilers actually translate C++ code to C
code, so it's quite natural for the corresponding C compilers to support
inline functions.  The compiler can choose either to generate function calls
or to inline the functions, so this feature poses little risk to the
application.

The default compilers that R uses, gcc/g++, have supported it at least since
version 2.95 in July 1999.  The GCC User's Manual states that it "works" only
in optimizing compilation for gcc/g++.

Since R already calls for C, C++ and FORTRAN compilers, it is reasonable to
expect that "inline" functions be allowed.  This will not only improve the
performance of R on modern processors with little effort, but also
encourage people to develop and use R packages on more challenging
problems.

In the configure step, R checks for many OS/compiler-related issues, so this
could be just one more check.  I expect that the initial use of inline
functions will be mainly for small but heavily used functions, so the impact
of such a change could be managed.
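
One way such a check could surface in the C code is as a macro that expands to whatever spelling the compiler accepts.  The sketch below uses preprocessor tests in place of a real configure test compile; MY_INLINE and square are hypothetical names made up for this example, not identifiers from R.  (Autoconf provides the AC_C_INLINE macro for doing this kind of detection at configure time.)

```c
/* MY_INLINE is a hypothetical name; a configure script would normally
   define it from a test compile rather than from these checks. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
# define MY_INLINE inline        /* C99 or later: standard keyword */
#elif defined(__GNUC__)
# define MY_INLINE __inline__    /* GNU C, also accepted with -ansi */
#elif defined(_MSC_VER)
# define MY_INLINE __inline      /* Microsoft C spelling */
#else
# define MY_INLINE               /* unknown compiler: plain function */
#endif

/* A small helper declared through the macro.  On a strict C89 compiler
   it simply becomes an ordinary static function. */
static MY_INLINE double square(double x)
{
    return x * x;
}
```

The point of the empty fallback is that the code still compiles and runs everywhere; only the optimization hint is lost.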

The attachments are from GCC User's Manual and C99 rationale, regarding
"inline" functions.

Thanks for considering this issue.

Li Long


-------------- next part --------------

==================

6.4.1 Keywords
Several keywords were added in C89: const, enum, signed, void and volatile. New in
C9X are the keywords inline, restrict, _Bool, _Complex and _Imaginary.
Where possible, however, new features have been added by overloading existing keywords, as, for
example, long double instead of extended. It is recognized that each added keyword will
require some existing code that used it as an identifier to be rewritten. No meaningful programs are
known to be quietly changed by adding the new keywords.



6.7.4     Function specifiers
A new feature of C99: The inline keyword, adapted from C++, is a function-specifier that
can be used only in function declarations. It is useful for program optimizations that require the
definition of a function to be visible at the site of a call. (Note that the Standard does not
attempt to specify the nature of these optimizations.)
Visibility is assured if the function has internal linkage, or if it has external linkage and the call
is in the same translation unit as the external definition. In these cases, the presence of the
inline keyword in a declaration or definition of the function has no effect beyond indicating a
preference that calls of that function should be optimized in preference to calls of other
functions declared without the inline keyword.
Visibility is a problem for a call of a function with external linkage where the call is in a
different translation unit from the function's definition. In this case, the inline keyword
allows the translation unit containing the call to also contain a local, or inline, definition of the
function.
A program can contain a translation unit with an external definition, a translation unit with an
inline definition, and a translation unit with a declaration but no definition for a function. Calls
in the latter translation unit will use the external definition as usual.
An inline definition of a function is considered to be a different definition than the external
definition. If a call to some function func with external linkage occurs where an inline
definition is visible, the behavior is the same as if the call were made to another function, say
__func, with internal linkage. A conforming program must not depend on which function is
called. This is the inline model in the Standard.
A conforming program must not rely on the implementation using the inline definition, nor may
it rely on the implementation using the external definition. The address of a function is always
the address corresponding to the external definition, but when this address is used to call the
function, the inline definition might be used. Therefore, the following example might not
behave as expected.
       inline const char *saddr(void)
       {     static const char name[] = "saddr";
             return name;
       }
       int compare_name(void)
       {     return saddr() == saddr(); // unspecified behavior
       }
Since the implementation might use the inline definition for one of the calls to saddr and use
the external definition for the other, the equality operation is not guaranteed to evaluate to 1
(true). This shows that static objects defined within the inline definition are distinct from their
corresponding object in the external definition. This motivated the constraint against even
defining a non-const object of this type.
Inlining was added to the Standard in such a way that it can be implemented with existing linker
technology, and a subset of C99 inlining is compatible with C++. This was achieved by
requiring that exactly one translation unit containing the definition of an inline function be
specified as the one that provides the external definition for the function. Because that
specification consists simply of a declaration that either lacks the inline keyword, or contains
both inline and extern, it will also be accepted by a C++ translator.
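
A minimal sketch of the model described in the paragraph above, assuming a compiler running in C99 mode or later.  The names twice and quadruple are hypothetical.  In a real project the inline definition would sit in a shared header and the extern declaration in exactly one source file; both are shown in one file here only to keep the example self-contained.

```c
/* Would normally live in a header shared by several source files.
   On its own this is an inline definition, which in C99 does not
   provide an external symbol for the function. */
inline int twice(int x)
{
    return 2 * x;
}

/* Would normally appear in exactly one .c file that includes the header.
   Redeclaring the function with extern makes this translation unit
   supply the external definition that non-inlined calls link against. */
extern inline int twice(int x);

/* A caller: the compiler may inline twice() or call the external
   definition, and the program must not depend on which it picks. */
int quadruple(int x)
{
    return twice(twice(x));
}
```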
Inlining in C99 does extend the C++ specification in two ways. First, if a function is declared
inline in one translation unit, it need not be declared inline in every other translation unit.
This allows, for example, a library function that is to be inlined within the library but available
only through an external definition elsewhere. The alternative of using a wrapper function for
the external function requires an additional name; and it may also adversely impact performance
if a translator does not actually do inline substitution.
Second, the requirement that all definitions of an inline function be "exactly the same" is
replaced by the requirement that the behavior of the program should not depend on whether a
call is implemented with a visible inline definition, or the external definition, of a function.
This allows an inline definition to be specialized for its use within a particular translation unit.
For example, the external definition of a library function might include some argument
validation that is not needed for calls made from other functions in the same library. These
extensions do offer some advantages; and programmers who are concerned about compatibility
can simply abide by the stricter C++ rules.
Note that it is not appropriate for implementations to provide inline definitions of standard
library functions in the standard headers because this can break some legacy code that
redeclares standard library functions after including their headers. The inline keyword is
intended only to provide users with a portable way to suggest inlining of functions. Because the
standard headers need not be portable, implementations have other options along the lines of:
       #define abs(x) __builtin_abs(x)
or other non-portable mechanisms for inlining standard library functions.
-------------- next part --------------

======================

 4.31 An Inline Function is As Fast As a Macro

By declaring a function inline, you can direct GNU CC to integrate that function's code into the code for its callers. This makes execution faster by eliminating the function-call overhead; in addition, if any of the actual argument values are constant, their known values may permit simplifications at compile time so that not all of the inline function's code needs to be included. The effect on code size is less predictable; object code may be larger or smaller with function inlining, depending on the particular case. Inlining of functions is an optimization and it really "works" only in optimizing compilation. If you don't use `-O', no function is really inline.

To declare a function inline, use the inline keyword in its declaration, like this:

 	

inline int
inc (int *a)
{
  return (*a)++;
}

(If you are writing a header file to be included in ANSI C programs, write __inline__ instead of inline. See section 4.35 Alternate Keywords.) You can also make all "simple enough" functions inline with the option `-finline-functions'.

Note that certain usages in a function definition can make it unsuitable for inline substitution. Among these usages are: use of varargs, use of alloca, use of variable sized data types (see section 4.14 Arrays of Variable Length), use of computed goto (see section 4.3 Labels as Values), use of nonlocal goto, and nested functions (see section 4.4 Nested Functions). Using `-Winline' will warn when a function marked inline could not be substituted, and will give the reason for the failure.

Note that in C and Objective C, unlike C++, the inline keyword does not affect the linkage of the function.

GNU CC automatically inlines member functions defined within the class body of C++ programs even if they are not explicitly declared inline. (You can override this with `-fno-default-inline'; see section Options Controlling C++ Dialect.)

When a function is both inline and static, if all calls to the function are integrated into the caller, and the function's address is never used, then the function's own assembler code is never referenced. In this case, GNU CC does not actually output assembler code for the function, unless you specify the option `-fkeep-inline-functions'. Some calls cannot be integrated for various reasons (in particular, calls that precede the function's definition cannot be integrated, and neither can recursive calls within the definition). If there is a nonintegrated call, then the function is compiled to assembler code as usual. The function must also be compiled as usual if the program refers to its address, because that can't be inlined.

When an inline function is not static, then the compiler must assume that there may be calls from other source files; since a global symbol can be defined only once in any program, the function must not be defined in the other source files, so the calls therein cannot be integrated. Therefore, a non-static inline function is always compiled on its own in the usual fashion.

If you specify both inline and extern in the function definition, then the definition is used only for inlining. In no case is the function compiled on its own, not even if you refer to its address explicitly. Such an address becomes an external reference, as if you had only declared the function, and had not defined it.

This combination of inline and extern has almost the effect of a macro. The way to use it is to put a function definition in a header file with these keywords, and put another copy of the definition (lacking inline and extern) in a library file. The definition in the header file will cause most calls to the function to be inlined. If any uses of the function remain, they will refer to the single copy in the library.

GNU C does not inline any functions when not optimizing. It is not clear whether it is better to inline or not, in this case, but we found that a correct implementation when not optimizing was difficult. So we did the easy thing, and turned it off. 

 4.35 Alternate Keywords

The option `-traditional' disables certain keywords; `-ansi' disables certain others. This causes trouble when you want to use GNU C extensions, or ANSI C features, in a general-purpose header file that should be usable by all programs, including ANSI C programs and traditional ones. The keywords asm, typeof and inline cannot be used since they won't work in a program compiled with `-ansi', while the keywords const, volatile, signed, typeof and inline won't work in a program compiled with `-traditional'.

The way to solve these problems is to put `__' at the beginning and end of each problematical keyword. For example, use __asm__ instead of asm, __const__ instead of const, and __inline__ instead of inline.

Other C compilers won't accept these alternative keywords; if you want to compile with another compiler, you can define the alternate keywords as macros to replace them with the customary keywords. It looks like this:

 	

#ifndef __GNUC__
#define __asm__ asm
#endif

`-pedantic' causes warnings for many GNU C extensions. You can prevent such warnings within one expression by writing __extension__ before the expression. __extension__ has no effect aside from this.

elijah wright

Fri, May 14, 2004 4:46 PM

if the Itanium2 optimizes badly for standard C89 code, that's the
processor architecture's fault, not R's.

you probably need either better (smarter) compilers or a different
platform, not 'improvements' to R.

what would the impact of inlining these functions be on all of the other
architectures (PPC, sparc, Opteron, x86, etc) where R currently runs
perfectly well?

--elijah

Brian Ripley

Fri, May 14, 2004 5:18 PM

On Fri, 14 May 2004, Li Long wrote:

Could you then please quantify that hidden cost?

In what sense do `we' not allow it?  And who is `we'?

The problem is that very few compilers fully support C99, and others have
different ways to indicate inlining.  So a configure test is needed. I am
sure that if you provide one together with patches to parts of the code
where you find inlining beneficial, the real `we' would consider it
carefully.  Especially if the `hidden cost' is noticeable.

....

However, one of the motivations of eliminating support for non-IEEE-754
platforms in R 2.0.0 is to enable some of this baggage to be eliminated.  
But the wrapper is there for a good reason: to get the right answer.

Since I gather you have suitably modified code, it would be helpful to 
your case to provide data 

 - on real problems
 - on a mainstream platform.

of the actual performance impact of not inlining.

Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Brian Ripley

Fri, May 14, 2004 5:57 PM

BTW, I've just checked a hunch. I did not believe R_IsNaNorNA is actually
used in base R on a system with IEEE arithmetic (and I removed it to test
that).  That suggests you have not actually tested inlining it and your 
analysis is seriously flawed.

(Note to Deepayan: it is not a documented entry point, so please use 
ISNAN as documented.)

On Fri, 14 May 2004, Prof Brian Ripley wrote:

And BTW, R-devel is not the address of the core developers.

Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Li Long

Mon, May 17, 2004 8:16 AM

R_IsNaNorNA is found in three dirs: main/arithmetic.c, nmath/mlutils.c,
nmath/nmath.h, include/R_ext/Arith.h and include/Rmath.h0.in.  (This is a
candidate for streamlining the code.  But that's a different subject.)

After I turned all of them inline, the model I have runs about 5% faster
on Itanium2 (running Linux).  When I cut down some calls to ISNAN in the
package (being developed), the performance gain is another 5%.  I
interpret this cost as quite significant.

In package "cluster", pam.c and clara.c contain a function "ind_2" which
is used very heavily in the loop.  After changing this function to be an
"inline" function, the model speeds up by a factor of 2.  That's a very
significant improvement for the model.  It also indicates the overhead
these small functions incur.
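
To show the kind of function this is about, here is a minimal sketch of a small indexing helper used inside a loop over a packed dissimilarity vector.  It is not the actual "ind_2" code from the cluster package; tri_index, total_dissimilarity and the storage layout (strict lower triangle, row by row, 0-based) are assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical helper: position of the pair (i, j), with i != j and
   0-based indices, in a vector holding the strict lower triangle of a
   symmetric matrix row by row: (1,0), (2,0), (2,1), (3,0), ... */
static inline size_t tri_index(size_t i, size_t j)
{
    size_t hi = i > j ? i : j;
    size_t lo = i > j ? j : i;

    return hi * (hi - 1) / 2 + lo;
}

/* Total dissimilarity between object k and every other object.
   tri_index() is called once per iteration, which is where inlining
   a helper of this size can matter. */
double total_dissimilarity(const double *dys, size_t n, size_t k)
{
    double total = 0.0;
    size_t j;

    for (j = 0; j < n; j++) {
        if (j != k)
            total += dys[tri_index(k, j)];
    }
    return total;
}
```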

What I meant by "we" is "R core developers".  At this time, R adheres to
the strict C89 standard, so no inline function is allowed.

I'm not aware of other ways of indicating "inlining" by other compilers.
Could you please give an example?

The request is to allow "inline", not full C99 support.

As indicated in my original request, a compiler could choose to treat
"inline" functions the same as normal functions.  C++ has supported "inline"
functions from the beginning, C99 added it as a new keyword, and "gcc" has
supported it since 2.95.  These point to a mature feature of the C language.

Inline functions are most suitable for small functions, so that they
don't stall the pipeline, and they should be used where they are most
beneficial.  From this aspect, the portability and compatibility issues
are easily manageable.

Li Long

Brian Ripley

Mon, May 17, 2004 2:27 PM

On Mon, 17 May 2004, Li Long wrote:

However that part of nmath/mlutils.c is only used standalone and both it
and main/arithmetic.c only provided R_IsNaNorNA.  And in the current
sources where the non-IEEE case has been removed it is only in the first
two. Nothing calls R_IsNaNorNA. It is NOT used. You can remove it from all
those files in the current sources.  What is it you don't understand about
my second sentence?

Your final comment suggests that you do not understand how R's codebase is
organized.  A few parts are separated out (in a slightly different form)  
to make a standalone Rmath library *that does not use R's notion of NAs*.
The differences are larger than the commonality in the current version.  
We could easily streamline it by removing standalone nmath, but not
otherwise.

What is `them'?  Not `dirs', the grammatical immediate antecedent.

I saw no measurable change in replacing all calls to R_finite by isfinite
(which is what we have done recently in 2.0.0 on platforms supporting
isfinite). That's at least as good as inlining, better on platforms on
which isfinite is a macro (as it is on two of those I tested).
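
For reference, a minimal sketch of using the C99 isfinite classification macro from <math.h> directly in a loop, with no wrapper function in between.  The function name count_finite is hypothetical and is not part of R.

```c
#include <math.h>
#include <stddef.h>

/* Count the finite entries of v.  isfinite() is false for NaN and for
   both infinities; where it is a macro, no function call is involved. */
size_t count_finite(const double *v, size_t n)
{
    size_t count = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (isfinite(v[i]))
            count++;
    }
    return count;
}
```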

BTW, 5% is the expected change in CPU performance in 1.25 months according
to Moore's Law (as stated on an Intel flyer).  It will be eclipsed by the 
speedup by the time R 2.0.0 is released.

On your platform!  I did ask for examples on a commonly used platform, and 
for inlining in R, not a contributed package.

Well, I *am* an R core developer, and I didn't know that.  It is necessary
that R when configured on a strict C89 platform will compile and run,
which is not what you claimed.  We do make use of several non-C89
features already on particular platforms or compilers.  We even include 
code using __inline__ (have you actually looked at the sources?).

I am very interested in your response to this suggestion.

__CRT_INLINE by MinGW on Windows, for example.  __inline and __inline__
are also used.  For gcc __inline__ would be preferred to allow -pedantic
to be used (and that has helped catch a number of things which stricter 
compilers were rejecting).

`inline' is allowed *if* your package configure tests for it, so there is 
nothing to grant here.

You *will* find __inline__ used in the R sources, in the way it is most
effective, for static functions in a single compilation unit. The R
authors have tended to use macros instead, to the same effect.
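
A minimal sketch of the two styles mentioned above, a function-like macro and a static inline function in a single compilation unit.  All names are hypothetical, and the standard `inline' spelling is used here rather than __inline__, so a C99 compiler is assumed.  The generated code can be the same, but one difference is worth keeping in mind: a macro may evaluate an argument more than once, while a function evaluates each argument exactly once.

```c
/* Function-like macro: arguments are substituted textually, so an
   argument with side effects can be evaluated more than once. */
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

/* static inline function: each argument is evaluated exactly once. */
static inline int max_inline(int a, int b)
{
    return a > b ? a : b;
}

static int calls = 0;   /* counts how often next_value() runs */

static int next_value(void)
{
    calls++;
    return 10;
}

/* Number of times next_value() runs when passed through the macro:
   once in the comparison and once more in the selected branch. */
int calls_made_by_macro(void)
{
    calls = 0;
    (void) MAX_MACRO(next_value(), 3);
    return calls;
}

/* Number of times next_value() runs when passed to the function. */
int calls_made_by_inline(void)
{
    calls = 0;
    (void) max_inline(next_value(), 3);
    return calls;
}
```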

See above.  There are issues about inlining and functions imported from
DLLs and shared libraries, so we would need to ensure that the patches you
supply work with libR.so and R.dll.

Let me repeat the request for details of which functions, patches and
evidence of their effectiveness. You *still* have not told us exactly what
you believe to be effective: and I know R_IsNaNorNA (one of your few
examples) is not called in the current R sources.

Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595