Skip to content

Optimizing C code

7 messages · Romain Francois, Henrik Bengtsson, Duncan Murdoch +1 more

#
Hi the list,

I need to write some efficient distances function, so I read the code 
for the Euclidean distance.
I do not understand the purpose of the line 11 : if x[i] and y[i] are 
not NA (line 9), can dev be NA ?

Christophe


#define both_FINITE(a,b) (R_FINITE(a) && R_FINITE(b))
#define both_non_NA(a,b) (!ISNAN(a) && !ISNAN(b))

1. static double R_euclidean2(double *x, double *y, int taille)
2. {
3.    double dev, dist;
4.    int count, i;
5.
6.    count= 0;
7.    dist = 0;
8.    for(i = 0 ; i < taille ; i++) {
9.    if(both_non_NA(x[i], y[i])) {
10.        dev = (x[i] - y[i]);
11.        if(!ISNAN(dev)) {
12.        dist += dev * dev;
13.        count++;
14.        }
15.    }
16.    }
17.    if(count == 0)return NA_REAL;
18.    if(count != taille) dist /= ((double)count/taille);
19.    return sqrt(dist);
20.}
#
Bonjour Christophe,

NA and NaN are different things... Actually this is tricky because NA is 
implemented as a special kind of NaN :

See this extract of R_ext/Arith.h :

int R_IsNA(double);		/* True for R's NA only */
int R_IsNaN(double);		/* True for special NaN, *not* for NA */
int R_finite(double);		/* True if none of NA, NaN, +/-Inf */
#define ISNA(x)	       R_IsNA(x)

/* ISNAN(): True for *both* NA and NaN.
    NOTE: some systems do not return 1 for TRUE.
    Also note that C++ math headers specifically undefine
    isnan if it is a macro (it is on OS X and in C99),
    hence the workaround.  This code also appears in Rmath.h
*/
#ifdef __cplusplus
   int R_isnancpp(double); /* in arithmetic.c */
#  define ISNAN(x)     R_isnancpp(x)
#else
#  define ISNAN(x)     (isnan(x)!=0)
#endif

Romain

PS: the question would be more appropriate in R-devel.
On 01/22/2010 11:14 AM, Christophe Genolini wrote:

  
    
#
Christophe Genolini wrote:
As Romain said, the test is for NaN as well as NA.  One way it could 
happen is if both x[i] and y[i] were infinite:  then the difference is NaN:

 > Inf - Inf
[1] NaN

Duncan Murdoch
#
Thanks both of you.
So isn't the line 9 useless ? If either x[i] or y[i] are NA, then dev 
will be NA and !ISNAN(dev) will detect it...
Sothe loop cool be

8.    for(i = 0 ; i < taille ; i++) {
10.        dev = (x[i] - y[i]);
11.        if(!ISNAN(dev)) {
12.          dist += dev * dev;
13.          count++;
15.      }
16. }

No ?

Christophe
#
A side note: The NA vs NaN does not seem to play a role here, because:

#define both_non_NA(a,b) (!ISNAN(a) && !ISNAN(b))

So, it is the same type of test used in line 9 and in line 11.

/Henrik

On Fri, Jan 22, 2010 at 9:52 AM, Christophe Genolini
<cgenolin at u-paris10.fr> wrote:
#
On 22/01/2010 12:52 PM, Christophe Genolini wrote:
That would presumably give the same answer, but there are lots of 
reasons it might not be useless:

 - the author might find it clearer, or easier to generalize to integer 
data (where your version wouldn't work).
 - it might be faster, because it can abort sooner.
 - it might be essentially equivalent in all important respects.

Duncan Murdoch
#
Duncan Murdoch a ?crit :
That's 3 good reasons to keep the other code. Thanks.