Hi the list,
I need to write some efficient distances function, so I read the code
for the Euclidean distance.
I do not understand the purpose of the line 11 : if x[i] and y[i] are
not NA (line 9), can dev be NA ?
Christophe
#define both_FINITE(a,b) (R_FINITE(a) && R_FINITE(b))
#define both_non_NA(a,b) (!ISNAN(a) && !ISNAN(b))
1. static double R_euclidean2(double *x, double *y, int taille)
2. {
3. double dev, dist;
4. int count, i;
5.
6. count= 0;
7. dist = 0;
8. for(i = 0 ; i < taille ; i++) {
9. if(both_non_NA(x[i], y[i])) {
10. dev = (x[i] - y[i]);
11. if(!ISNAN(dev)) {
12. dist += dev * dev;
13. count++;
14. }
15. }
16. }
17. if(count == 0)return NA_REAL;
18. if(count != taille) dist /= ((double)count/taille);
19. return sqrt(dist);
20.}
Optimizing C code
7 messages · Romain Francois, Henrik Bengtsson, Duncan Murdoch +1 more
Bonjour Christophe,
NA and NaN are different things... Actually this is tricky because NA is
implemented as a special kind of NaN :
See this extract of R_ext/Arith.h :
int R_IsNA(double); /* True for R's NA only */
int R_IsNaN(double); /* True for special NaN, *not* for NA */
int R_finite(double); /* True if none of NA, NaN, +/-Inf */
#define ISNA(x) R_IsNA(x)
/* ISNAN(): True for *both* NA and NaN.
NOTE: some systems do not return 1 for TRUE.
Also note that C++ math headers specifically undefine
isnan if it is a macro (it is on OS X and in C99),
hence the workaround. This code also appears in Rmath.h
*/
#ifdef __cplusplus
int R_isnancpp(double); /* in arithmetic.c */
# define ISNAN(x) R_isnancpp(x)
#else
# define ISNAN(x) (isnan(x)!=0)
#endif
Romain
PS: the question would be more appropriate in R-devel.
On 01/22/2010 11:14 AM, Christophe Genolini wrote:
Hi the list,
I need to write some efficient distances function, so I read the code
for the Euclidean distance.
I do not understand the purpose of the line 11 : if x[i] and y[i] are
not NA (line 9), can dev be NA ?
Christophe
#define both_FINITE(a,b) (R_FINITE(a) && R_FINITE(b))
#define both_non_NA(a,b) (!ISNAN(a) && !ISNAN(b))
1. static double R_euclidean2(double *x, double *y, int taille)
2. {
3. double dev, dist;
4. int count, i;
5.
6. count= 0;
7. dist = 0;
8. for(i = 0 ; i < taille ; i++) {
9. if(both_non_NA(x[i], y[i])) {
10. dev = (x[i] - y[i]);
11. if(!ISNAN(dev)) {
12. dist += dev * dev;
13. count++;
14. }
15. }
16. }
17. if(count == 0)return NA_REAL;
18. if(count != taille) dist /= ((double)count/taille);
19. return sqrt(dist);
20.}
Romain Francois Professional R Enthusiast +33(0) 6 28 91 30 30 http://romainfrancois.blog.free.fr |- http://tr.im/KfKn : Rcpp 0.7.2 |- http://tr.im/JOlc : External pointers with Rcpp `- http://tr.im/JFqa : R Journal, Volume 1/2, December 2009
Christophe Genolini wrote:
Hi the list, I need to write some efficient distances function, so I read the code for the Euclidean distance. I do not understand the purpose of the line 11 : if x[i] and y[i] are not NA (line 9), can dev be NA ?
As Romain said, the test is for NaN as well as NA. One way it could happen is if both x[i] and y[i] were infinite: then the difference is NaN: > Inf - Inf [1] NaN Duncan Murdoch
Christophe
#define both_FINITE(a,b) (R_FINITE(a) && R_FINITE(b))
#define both_non_NA(a,b) (!ISNAN(a) && !ISNAN(b))
1. static double R_euclidean2(double *x, double *y, int taille)
2. {
3. double dev, dist;
4. int count, i;
5.
6. count= 0;
7. dist = 0;
8. for(i = 0 ; i < taille ; i++) {
9. if(both_non_NA(x[i], y[i])) {
10. dev = (x[i] - y[i]);
11. if(!ISNAN(dev)) {
12. dist += dev * dev;
13. count++;
14. }
15. }
16. }
17. if(count == 0)return NA_REAL;
18. if(count != taille) dist /= ((double)count/taille);
19. return sqrt(dist);
20.}
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Thanks both of you.
Inf - Inf
[1] NaN
So isn't the line 9 useless ? If either x[i] or y[i] are NA, then dev
will be NA and !ISNAN(dev) will detect it...
Sothe loop cool be
8. for(i = 0 ; i < taille ; i++) {
10. dev = (x[i] - y[i]);
11. if(!ISNAN(dev)) {
12. dist += dev * dev;
13. count++;
15. }
16. }
No ?
Christophe
Duncan Murdoch
Christophe
#define both_FINITE(a,b) (R_FINITE(a) && R_FINITE(b))
#define both_non_NA(a,b) (!ISNAN(a) && !ISNAN(b))
1. static double R_euclidean2(double *x, double *y, int taille)
2. {
3. double dev, dist;
4. int count, i;
5.
6. count= 0;
7. dist = 0;
8. for(i = 0 ; i < taille ; i++) {
9. if(both_non_NA(x[i], y[i])) {
10. dev = (x[i] - y[i]);
11. if(!ISNAN(dev)) {
12. dist += dev * dev;
13. count++;
14. }
15. }
16. }
17. if(count == 0)return NA_REAL;
18. if(count != taille) dist /= ((double)count/taille);
19. return sqrt(dist);
20.}
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
A side note: The NA vs NaN does not seem to play a role here, because: #define both_non_NA(a,b) (!ISNAN(a) && !ISNAN(b)) So, it is the same type of test used in line 9 and in line 11. /Henrik On Fri, Jan 22, 2010 at 9:52 AM, Christophe Genolini
<cgenolin at u-paris10.fr> wrote:
Thanks both of you.
Inf - Inf
[1] NaN
So isn't the line 9 useless ? If either x[i] or y[i] are NA, then dev will
be NA and !ISNAN(dev) will detect it...
Sothe loop cool be
8. ? ?for(i = 0 ; i < taille ; i++) {
10. ? ? ? ?dev = (x[i] - y[i]);
11. ? ? ? ?if(!ISNAN(dev)) {
12. ? ? ? ? ?dist += dev * dev;
13. ? ? ? ? ?count++;
15. ? ? ?}
16. }
No ?
Christophe
Duncan Murdoch
Christophe
#define both_FINITE(a,b) (R_FINITE(a) && R_FINITE(b))
#define both_non_NA(a,b) (!ISNAN(a) && !ISNAN(b))
1. static double R_euclidean2(double *x, double *y, int taille)
2. {
3. ? ?double dev, dist;
4. ? ?int count, i;
5.
6. ? ?count= 0;
7. ? ?dist = 0;
8. ? ?for(i = 0 ; i < taille ; i++) {
9. ? ?if(both_non_NA(x[i], y[i])) {
10. ? ? ? ?dev = (x[i] - y[i]);
11. ? ? ? ?if(!ISNAN(dev)) {
12. ? ? ? ?dist += dev * dev;
13. ? ? ? ?count++;
14. ? ? ? ?}
15. ? ?}
16. ? ?}
17. ? ?if(count == 0)return NA_REAL;
18. ? ?if(count != taille) dist /= ((double)count/taille);
19. ? ?return sqrt(dist);
20.}
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
On 22/01/2010 12:52 PM, Christophe Genolini wrote:
Thanks both of you.
Inf - Inf
[1] NaN
So isn't the line 9 useless ? If either x[i] or y[i] are NA, then dev
will be NA and !ISNAN(dev) will detect it...
Sothe loop cool be
8. for(i = 0 ; i < taille ; i++) {
10. dev = (x[i] - y[i]);
11. if(!ISNAN(dev)) {
12. dist += dev * dev;
13. count++;
15. }
16. }
No ?
That would presumably give the same answer, but there are lots of reasons it might not be useless: - the author might find it clearer, or easier to generalize to integer data (where your version wouldn't work). - it might be faster, because it can abort sooner. - it might be essentially equivalent in all important respects. Duncan Murdoch
Christophe
Duncan Murdoch
Christophe
#define both_FINITE(a,b) (R_FINITE(a) && R_FINITE(b))
#define both_non_NA(a,b) (!ISNAN(a) && !ISNAN(b))
1. static double R_euclidean2(double *x, double *y, int taille)
2. {
3. double dev, dist;
4. int count, i;
5.
6. count= 0;
7. dist = 0;
8. for(i = 0 ; i < taille ; i++) {
9. if(both_non_NA(x[i], y[i])) {
10. dev = (x[i] - y[i]);
11. if(!ISNAN(dev)) {
12. dist += dev * dev;
13. count++;
14. }
15. }
16. }
17. if(count == 0)return NA_REAL;
18. if(count != taille) dist /= ((double)count/taille);
19. return sqrt(dist);
20.}
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Duncan Murdoch a ?crit :
On 22/01/2010 12:52 PM, Christophe Genolini wrote:
Thanks both of you.
Inf - Inf
[1] NaN
So isn't the line 9 useless ? If either x[i] or y[i] are NA, then dev
will be NA and !ISNAN(dev) will detect it...
Sothe loop cool be
8. for(i = 0 ; i < taille ; i++) {
10. dev = (x[i] - y[i]);
11. if(!ISNAN(dev)) {
12. dist += dev * dev;
13. count++;
15. }
16. }
No ?
That would presumably give the same answer, but there are lots of reasons it might not be useless: - the author might find it clearer, or easier to generalize to integer data (where your version wouldn't work). - it might be faster, because it can abort sooner. - it might be essentially equivalent in all important respects. Duncan Murdoch
That's 3 good reasons to keep the other code. Thanks.
Christophe
Duncan Murdoch
Christophe
#define both_FINITE(a,b) (R_FINITE(a) && R_FINITE(b))
#define both_non_NA(a,b) (!ISNAN(a) && !ISNAN(b))
1. static double R_euclidean2(double *x, double *y, int taille)
2. {
3. double dev, dist;
4. int count, i;
5.
6. count= 0;
7. dist = 0;
8. for(i = 0 ; i < taille ; i++) {
9. if(both_non_NA(x[i], y[i])) {
10. dev = (x[i] - y[i]);
11. if(!ISNAN(dev)) {
12. dist += dev * dev;
13. count++;
14. }
15. }
16. }
17. if(count == 0)return NA_REAL;
18. if(count != taille) dist /= ((double)count/taille);
19. return sqrt(dist);
20.}
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide >>
http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. >