Dear all,
I wrote a function using Rcpp; it is a simple function; however, it
significantly improves the performance. I realized that on some systems,
the result of the function is different from others.
For example, on the following systems (on rhub), the results are identical
and are correct:
- Fedora Linux, R-devel, clang, gfortran (fedora-clang-devel)
- macOS 10.13.6 High Sierra, R-release, brew (macos-highsierra-release)
- Windows Server 2008 R2 SP1, R-devel, 32/64 bit (windows-x86_64-devel)
However, on the following systems (on rhub), the results are different:
- Ubuntu Linux 20.04.1 LTS, R-devel, GCC (ubuntu-gcc-devel)
Here is the error:
Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
val[1] not equal to 4.
1/1 mismatches
[1] 2 - 4 == -2
── Failure (test-compute_closest_wgps_helper.R:12:3):
compute_closest_wgps_helper works as expected. ──
val[3] not equal to 3.
1/1 mismatches
[1] 1 - 3 == -2
── Failure (test-compute_closest_wgps_helper.R:13:3):
compute_closest_wgps_helper works as expected. ──
val[4] not equal to 1.
1/1 mismatches
[1] 2 - 1 == 1
[ FAIL 3 | WARN 0 | SKIP 0 | PASS 69 ]
As much as I can check the code, it looks correct. But I am not sure what
else I can check. Some package users claimed that different R versions
provide different results; however, I checked it on r-base=3.6, 4.0, 4.1
and received the same correct results (macOS 11.2).
I would be grateful if you could let me know your thoughts about this issue.
Here is the source code:
https://github.com/fasrc/CausalGPS/blob/97342ecc90fd20f2c7c4458a70aefe471b09510c/src/compute_closest_wgps_helper.cpp
Here is the related test:
https://github.com/fasrc/CausalGPS/blob/97342ecc90fd20f2c7c4458a70aefe471b09510c/tests/testthat/test-compute_closest_wgps_helper.R
Best regards,
Naeem Khoshnevis
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20210818/bc56002e/attachment-0001.html>
[Rcpp-devel] Rcpp function results are different on different systems
9 messages · Naeem Khoshnevis, Dirk Eddelbuettel, Iñaki Ucar +1 more
Naeem, I would simplify, simplify, simplify -- as 'Rcpp FAQ 7.31' reminds us all, testing _equality_ of doubles is challenging anyway. Besides, it may make sense to would ascertain first you get what you want in _purely serial modes_ and then move to OpenMP. Dirk
https://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
HI, I cannot help with your bug. However... Le 19/08/2021 à 04:17, Naeem Khoshnevis a écrit :
Dear all, I wrote a function using Rcpp; it is a simple function; however, it significantly improves the performance.
Compared to what? The same algorithm with plenty for-loop re-written in R? Then sure, it will improve the performance. But what your algorithm does can be expressed as one-liner in R: Â Â Â Â Â out=apply(abs(outer(a,b,"-"))*sc+cd, 2, which.min) It should be also much more performant then R code with for-loop everywhere (though not so performant as C++ code). Oh, and it will not require a whole package to do the job. Best, Serguei.
I realized that on some systems, the result of the function is different from others. For example, on the following systems (on rhub), the results are identical and are correct: * Fedora Linux, R-devel, clang, gfortran (fedora-clang-devel) * macOS 10.13.6 High Sierra, R-release, brew (macos-highsierra-release) * Windows Server 2008 R2 SP1, R-devel, 32/64 bit (windows-x86_64-devel) However, on the following systems (on rhub), the results are different: *  Ubuntu Linux 20.04.1 LTS, R-devel, GCC (ubuntu-gcc-devel) Here is the error: Running the tests in ‘tests/testthat.R’ failed.   Last 13 lines of output:  val[1] not equal to 4.    1/1 mismatches    [1] 2 - 4 == -2    ── Failure (test-compute_closest_wgps_helper.R:12:3): compute_closest_wgps_helper works as expected. ──  val[3] not equal to 3.    1/1 mismatches    [1] 1 - 3 == -2    ── Failure (test-compute_closest_wgps_helper.R:13:3): compute_closest_wgps_helper works as expected. ──  val[4] not equal to 1.    1/1 mismatches    [1] 2 - 1 == 1    [ FAIL 3 | WARN 0 | SKIP 0 | PASS 69 ] As much as I can check the code, it looks correct. But I am not sure what else I can check. Some package users claimed that different R versions provide different results; however, I checked it on r-base=3.6, 4.0, 4.1 and received the same correct results (macOS 11.2). I would be grateful if you could let me know your thoughts about this issue. Here is the source code: https://github.com/fasrc/CausalGPS/blob/97342ecc90fd20f2c7c4458a70aefe471b09510c/src/compute_closest_wgps_helper.cpp <https://github.com/fasrc/CausalGPS/blob/97342ecc90fd20f2c7c4458a70aefe471b09510c/src/compute_closest_wgps_helper.cpp> Here is the related test: https://github.com/fasrc/CausalGPS/blob/97342ecc90fd20f2c7c4458a70aefe471b09510c/tests/testthat/test-compute_closest_wgps_helper.R <https://github.com/fasrc/CausalGPS/blob/97342ecc90fd20f2c7c4458a70aefe471b09510c/tests/testthat/test-compute_closest_wgps_helper.R> Best regards, Naeem Khoshnevis
_______________________________________________ Rcpp-devel mailing list Rcpp-devel at lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20210819/645c0343/attachment.html>
On Thu, 19 Aug 2021 at 04:53, Dirk Eddelbuettel <edd at debian.org> wrote:
Naeem, I would simplify, simplify, simplify -- as 'Rcpp FAQ 7.31' reminds us all, testing _equality_ of doubles is challenging anyway. Besides, it may make sense to would ascertain first you get what you want in _purely serial modes_ and then move to OpenMP.
Exactly. Serial execution should be fine. I.e., if you set the number of threads to 1, then all platforms will return the same result. However, you have defined a number of variables outside the parallel region, and then you modify them inside the parallel region. OpenMP takes those variables as shared by default, which leads to the unexpected results you are getting. You need to tell OpenMP that those variables are threadprivate. Or you could just define them inside the parallel region, so that OpenMP knows that they are private without additional hints.
Iñaki Úcar
Thank you so much, everyone, for responding to this email. Dirk, - I didn't think about testing _equality_ of doubles because the numbers are significantly different (e.g., instead of 0.5, chooses 1.5). However, that is a valid point, and I should be aware of that. - You are right about the serial runs. Whenever I deactivate OpenMP, the results are correct. Serguei, - Thanks for the comments. Yes, I agree. It seems outer is a better option. We have started with outer. However, outer builds the entire matrix of differences first, then finds the minimum index. In our application, it requires 200 GB of memory to build that matrix. Rcpp does the job with around 10 MB. That is why I switched to Rcpp. Please let me know your thoughts. Iñaki, - Thanks for your suggestion. Yes, the problem is shared values, and it resolved the issue. I really appreciate it. Best regards, Naeem
On Thu, Aug 19, 2021 at 4:56 AM Iñaki Ucar <iucar at fedoraproject.org> wrote:
On Thu, 19 Aug 2021 at 04:53, Dirk Eddelbuettel <edd at debian.org> wrote:
Naeem, I would simplify, simplify, simplify -- as 'Rcpp FAQ 7.31' reminds us
all,
testing _equality_ of doubles is challenging anyway. Besides, it may make sense to would ascertain first you get what you
want in
_purely serial modes_ and then move to OpenMP.
Exactly. Serial execution should be fine. I.e., if you set the number of threads to 1, then all platforms will return the same result. However, you have defined a number of variables outside the parallel region, and then you modify them inside the parallel region. OpenMP takes those variables as shared by default, which leads to the unexpected results you are getting. You need to tell OpenMP that those variables are threadprivate. Or you could just define them inside the parallel region, so that OpenMP knows that they are private without additional hints. -- Iñaki Úcar
-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20210819/dcf1b05c/attachment.html>
On 19 August 2021 at 10:04, Naeem Khoshnevis wrote:
| Thank you so much, everyone, for responding to this email. Thanks for circling back! This was a pretty impressive thread as you got three distinct answers that all added some value :) Dirk
https://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
Le 19/08/2021 à 17:04, Naeem Khoshnevis a écrit :
Thank you so much, everyone, for responding to this email.
Dirk,
* I didn't think about testing _equality_ of doubles because the
numbers are significantly different (e.g., instead of 0.5, chooses
1.5). However, that is a valid point, and I should be aware of that.
* You are right about the serial runs. Whenever I deactivate OpenMP,
the results are correct.
Serguei,
* Thanks for the comments. Yes, I agree. It seems outer is a better
option. We have started with outer. However, outer builds the
entire matrix of differences first, then finds the minimum index.
In our application, it requires 200 GB of memory to build that matrix.
This problem was hardly foreseeable from examples in hand. But I still think that a pure R solution can be a runner:   ac=a*sc   bc=b*sc   out=vapply(seq_along(bc), function(i) which.min(abs(ac-bc[i])+cd[i]), integer(1)) Best, Serguei.
* Rcpp does the job with around 10 MB. That is why I switched to
Rcpp. Please let me know your thoughts.
Iñaki,
* Thanks for your suggestion. Yes, the problem is shared values, and
it resolved the issue. I really appreciate it.
Best regards,
Naeem
On Thu, Aug 19, 2021 at 4:56 AM Iñaki Ucar <iucar at fedoraproject.org
<mailto:iucar at fedoraproject.org>> wrote:
On Thu, 19 Aug 2021 at 04:53, Dirk Eddelbuettel <edd at debian.org
<mailto:edd at debian.org>> wrote:
>
>
> Naeem,
>
> I would simplify, simplify, simplify -- as 'Rcpp FAQ 7.31'
reminds us all,
> testing _equality_ of doubles is challenging anyway.
>
> Besides, it may make sense to would ascertain first you get what
you want in
> _purely serial modes_ and then move to OpenMP.
Exactly. Serial execution should be fine. I.e., if you set the number
of threads to 1, then all platforms will return the same result.
However, you have defined a number of variables outside the parallel
region, and then you modify them inside the parallel region. OpenMP
takes those variables as shared by default, which leads to the
unexpected results you are getting. You need to tell OpenMP that those
variables are threadprivate. Or you could just define them inside the
parallel region, so that OpenMP knows that they are private without
additional hints.
--
Iñaki Úcar
_______________________________________________ Rcpp-devel mailing list Rcpp-devel at lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20210819/b0ab0fdb/attachment.html>
Le 19/08/2021 à 17:41, Sokol Serguei a écrit :
Le 19/08/2021 à 17:04, Naeem Khoshnevis a écrit :
Thank you so much, everyone, for responding to this email.
Dirk,
* I didn't think about testing _equality_ of doubles because the
numbers are significantly different (e.g., instead of 0.5,
chooses 1.5). However, that is a valid point, and I should be
aware of that.
* You are right about the serial runs. Whenever I deactivate
OpenMP, the results are correct.
Serguei,
* Thanks for the comments. Yes, I agree. It seems outer is a better
option. We have started with outer. However, outer builds the
entire matrix of differences first, then finds the minimum index.
In our application, it requires 200 GB of memory to build that
matrix.
This problem was hardly foreseeable from examples in hand. But I still think that a pure R solution can be a runner:   ac=a*sc   bc=b*sc   out=vapply(seq_along(bc), function(i) which.min(abs(ac-bc[i])+cd[i]), integer(1))
Further optimizing: "+cd[i]" does not change the result of which.min() so it can be dropped. The same reasoning can be applied to "*sc". As the problem is reduced to searching the index of the closest value in 'a' to 'b[i]', it can be solved in O(log(n)) time (instead of O(n)) by binary search if 'a' can be sorted before operations. If 'b' can also be sorted, then the whole timing can be further reduced. E.g. we can use findInterval() implementing such optimal searching. However, its result should be post-processed to find value indexes instead of interval ones and get back to original indexes instead of sorted ones. Something like:   oa=order(a)   ob=order(b)   ao=a[oa]   bo=b[ob]   i=findInterval(bo, ao) # main work is done here   ii=i+ifelse(i < n & ao[i+1]-bo < bo-ao[i], 1, 0)   out=oa[ii]   out[ob]=out Serguei.
Best, Serguei.
* Rcpp does the job with around 10 MB. That is why I switched to
Rcpp. Please let me know your thoughts.
Iñaki,
* Thanks for your suggestion. Yes, the problem is shared values,
and it resolved the issue. I really appreciate it.
Best regards,
Naeem
On Thu, Aug 19, 2021 at 4:56 AM Iñaki Ucar <iucar at fedoraproject.org
<mailto:iucar at fedoraproject.org>> wrote:
On Thu, 19 Aug 2021 at 04:53, Dirk Eddelbuettel <edd at debian.org
<mailto:edd at debian.org>> wrote:
>
>
> Naeem,
>
> I would simplify, simplify, simplify -- as 'Rcpp FAQ 7.31'
reminds us all,
> testing _equality_ of doubles is challenging anyway.
>
> Besides, it may make sense to would ascertain first you get
what you want in
> _purely serial modes_ and then move to OpenMP.
Exactly. Serial execution should be fine. I.e., if you set the number
of threads to 1, then all platforms will return the same result.
However, you have defined a number of variables outside the parallel
region, and then you modify them inside the parallel region. OpenMP
takes those variables as shared by default, which leads to the
unexpected results you are getting. You need to tell OpenMP that
those
variables are threadprivate. Or you could just define them inside the
parallel region, so that OpenMP knows that they are private without
additional hints.
--
Iñaki Úcar
_______________________________________________ Rcpp-devel mailing list Rcpp-devel at lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20210819/b2f9b433/attachment-0001.html>
Le 19/08/2021 à 19:01, Sokol Serguei a écrit :
Le 19/08/2021 à 17:41, Sokol Serguei a écrit :
Le 19/08/2021 à 17:04, Naeem Khoshnevis a écrit :
Thank you so much, everyone, for responding to this email.
Dirk,
* I didn't think about testing _equality_ of doubles because the
numbers are significantly different (e.g., instead of 0.5,
chooses 1.5). However, that is a valid point, and I should be
aware of that.
* You are right about the serial runs. Whenever I deactivate
OpenMP, the results are correct.
Serguei,
* Thanks for the comments. Yes, I agree. It seems outer is a better
option. We have started with outer. However, outer builds the
entire matrix of differences first, then finds the minimum index.
In our application, it requires 200 GB of memory to build that
matrix.
This problem was hardly foreseeable from examples in hand. But I still think that a pure R solution can be a runner:   ac=a*sc   bc=b*sc   out=vapply(seq_along(bc), function(i) which.min(abs(ac-bc[i])+cd[i]), integer(1))
Further optimizing: "+cd[i]" does not change the result of which.min() so it can be dropped. The same reasoning can be applied to "*sc". As the problem is reduced to searching the index of the closest value in 'a' to 'b[i]', it can be solved in O(log(n)) time (instead of O(n)) by binary search if 'a' can be sorted before operations. If 'b' can also be sorted, then the whole timing can be further reduced. E.g. we can use findInterval() implementing such optimal searching. However, its result should be post-processed to find value indexes instead of interval ones and get back to original indexes instead of sorted ones. Something like:   oa=order(a)   ob=order(b)   ao=a[oa]   bo=b[ob]   i=findInterval(bo, ao) # main work is done here   ii=i+ifelse(i < n & ao[i+1]-bo < bo-ao[i], 1, 0)
oops... Here 'n' is length(b). It was defined in my environment, so the snipet worked as expected when I copy/pasted it. But it may be not your case. Sorry. Best, Serguei.
  out=oa[ii]   out[ob]=out Serguei.
Best, Serguei.
* Rcpp does the job with around 10 MB. That is why I switched to
Rcpp. Please let me know your thoughts.
Iñaki,
* Thanks for your suggestion. Yes, the problem is shared values,
and it resolved the issue. I really appreciate it.
Best regards,
Naeem
On Thu, Aug 19, 2021 at 4:56 AM Iñaki Ucar <iucar at fedoraproject.org
<mailto:iucar at fedoraproject.org>> wrote:
On Thu, 19 Aug 2021 at 04:53, Dirk Eddelbuettel <edd at debian.org
<mailto:edd at debian.org>> wrote:
>
>
> Naeem,
>
> I would simplify, simplify, simplify -- as 'Rcpp FAQ 7.31'
reminds us all,
> testing _equality_ of doubles is challenging anyway.
>
> Besides, it may make sense to would ascertain first you get
what you want in
> _purely serial modes_ and then move to OpenMP.
Exactly. Serial execution should be fine. I.e., if you set the number
of threads to 1, then all platforms will return the same result.
However, you have defined a number of variables outside the parallel
region, and then you modify them inside the parallel region. OpenMP
takes those variables as shared by default, which leads to the
unexpected results you are getting. You need to tell OpenMP that
those
variables are threadprivate. Or you could just define them inside the
parallel region, so that OpenMP knows that they are private without
additional hints.
--
Iñaki Úcar
_______________________________________________ Rcpp-devel mailing list Rcpp-devel at lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel