Hi all,
I noted that some functions in IRanges, e.g. 'runq', can take a long
time to terminate; that interrupting the evaluation e.g. by Ctrl-C does
not seem to be possible; and that in the C code of that package, the
function R_CheckUserInterrupt is never used.
Are these conscious design decisions? Are there (should there be)
general recommendations from the project regarding use of that function
(or possibly equivalent functionality)?
Various other packages (Biostrings, Biobase, ShortRead, vsn, puma,
limma) do use it in various places.
Best wishes
Wolfgang
--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber/contact
[Bioc-devel] R_CheckUserInterrupt in IRanges
11 messages · Patrick Aboyoun, Seth Falcon, Wolfgang Huber +3 more
Wolfgang, We have been silent on when to add R_CheckUserInterrupt calls at the C level for long running operations and only around 20 BioC packages use it. I added calls to R_CheckUserInterrupt in coverage, run*, and view* methods in IRanges, but more can certainly be added to the interval operations. For the developer wiki how does the following general rule sound: C level for loops should include a call to R_CheckUserInterrupt when the total loop time exceeds 10 seconds for a "reasonably sized" job on "reasonable" hardware AND the method is intended for interactive use. Thoughts? Patrick
Wolfgang Huber wrote:
Hi all,
I noted that some functions in IRanges, e.g. 'runq', can take a long
time to terminate; that interrupting the evaluation e.g. by Ctrl-C
does not seem to be possible; and that in the C code of that package,
the function R_CheckUserInterrupt is never used.
Are these conscious design decisions? Are there (should there be)
general recommendations from the project regarding use of that
function (or possibly equivalent functionality)?
Various other packages (Biostrings, Biobase, ShortRead, vsn, puma,
limma) do use it in various places.
Best wishes
Wolfgang
--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber/contact
_______________________________________________ Bioc-devel at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Hi Patrick I suggest:
We have been silent on when to add R_CheckUserInterrupt calls at the C level for long running operations and only around 20 BioC packages use it. I added calls to R_CheckUserInterrupt in coverage, run*, and view* methods in IRanges, but more can certainly be added to the interval operations. For the developer wiki how does the following general rule sound: C level for loops should include a call to R_CheckUserInterrupt when the total loop time exceeds 10 seconds for a "reasonably sized" job on "reasonable" hardware AND the method is intended for interactive use.
C level loops should include a call to R_CheckUserInterrupt when there is any chance that they may not terminate for certain parameter constellations, or when their run time exceeds 10 seconds with typical parameters and the method is intended for interactive use. /* The notion "reasonable hardware" is difficult, since it may just be on unreasonable hardware (like a netbook) where people experience the greatest need for interrupting a carelessly called function. Also, even reasonable hardware can become unreasonable when there is competition for memory and it starts swapping. Unless there is a substantial cost for calling R_CheckUserInterrupt, I don't see why not to be liberal with it. */ Best wishes Wolfgang
Thoughts? Patrick Wolfgang Huber wrote:
Hi all,
I noted that some functions in IRanges, e.g. 'runq', can take a long
time to terminate; that interrupting the evaluation e.g. by Ctrl-C
does not seem to be possible; and that in the C code of that package,
the function R_CheckUserInterrupt is never used.
Are these conscious design decisions? Are there (should there be)
general recommendations from the project regarding use of that
function (or possibly equivalent functionality)?
Various other packages (Biostrings, Biobase, ShortRead, vsn, puma,
limma) do use it in various places.
Best wishes
Wolfgang
--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber/contact
_______________________________________________ Bioc-devel at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Best wishes
Wolfgang
--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber/contact
I just modified the C or Fortran code section of package guidelines to reflect this new entry: http://wiki.fhcrc.org/bioc/Package_Guidelines#c-or-fortran-code Patrick
Wolfgang Huber wrote:
Hi Patrick I suggest:
We have been silent on when to add R_CheckUserInterrupt calls at the C level for long running operations and only around 20 BioC packages use it. I added calls to R_CheckUserInterrupt in coverage, run*, and view* methods in IRanges, but more can certainly be added to the interval operations. For the developer wiki how does the following general rule sound: C level for loops should include a call to R_CheckUserInterrupt when the total loop time exceeds 10 seconds for a "reasonably sized" job on "reasonable" hardware AND the method is intended for interactive use.
C level loops should include a call to R_CheckUserInterrupt when there
is any chance that they may not terminate for certain parameter
constellations, or when their run time exceeds 10 seconds with typical
parameters and the method is intended for interactive use.
/*
The notion "reasonable hardware" is difficult, since it may just be on
unreasonable hardware (like a netbook) where people experience the
greatest need for interrupting a carelessly called function. Also, even
reasonable hardware can become unreasonable when there is competition
for memory and it starts swapping. Unless there is a substantial cost
for calling R_CheckUserInterrupt, I don't see why not to be liberal
with it.
*/
Best wishes
Wolfgang
Thoughts? Patrick Wolfgang Huber wrote:
Hi all,
I noted that some functions in IRanges, e.g. 'runq', can take a long
time to terminate; that interrupting the evaluation e.g. by Ctrl-C
does not seem to be possible; and that in the C code of that
package, the function R_CheckUserInterrupt is never used.
Are these conscious design decisions? Are there (should there be)
general recommendations from the project regarding use of that
function (or possibly equivalent functionality)?
Various other packages (Biostrings, Biobase, ShortRead, vsn, puma,
limma) do use it in various places.
Best wishes
Wolfgang
--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber/contact
_______________________________________________ Bioc-devel at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
On 1/11/10 1:31 PM, Patrick Aboyoun wrote:
I just modified the C or Fortran code section of package guidelines to reflect this new entry: http://wiki.fhcrc.org/bioc/Package_Guidelines#c-or-fortran-code
This looks reasonable.
It is worth noting, however, that calling R_CheckUserInterrupt is not
entirely free. If you have a tight loop it might make more sense to
check at intervals rather than each time through and the choice of when
to check will depend very much on the computation.
Here's an example using the inline package (code is at the bottom) with
a silly double loop to compare checking vs not:
With R_CheckUserInterrupt inner loop:
> system.time(z <- funx(10000L, TRUE))
user system elapsed
1.310 0.001 1.313
No checking:
system.time(z <- funx(10000L, FALSE))
user system elapsed
0.08 0.00 0.08
Now this may be completely unrealistic, but I suspect that a sensible
take away message is that we should be making sure our C code that loops
does call R_CheckUserInterrupt, but depending on the computation, not at
every loop iteration.
+ seth
## example code is below.
library("inline")
code <- "
int i, j, sum = 0, int n = INTEGER(max)[0];
int check_inter = LOGICAL(do_check)[0];
SEXP ans;
PROTECT(ans = allocVector(INTSXP, n));
for (i = 0; i < n; i++) {
INTEGER(ans)[i] = i;
for (j = 0; j < n; j++) {
sum = i + j;
if (check_inter) {
R_CheckUserInterrupt();
}
}
}
UNPROTECT(1);
return ans;
"
funx <- cfunction(signature(max="integer", do_check="logical"), code)
system.time(z <- funx(10000L, TRUE))
system.time(z <- funx(10000L, FALSE))
Seth Falcon Bioconductor Core Team | FHCRC
Hi Seth Yes. I would hope that someone who goes to the trouble of writing C code understands that. Best wishes Wolfgang
Seth Falcon wrote:
On 1/11/10 1:31 PM, Patrick Aboyoun wrote:
I just modified the C or Fortran code section of package guidelines to reflect this new entry: http://wiki.fhcrc.org/bioc/Package_Guidelines#c-or-fortran-code
This looks reasonable. It is worth noting, however, that calling R_CheckUserInterrupt is not entirely free. If you have a tight loop it might make more sense to check at intervals rather than each time through and the choice of when to check will depend very much on the computation. Here's an example using the inline package (code is at the bottom) with a silly double loop to compare checking vs not: With R_CheckUserInterrupt inner loop:
> system.time(z <- funx(10000L, TRUE))
user system elapsed
1.310 0.001 1.313
No checking:
system.time(z <- funx(10000L, FALSE))
user system elapsed
0.08 0.00 0.08
Now this may be completely unrealistic, but I suspect that a sensible
take away message is that we should be making sure our C code that loops
does call R_CheckUserInterrupt, but depending on the computation, not at
every loop iteration.
+ seth
## example code is below.
library("inline")
code <- "
int i, j, sum = 0, int n = INTEGER(max)[0];
int check_inter = LOGICAL(do_check)[0];
SEXP ans;
PROTECT(ans = allocVector(INTSXP, n));
for (i = 0; i < n; i++) {
INTEGER(ans)[i] = i;
for (j = 0; j < n; j++) {
sum = i + j;
if (check_inter) {
R_CheckUserInterrupt();
}
}
}
UNPROTECT(1);
return ans;
"
funx <- cfunction(signature(max="integer", do_check="logical"), code)
system.time(z <- funx(10000L, TRUE))
system.time(z <- funx(10000L, FALSE))
Best wishes
Wolfgang
--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber/contact
To 2nd what Seth's says, I think it is worth showing an explicit
example of how to check at regular intervals, e.g. for (i = 0; i < n;
i++) { if (i %% 100 == 0) R_CheckUserInterrupt(); ... }
/Henrik
On Mon, Jan 11, 2010 at 3:50 PM, Seth Falcon <sfalcon at fhcrc.org> wrote:
On 1/11/10 1:31 PM, Patrick Aboyoun wrote:
I just modified the C or Fortran code section of package guidelines to reflect this new entry: http://wiki.fhcrc.org/bioc/Package_Guidelines#c-or-fortran-code
This looks reasonable. It is worth noting, however, that calling R_CheckUserInterrupt is not entirely free. ?If you have a tight loop it might make more sense to check at intervals rather than each time through and the choice of when to check will depend very much on the computation. Here's an example using the inline package (code is at the bottom) with a silly double loop to compare checking vs not: With R_CheckUserInterrupt inner loop:
system.time(z <- funx(10000L, TRUE))
? user ?system elapsed
?1.310 ? 0.001 ? 1.313
No checking:
system.time(z <- funx(10000L, FALSE))
? user ?system elapsed
? 0.08 ? ?0.00 ? ?0.08
Now this may be completely unrealistic, but I suspect that a sensible take
away message is that we should be making sure our C code that loops does
call R_CheckUserInterrupt, but depending on the computation, not at every
loop iteration.
+ seth
## example code is below.
library("inline")
code <- "
int i, j, sum = 0, int n = INTEGER(max)[0];
int check_inter = LOGICAL(do_check)[0];
SEXP ans;
PROTECT(ans = allocVector(INTSXP, n));
for (i = 0; i < n; i++) {
? ?INTEGER(ans)[i] = i;
? ?for (j = 0; j < n; j++) {
? ? ? ?sum = i + j;
? ? ? ?if (check_inter) {
? ? ? ? ? ?R_CheckUserInterrupt();
? ? ? ?}
? ?}
}
UNPROTECT(1);
return ans;
"
funx <- cfunction(signature(max="integer", do_check="logical"), code)
system.time(z <- funx(10000L, TRUE))
system.time(z <- funx(10000L, FALSE))
--
Seth Falcon
Bioconductor Core Team | FHCRC
_______________________________________________ Bioc-devel at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
hi, let me take the chance that you're talking about this to check
whether anybody finds anything odd in the following strategy. in the
package i develop (qpgraph - devel version) for the computationally
intensive C-loops i calculate the percentage of progress and each time
this percentage changes in one integer unit i call
R_CheckUserInterrupt() and i also take the chance to process
window-system events with R_ProcessEvents() (if i understand correctly
this is what this function does). here is the code snippet:
ppct=-1;
k=0;
for (i = 0; i < l_int-1; i++) {
int i2 = INTEGER(pairup_ij_int)[i] - 1;
for (j = i+1; j < l_int; j++) {
int j2 = INTEGER(pairup_ij_int)[j] - 1;
REAL(nrrMatrix)[i2+j2*n_var] = qp_edge_nrr(REAL(S),n_var,
INTEGER(N)[0], i2, j2, q, INTEGER(nTests)[0],REAL(alpha)[0]);
REAL(nrrMatrix)[j2+i2*n_var] = REAL(nrrMatrix)[i2+j2*n_var];
k++;
pct = (int) ((k * 100) / n_adj);
if (pct != ppct) {
if (INTEGER(verbose)[0]) { ## show progress when
if (pct % 10 == 0) ## verbose=TRUE
Rprintf("%d",pct);
else
Rprintf(".",pct);
R_FlushConsole();
}
R_CheckUserInterrupt();
#ifdef Win32
R_ProcessEvents();
#endif
#ifdef HAVE_AQUA
R_ProcessEvents();
#endif
ppct = pct;
}
}
}
let me clarify that the double nested loop iterates more or less through
every pair of variables and thus this computation grows quadratically in
the number of input variables in the dataset.
robert.
On Mon, 2010-01-11 at 20:07 -0800, Henrik Bengtsson wrote:
To 2nd what Seth's says, I think it is worth showing an explicit
example of how to check at regular intervals, e.g. for (i = 0; i < n;
i++) { if (i %% 100 == 0) R_CheckUserInterrupt(); ... }
/Henrik
On Mon, Jan 11, 2010 at 3:50 PM, Seth Falcon <sfalcon at fhcrc.org> wrote:
On 1/11/10 1:31 PM, Patrick Aboyoun wrote:
I just modified the C or Fortran code section of package guidelines to reflect this new entry: http://wiki.fhcrc.org/bioc/Package_Guidelines#c-or-fortran-code
This looks reasonable. It is worth noting, however, that calling R_CheckUserInterrupt is not entirely free. If you have a tight loop it might make more sense to check at intervals rather than each time through and the choice of when to check will depend very much on the computation. Here's an example using the inline package (code is at the bottom) with a silly double loop to compare checking vs not: With R_CheckUserInterrupt inner loop:
system.time(z <- funx(10000L, TRUE))
user system elapsed
1.310 0.001 1.313
No checking:
system.time(z <- funx(10000L, FALSE))
user system elapsed
0.08 0.00 0.08
Now this may be completely unrealistic, but I suspect that a sensible take
away message is that we should be making sure our C code that loops does
call R_CheckUserInterrupt, but depending on the computation, not at every
loop iteration.
+ seth
## example code is below.
library("inline")
code <- "
int i, j, sum = 0, int n = INTEGER(max)[0];
int check_inter = LOGICAL(do_check)[0];
SEXP ans;
PROTECT(ans = allocVector(INTSXP, n));
for (i = 0; i < n; i++) {
INTEGER(ans)[i] = i;
for (j = 0; j < n; j++) {
sum = i + j;
if (check_inter) {
R_CheckUserInterrupt();
}
}
}
UNPROTECT(1);
return ans;
"
funx <- cfunction(signature(max="integer", do_check="logical"), code)
system.time(z <- funx(10000L, TRUE))
system.time(z <- funx(10000L, FALSE))
--
Seth Falcon
Bioconductor Core Team | FHCRC
_______________________________________________ Bioc-devel at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
_______________________________________________ Bioc-devel at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
This is a bit off-topic but one idiom that I see in this code is the use of INTEGER() or REAL() inside potentially long loops. This is not a good idea. In packages INTEGER(), REAL(), etc. are implemented as functions, not macros. It is best to assign a pointer outside the loop and work with that, avoiding the function call overhead. I realize this could be a minuscule change but if what you do with that pointer within the loop is trivial it is best not to add the function call overhead.
On Tue, Jan 12, 2010 at 2:26 AM, Robert Castelo <robert.castelo at upf.edu> wrote:
hi, let me take the chance that you're talking about this to check
whether anybody finds anything odd in the following strategy. in the
package i develop (qpgraph - devel version) for the computationally
intensive C-loops i calculate the percentage of progress and each time
this percentage changes in one integer unit i call
R_CheckUserInterrupt() and i also take the chance to process
window-system events with R_ProcessEvents() (if i understand correctly
this is what this function does). here is the code snippet:
ppct=-1;
k=0;
for (i = 0; i < l_int-1; i++) {
?int i2 = INTEGER(pairup_ij_int)[i] - 1;
?for (j = i+1; j < l_int; j++) {
? ?int j2 = INTEGER(pairup_ij_int)[j] - 1;
? ?REAL(nrrMatrix)[i2+j2*n_var] = qp_edge_nrr(REAL(S),n_var,
INTEGER(N)[0], i2, j2, q, INTEGER(nTests)[0],REAL(alpha)[0]);
? ?REAL(nrrMatrix)[j2+i2*n_var] = REAL(nrrMatrix)[i2+j2*n_var];
? ?k++;
? ?pct = (int) ((k * 100) / n_adj);
? ?if (pct != ppct) {
? ? ?if (INTEGER(verbose)[0]) { ## show progress when
? ? ? ?if (pct % 10 == 0) ? ? ? ## verbose=TRUE
? ? ? ? ?Rprintf("%d",pct);
? ? ? ?else
? ? ? ? ?Rprintf(".",pct);
? ? ? ?R_FlushConsole();
? ? ?}
? ? ?R_CheckUserInterrupt();
#ifdef Win32
? ? ?R_ProcessEvents();
#endif
#ifdef HAVE_AQUA
? ? ?R_ProcessEvents();
#endif
? ? ?ppct = pct;
? ?}
?}
}
let me clarify that the double nested loop iterates more or less through
every pair of variables and thus this computation grows quadratically in
the number of input variables in the dataset.
robert.
On Mon, 2010-01-11 at 20:07 -0800, Henrik Bengtsson wrote:
To 2nd what Seth's says, I think it is worth showing an explicit
example of how to check at regular intervals, e.g. ?for (i = 0; i < n;
i++) { if (i %% 100 == 0) R_CheckUserInterrupt(); ... }
/Henrik
On Mon, Jan 11, 2010 at 3:50 PM, Seth Falcon <sfalcon at fhcrc.org> wrote:
On 1/11/10 1:31 PM, Patrick Aboyoun wrote:
I just modified the C or Fortran code section of package guidelines to reflect this new entry: http://wiki.fhcrc.org/bioc/Package_Guidelines#c-or-fortran-code
This looks reasonable. It is worth noting, however, that calling R_CheckUserInterrupt is not entirely free. ?If you have a tight loop it might make more sense to check at intervals rather than each time through and the choice of when to check will depend very much on the computation. Here's an example using the inline package (code is at the bottom) with a silly double loop to compare checking vs not: With R_CheckUserInterrupt inner loop:
system.time(z <- funx(10000L, TRUE))
? user ?system elapsed
?1.310 ? 0.001 ? 1.313
No checking:
system.time(z <- funx(10000L, FALSE))
? user ?system elapsed
? 0.08 ? ?0.00 ? ?0.08
Now this may be completely unrealistic, but I suspect that a sensible take
away message is that we should be making sure our C code that loops does
call R_CheckUserInterrupt, but depending on the computation, not at every
loop iteration.
+ seth
## example code is below.
library("inline")
code <- "
int i, j, sum = 0, int n = INTEGER(max)[0];
int check_inter = LOGICAL(do_check)[0];
SEXP ans;
PROTECT(ans = allocVector(INTSXP, n));
for (i = 0; i < n; i++) {
? ?INTEGER(ans)[i] = i;
? ?for (j = 0; j < n; j++) {
? ? ? ?sum = i + j;
? ? ? ?if (check_inter) {
? ? ? ? ? ?R_CheckUserInterrupt();
? ? ? ?}
? ?}
}
UNPROTECT(1);
return ans;
"
funx <- cfunction(signature(max="integer", do_check="logical"), code)
system.time(z <- funx(10000L, TRUE))
system.time(z <- funx(10000L, FALSE))
--
Seth Falcon
Bioconductor Core Team | FHCRC
_______________________________________________ Bioc-devel at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
_______________________________________________ Bioc-devel at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
_______________________________________________ Bioc-devel at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
1 day later
thanks for the hint, i thought they were macros indeed. given that the two nested loops may take a long time and there are these R_*() calls in between, do you think i should PROTECT or PROTECT_WITH_INDEX before assigning pointers outside these loops? thanks, robert.
On Tue, 2010-01-12 at 09:34 -0600, Douglas Bates wrote:
This is a bit off-topic but one idiom that I see in this code is the use of INTEGER() or REAL() inside potentially long loops. This is not a good idea. In packages INTEGER(), REAL(), etc. are implemented as functions, not macros. It is best to assign a pointer outside the loop and work with that, avoiding the function call overhead. I realize this could be a minuscule change but if what you do with that pointer within the loop is trivial it is best not to add the function call overhead. On Tue, Jan 12, 2010 at 2:26 AM, Robert Castelo <robert.castelo at upf.edu> wrote:
hi, let me take the chance that you're talking about this to check
whether anybody finds anything odd in the following strategy. in the
package i develop (qpgraph - devel version) for the computationally
intensive C-loops i calculate the percentage of progress and each time
this percentage changes in one integer unit i call
R_CheckUserInterrupt() and i also take the chance to process
window-system events with R_ProcessEvents() (if i understand correctly
this is what this function does). here is the code snippet:
ppct=-1;
k=0;
for (i = 0; i < l_int-1; i++) {
int i2 = INTEGER(pairup_ij_int)[i] - 1;
for (j = i+1; j < l_int; j++) {
int j2 = INTEGER(pairup_ij_int)[j] - 1;
REAL(nrrMatrix)[i2+j2*n_var] = qp_edge_nrr(REAL(S),n_var,
INTEGER(N)[0], i2, j2, q, INTEGER(nTests)[0],REAL(alpha)[0]);
REAL(nrrMatrix)[j2+i2*n_var] = REAL(nrrMatrix)[i2+j2*n_var];
k++;
pct = (int) ((k * 100) / n_adj);
if (pct != ppct) {
if (INTEGER(verbose)[0]) { ## show progress when
if (pct % 10 == 0) ## verbose=TRUE
Rprintf("%d",pct);
else
Rprintf(".",pct);
R_FlushConsole();
}
R_CheckUserInterrupt();
#ifdef Win32
R_ProcessEvents();
#endif
#ifdef HAVE_AQUA
R_ProcessEvents();
#endif
ppct = pct;
}
}
}
let me clarify that the double nested loop iterates more or less through
every pair of variables and thus this computation grows quadratically in
the number of input variables in the dataset.
robert.
On Mon, 2010-01-11 at 20:07 -0800, Henrik Bengtsson wrote:
To 2nd what Seth's says, I think it is worth showing an explicit
example of how to check at regular intervals, e.g. for (i = 0; i < n;
i++) { if (i %% 100 == 0) R_CheckUserInterrupt(); ... }
/Henrik
On Mon, Jan 11, 2010 at 3:50 PM, Seth Falcon <sfalcon at fhcrc.org> wrote:
On 1/11/10 1:31 PM, Patrick Aboyoun wrote:
I just modified the C or Fortran code section of package guidelines to reflect this new entry: http://wiki.fhcrc.org/bioc/Package_Guidelines#c-or-fortran-code
This looks reasonable. It is worth noting, however, that calling R_CheckUserInterrupt is not entirely free. If you have a tight loop it might make more sense to check at intervals rather than each time through and the choice of when to check will depend very much on the computation. Here's an example using the inline package (code is at the bottom) with a silly double loop to compare checking vs not: With R_CheckUserInterrupt inner loop:
system.time(z <- funx(10000L, TRUE))
user system elapsed
1.310 0.001 1.313
No checking:
system.time(z <- funx(10000L, FALSE))
user system elapsed
0.08 0.00 0.08
Now this may be completely unrealistic, but I suspect that a sensible take
away message is that we should be making sure our C code that loops does
call R_CheckUserInterrupt, but depending on the computation, not at every
loop iteration.
+ seth
## example code is below.
library("inline")
code <- "
int i, j, sum = 0, int n = INTEGER(max)[0];
int check_inter = LOGICAL(do_check)[0];
SEXP ans;
PROTECT(ans = allocVector(INTSXP, n));
for (i = 0; i < n; i++) {
INTEGER(ans)[i] = i;
for (j = 0; j < n; j++) {
sum = i + j;
if (check_inter) {
R_CheckUserInterrupt();
}
}
}
UNPROTECT(1);
return ans;
"
funx <- cfunction(signature(max="integer", do_check="logical"), code)
system.time(z <- funx(10000L, TRUE))
system.time(z <- funx(10000L, FALSE))
--
Seth Falcon
Bioconductor Core Team | FHCRC
_______________________________________________ Bioc-devel at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
_______________________________________________ Bioc-devel at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
_______________________________________________ Bioc-devel at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
On Wed, Jan 13, 2010 at 12:50 PM, Robert Castelo <robert.castelo at upf.edu> wrote:
thanks for the hint, i thought they were macros indeed. given that the two nested loops may take a long time and there are these R_*() calls in between, do you think i should PROTECT or PROTECT_WITH_INDEX before assigning pointers outside these loops?
If the SEXPs from which you are extracting pointers are arguments in a
.Call from R then they are already protected. Also, when you are
using a scalar value passed from R then you can assign the value to an
int or a doubler in your C code and you don't need to worry about
garbage collection. It may be that a pointer from an SEXP can be
reassigned as a result of garbage collection but Robert or Seth (or
Luke) would be better able to answer that question than I. Most of
the code that I write doesn't call R routines after the initial setup
so I count on the pointers as being fixed.
Another approach, which is used within the Rcpp package, is to copy
the contents of the pointer into dynamically allocated storage on
entry so you are insulated from garbage collection. Here is how I
would rewrite your code fragment (including replacing the R-style
comments with C comments :-)
double *nrr = REAL(nrrMatrix),
*ss = REAL(S),
alph = *REAL(alpha);
int *pair = INTEGER(pairup_ij_int),
NN = *INTEGER(N),
ntest = *INTEGER(nTests),
verb = *INTEGER(verbose),
ppct=-1;
k=0;
for (i = 0; i < l_int-1; i++) {
int i2 = pair[i] - 1;
for (j = i+1; j < l_int; j++) {
int j2 = pair[j] - 1;
nrr[i2+j2*n_var] = qp_edge_nrr(ss, n_var, NN, i2, j2, q, ntest, alph);
nrr[j2+i2*n_var] = nrr[i2+j2*n_var];
k++;
pct = (int) ((k * 100) / n_adj);
if (pct != ppct) {
if (verb) { /* show progress when */
if (pct % 10 == 0) /* verbose=TRUE */
Rprintf("%d",pct);
else
Rprintf(".",pct);
R_FlushConsole();
}
R_CheckUserInterrupt();
#ifdef Win32
R_ProcessEvents();
#endif
#ifdef HAVE_AQUA
R_ProcessEvents();
#endif
ppct = pct;
}
}
}
>
thanks, robert. On Tue, 2010-01-12 at 09:34 -0600, Douglas Bates wrote:
This is a bit off-topic but one idiom that I see in this code is the use of INTEGER() or REAL() inside potentially long loops. ?This is not a good idea. ?In packages INTEGER(), REAL(), etc. are implemented as functions, not macros. ?It is best to assign a pointer outside the loop and work with that, avoiding the function call overhead. ?I realize this could be a minuscule change but if what you do with that pointer within the loop is trivial it is best not to add the function call overhead. On Tue, Jan 12, 2010 at 2:26 AM, Robert Castelo <robert.castelo at upf.edu> wrote:
hi, let me take the chance that you're talking about this to check
whether anybody finds anything odd in the following strategy. in the
package i develop (qpgraph - devel version) for the computationally
intensive C-loops i calculate the percentage of progress and each time
this percentage changes in one integer unit i call
R_CheckUserInterrupt() and i also take the chance to process
window-system events with R_ProcessEvents() (if i understand correctly
this is what this function does). here is the code snippet:
ppct=-1;
k=0;
for (i = 0; i < l_int-1; i++) {
?int i2 = INTEGER(pairup_ij_int)[i] - 1;
?for (j = i+1; j < l_int; j++) {
? ?int j2 = INTEGER(pairup_ij_int)[j] - 1;
? ?REAL(nrrMatrix)[i2+j2*n_var] = qp_edge_nrr(REAL(S),n_var,
INTEGER(N)[0], i2, j2, q, INTEGER(nTests)[0],REAL(alpha)[0]);
? ?REAL(nrrMatrix)[j2+i2*n_var] = REAL(nrrMatrix)[i2+j2*n_var];
? ?k++;
? ?pct = (int) ((k * 100) / n_adj);
? ?if (pct != ppct) {
? ? ?if (INTEGER(verbose)[0]) { ## show progress when
? ? ? ?if (pct % 10 == 0) ? ? ? ## verbose=TRUE
? ? ? ? ?Rprintf("%d",pct);
? ? ? ?else
? ? ? ? ?Rprintf(".",pct);
? ? ? ?R_FlushConsole();
? ? ?}
? ? ?R_CheckUserInterrupt();
#ifdef Win32
? ? ?R_ProcessEvents();
#endif
#ifdef HAVE_AQUA
? ? ?R_ProcessEvents();
#endif
? ? ?ppct = pct;
? ?}
?}
}
let me clarify that the double nested loop iterates more or less through
every pair of variables and thus this computation grows quadratically in
the number of input variables in the dataset.
robert.
On Mon, 2010-01-11 at 20:07 -0800, Henrik Bengtsson wrote:
To 2nd what Seth's says, I think it is worth showing an explicit
example of how to check at regular intervals, e.g. ?for (i = 0; i < n;
i++) { if (i %% 100 == 0) R_CheckUserInterrupt(); ... }
/Henrik
On Mon, Jan 11, 2010 at 3:50 PM, Seth Falcon <sfalcon at fhcrc.org> wrote:
On 1/11/10 1:31 PM, Patrick Aboyoun wrote:
I just modified the C or Fortran code section of package guidelines to reflect this new entry: http://wiki.fhcrc.org/bioc/Package_Guidelines#c-or-fortran-code
This looks reasonable. It is worth noting, however, that calling R_CheckUserInterrupt is not entirely free. ?If you have a tight loop it might make more sense to check at intervals rather than each time through and the choice of when to check will depend very much on the computation. Here's an example using the inline package (code is at the bottom) with a silly double loop to compare checking vs not: With R_CheckUserInterrupt inner loop:
system.time(z <- funx(10000L, TRUE))
? user ?system elapsed
?1.310 ? 0.001 ? 1.313
No checking:
system.time(z <- funx(10000L, FALSE))
? user ?system elapsed
? 0.08 ? ?0.00 ? ?0.08
Now this may be completely unrealistic, but I suspect that a sensible take
away message is that we should be making sure our C code that loops does
call R_CheckUserInterrupt, but depending on the computation, not at every
loop iteration.
+ seth
## example code is below.
library("inline")
code <- "
int i, j, sum = 0, int n = INTEGER(max)[0];
int check_inter = LOGICAL(do_check)[0];
SEXP ans;
PROTECT(ans = allocVector(INTSXP, n));
for (i = 0; i < n; i++) {
? ?INTEGER(ans)[i] = i;
? ?for (j = 0; j < n; j++) {
? ? ? ?sum = i + j;
? ? ? ?if (check_inter) {
? ? ? ? ? ?R_CheckUserInterrupt();
? ? ? ?}
? ?}
}
UNPROTECT(1);
return ans;
"
funx <- cfunction(signature(max="integer", do_check="logical"), code)
system.time(z <- funx(10000L, TRUE))
system.time(z <- funx(10000L, FALSE))
--
Seth Falcon
Bioconductor Core Team | FHCRC
_______________________________________________ Bioc-devel at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
_______________________________________________ Bioc-devel at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
_______________________________________________ Bioc-devel at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel