Skip to content

[Bioc-devel] R_CheckUserInterrupt in IRanges

11 messages · Patrick Aboyoun, Seth Falcon, Wolfgang Huber +3 more

#
Hi all,

I noted that some functions in IRanges, e.g. 'runq', can take a long 
time to terminate; that interrupting the evaluation e.g. by Ctrl-C does 
not seem to be possible; and that in the C code of that package, the 
function R_CheckUserInterrupt is never used.

Are these conscious design decisions? Are there (should there be) 
general recommendations from the project regarding use of that function 
(or possibly equivalent functionality)?

Various other packages (Biostrings, Biobase, ShortRead, vsn, puma, 
limma) do use it in various places.

Best wishes
      Wolfgang


--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber/contact
#
Wolfgang,
We have been silent on when to add R_CheckUserInterrupt calls at the C 
level for long running operations and only around 20 BioC packages use 
it. I added calls to R_CheckUserInterrupt in coverage, run*, and view* 
methods in IRanges, but more can certainly be added to the interval 
operations.

For the developer wiki how does the following general rule sound:

C level for loops should include a call to R_CheckUserInterrupt when the 
total loop time exceeds 10 seconds for a "reasonably sized" job on 
"reasonable" hardware AND the method is intended for interactive use.


Thoughts?


Patrick
Wolfgang Huber wrote:
#
Hi Patrick

I suggest:
C level loops should include a call to R_CheckUserInterrupt when there 
is any chance that they may not terminate for certain parameter 
constellations, or when their run time exceeds 10 seconds with typical 
parameters and the method is intended for interactive use.

/*
The notion "reasonable hardware" is difficult, since it may just be on 
unreasonable hardware (like a netbook) where people experience the 
greatest need for interrupting a carelessly called function. Also, even
reasonable hardware can become unreasonable when there is competition 
for memory and it starts swapping. Unless there is a substantial cost 
for calling R_CheckUserInterrupt, I don't see why not to be liberal with it.
*/


	Best wishes
	Wolfgang

  
    
#
I just modified the C or Fortran code section of package guidelines to 
reflect this new entry:

http://wiki.fhcrc.org/bioc/Package_Guidelines#c-or-fortran-code


Patrick
Wolfgang Huber wrote:
#
On 1/11/10 1:31 PM, Patrick Aboyoun wrote:
This looks reasonable.

It is worth noting, however, that calling R_CheckUserInterrupt is not 
entirely free.  If you have a tight loop it might make more sense to 
check at intervals rather than each time through and the choice of when 
to check will depend very much on the computation.

Here's an example using the inline package (code is at the bottom) with 
a silly double loop to compare checking vs not:

With R_CheckUserInterrupt inner loop:

 > system.time(z <- funx(10000L, TRUE))
    user  system elapsed
   1.310   0.001   1.313

No checking:

system.time(z <- funx(10000L, FALSE))
    user  system elapsed
    0.08    0.00    0.08

Now this may be completely unrealistic, but I suspect that a sensible 
take away message is that we should be making sure our C code that loops 
does call R_CheckUserInterrupt, but depending on the computation, not at 
every loop iteration.

+ seth


## example code is below.

library("inline")

code <- "
int i, j, sum = 0, int n = INTEGER(max)[0];
int check_inter = LOGICAL(do_check)[0];
SEXP ans;
PROTECT(ans = allocVector(INTSXP, n));
for (i = 0; i < n; i++) {
     INTEGER(ans)[i] = i;
     for (j = 0; j < n; j++) {
         sum = i + j;
         if (check_inter) {
             R_CheckUserInterrupt();
         }
     }
}
UNPROTECT(1);
return ans;
"

funx <- cfunction(signature(max="integer", do_check="logical"), code)

system.time(z <- funx(10000L, TRUE))
system.time(z <- funx(10000L, FALSE))
#
Hi Seth

Yes. I would hope that someone who goes to the trouble of writing C code 
understands that.

	Best wishes
	Wolfgang
Seth Falcon wrote:

  
    
#
To 2nd what Seth's says, I think it is worth showing an explicit
example of how to check at regular intervals, e.g.  for (i = 0; i < n;
i++) { if (i %% 100 == 0) R_CheckUserInterrupt(); ... }

/Henrik
On Mon, Jan 11, 2010 at 3:50 PM, Seth Falcon <sfalcon at fhcrc.org> wrote:
#
hi, let me take the chance that you're talking about this to check
whether anybody finds anything odd in the following strategy. in the
package i develop (qpgraph - devel version) for the computationally
intensive C-loops i calculate the percentage of progress and each time
this percentage changes in one integer unit i call
R_CheckUserInterrupt() and i also take the chance to process
window-system events with R_ProcessEvents() (if i understand correctly
this is what this function does). here is the code snippet:

ppct=-1;
k=0;

for (i = 0; i < l_int-1; i++) {
  int i2 = INTEGER(pairup_ij_int)[i] - 1;

  for (j = i+1; j < l_int; j++) {
    int j2 = INTEGER(pairup_ij_int)[j] - 1;

    REAL(nrrMatrix)[i2+j2*n_var] = qp_edge_nrr(REAL(S),n_var,
INTEGER(N)[0], i2, j2, q, INTEGER(nTests)[0],REAL(alpha)[0]);

    REAL(nrrMatrix)[j2+i2*n_var] = REAL(nrrMatrix)[i2+j2*n_var];
    k++;
    pct = (int) ((k * 100) / n_adj);
    if (pct != ppct) {
      if (INTEGER(verbose)[0]) { ## show progress when
        if (pct % 10 == 0)       ## verbose=TRUE
          Rprintf("%d",pct);
        else
          Rprintf(".",pct);
        R_FlushConsole();
      }
      R_CheckUserInterrupt();
#ifdef Win32
      R_ProcessEvents();
#endif
#ifdef HAVE_AQUA
      R_ProcessEvents();
#endif
      ppct = pct;
    }
  }
}

let me clarify that the double nested loop iterates more or less through
every pair of variables and thus this computation grows quadratically in
the number of input variables in the dataset.

robert.
On Mon, 2010-01-11 at 20:07 -0800, Henrik Bengtsson wrote:
#
This is a bit off-topic but one idiom that I see in this code is the
use of INTEGER() or REAL() inside potentially long loops.  This is not
a good idea.  In packages INTEGER(), REAL(), etc. are implemented as
functions, not macros.  It is best to assign a pointer outside the
loop and work with that, avoiding the function call overhead.  I
realize this could be a minuscule change but if what you do with that
pointer within the loop is trivial it is best not to add the function
call overhead.
On Tue, Jan 12, 2010 at 2:26 AM, Robert Castelo <robert.castelo at upf.edu> wrote:
1 day later
#
thanks for the hint, i thought they were macros indeed. given that the
two nested loops may take a long time and there are these R_*() calls in
between, do you think i should PROTECT or PROTECT_WITH_INDEX before
assigning pointers outside these loops?

thanks,
robert.
On Tue, 2010-01-12 at 09:34 -0600, Douglas Bates wrote:
#
On Wed, Jan 13, 2010 at 12:50 PM, Robert Castelo <robert.castelo at upf.edu> wrote:
If the SEXPs from which you are extracting pointers are arguments in a
.Call from R then they are already protected.  Also, when you are
using a scalar value passed from R then you can assign the value to an
int or a doubler in your C code and you don't need to worry about
garbage collection.  It may be that a pointer from an SEXP can be
reassigned as a result of garbage collection but Robert or Seth (or
Luke) would be better able to answer that question than I.  Most of
the code that I write doesn't call R routines after the initial setup
so I count on the pointers as being fixed.

Another approach, which is used within the Rcpp package, is to copy
the contents of the pointer into dynamically allocated storage on
entry so you are insulated from garbage collection.  Here is how I
would rewrite your code fragment (including replacing the R-style
comments with C comments :-)

double *nrr = REAL(nrrMatrix),
    *ss = REAL(S),
    alph = *REAL(alpha);

int *pair = INTEGER(pairup_ij_int),
    NN = *INTEGER(N),
    ntest = *INTEGER(nTests),
    verb = *INTEGER(verbose),
			
ppct=-1;
k=0;
			
for (i = 0; i < l_int-1; i++) {
 int i2 = pair[i] - 1;

 for (j = i+1; j < l_int; j++) {
   int j2 = pair[j] - 1;

   nrr[i2+j2*n_var] = qp_edge_nrr(ss, n_var, NN, i2, j2, q, ntest, alph);

   nrr[j2+i2*n_var] = nrr[i2+j2*n_var];
   k++;
   pct = (int) ((k * 100) / n_adj);
   if (pct != ppct) {
       if (verb) {		/* show progress when */
       if (pct % 10 == 0)       /* verbose=TRUE */
         Rprintf("%d",pct);
       else
         Rprintf(".",pct);
       R_FlushConsole();
     }
     R_CheckUserInterrupt();
#ifdef Win32
     R_ProcessEvents();
#endif
#ifdef HAVE_AQUA
     R_ProcessEvents();
#endif
     ppct = pct;
   }
 }
}

 >