Help optimizing EMD::extrema()

Sun, Feb 13, 2011 2:57 PM

I did not try to emulate the ndata nad ndatam1 arguments
to extrema(), as I didn't see what they were for.  The
help file simply says they are the length of the first
argument and that minus 1 and that is what their default
values are.  If they do not have their default values
then extrema() frequently dies.  You could add them them
to the argument list and not use them, or check that
they are what they default to, as in
   function(x, ndata, ndatam1) {
      stopifnot(length(x)==ndata, ndata-1==ndatam1)
      ... rest of code ...
If the check fails then someone needs to say or figure out
what they are for.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-bounces at r-project.org 
[mailto:r-help-bounces at r-project.org] On Behalf Of William Dunlap
Sent: Sunday, February 13, 2011 2:08 PM
To: Mike Lawrence; r-help at lists.R-project.org
Subject: Re: [R] Help optimizing EMD::extrema()

-----Original Message-----
From: r-help-bounces at r-project.org 
[mailto:r-help-bounces at r-project.org] On Behalf Of Mike Lawrence
Sent: Friday, February 11, 2011 9:28 AM
To: r-help at lists.R-project.org
Subject: [R] Help optimizing EMD::extrema()

Hi folks,

I'm attempting to use the EMD package to analyze some neuroimaging
data (timeseries with 64 channels sampled across 1 million

time points

within each of 20 people). I found that processing a single

channel of

data using EMD::emd() took about 8 hours. Exploration using Rprof()
suggested that most of the compute time was spent in EMD::extrema().
Looking at the code for EMD:extrema(), I managed to find one obvious
speedup (switching from employing rbind() to c()) and I suspect that
there may be a way to further speed things up by pre-allocating all
the objects that are currently being created with c(), but

I'm having

trouble understanding the code sufficiently to know

when/where to try

this and what sizes to set as the default pre-allocation

length. Below

I include code that demonstrates the speedup I achieved by

eliminating

calls to rbind(), and also demonstrates that only a few calls to c()
seem to be responsible for most of the compute time. The files
"extrema_c.R" and "extrema_c2.R" are available at:
https://gist.github.com/822691

Try the following new.extrema().  It is based on
looking at runs in the data in a vectorized way.
On my old laptop the running times for length(x)=2^(2:118)
with EMD::extrema and new.extrema are
       old.time new.time
4          0.00     0.00
8          0.00     0.00
16         0.00     0.00
32         0.00     0.00
64         0.00     0.00
128        0.00     0.00
256        0.00     0.00
512        0.02     0.00
1024       0.03     0.00
2048       0.06     0.01
4096       0.14     0.00
8192       0.37     0.02
16384      1.08     0.03
32768      3.64     0.06
65536     13.35     0.12
131072    48.42     0.25
262144   206.17     0.59

isEndOfStrictlyIncreasingRun <- function(x) {
    c(FALSE, diff(diff(x) > 0) < 0, FALSE)
}

isStartOfStrictlyIncreasingRun <- function(x) {
    c(FALSE, diff(diff(x) <= 0) < 0, FALSE)
}

isEndOfStrictlyDecreasingRun <- function(x) {
    isEndOfStrictlyIncreasingRun(-x)
}

isStartOfStrictlyDecreasingRun <- function(x) {
    isStartOfStrictlyIncreasingRun(-x)
}

isStartOfZeroRun <- function(x) {
    x==0 & c(TRUE, x[-length(x)]!=0)
}

nToEndOfCurrentFlatRun <- function(x) {
    # for each element of x, how far to end of current
    # run of equal values.
    rev( sequence(rle(rev(x))$lengths) - 1L)
}

indexOfEndOfCurrentFlatRun <- function(x) {
    # should be a more direct way of doing this, but this is pretty
quick
    seq_len(length(x)) + nToEndOfCurrentFlatRun(x)
}

new.extrema <- function(x) {
    f <- indexOfEndOfCurrentFlatRun(x)
    isMaxStart <- isEndOfStrictlyIncreasingRun(x) &
isStartOfStrictlyDecreasingRun(x)[f]
    maxindex <- cbind(which(isMaxStart), f[isMaxStart], 
deparse.level=0)

    isMinStart <- isEndOfStrictlyDecreasingRun(x) &
isStartOfStrictlyIncreasingRun(x)[f]
    minindex <- cbind(which(isMinStart), f[isMinStart], 
deparse.level=0)


    # zero-crossings are bit weird: Report either the
before-zero/after-zero
    # pair or the start and stop of a a run of zero's (even if the run
is
    # not part of an actual zero-crossing).  Do them separately then
sort.
    # Also, if the entire sequence never actually crosses 0, do
    # not report the zero-touchings.
    # Also, if length(x)==2, the original doesn't report a 
zero-crossing
or
    # a zero run.  We do not copy that.
    if (max(x) > 0 && min(x) < 0) {
        indexOfZeroCrossingStart <- which(c(abs(diff(sign(x)))==2,
FALSE))
        indexOfZeroCrossingEnd <- indexOfZeroCrossingStart + 1L
        indexOfZeroRunStart <- which(isStartOfZeroRun(x))
        indexOfZeroRunEnd <- f[indexOfZeroRunStart]
        crossStart <- c(indexOfZeroCrossingStart, indexOfZeroRunStart)
        cross <- cbind(crossStart, c(indexOfZeroCrossingEnd,
indexOfZeroRunEnd), deparse.level=0)[order(crossStart),, drop=FALSE]
    } else {
        cross <- cbind(integer(), integer())
    }
    # extrema likes to return NULL instead of a zero-row matrix,
    # so we follow it.  Zero-row matrices are easier to deal with.
    list(
         minindex=if (nrow(minindex)) minindex else NULL,
         maxindex=if (nrow(maxindex)) maxindex else NULL,
         nextreme=nrow(minindex) + nrow(maxindex),
         cross=if(nrow(cross)) cross else NULL,
         ncross=nrow(cross) # extrema() returns numeric, not integer,
here
    )
}

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

Any suggestions/help would be greatly appreciated.


#load the EMD library for the default version of extrema
library(EMD)

#some data to process
values = rnorm(1e4)

#profile the default version of extrema
Rprof(tmp <- tempfile())
temp = extrema(values)
Rprof()
summaryRprof(tmp)
#1.2s total with most time spend doing rbind
unlink(tmp)

#load a rbind-free version of extrema
source('extrema_c.R')
Rprof(tmp <- tempfile())
temp = extrema_c(values)
Rprof()
summaryRprof(tmp) #much faster! .5s total
unlink(tmp)

#still, it encounters slowdowns with lots of data
values = rnorm(1e5)
Rprof(tmp <- tempfile())
temp = extrema_c(values)
Rprof()
summaryRprof(tmp)
#44s total, hard to see what's taking up so much time
unlink(tmp)

#load an rbind-free version of extrema that labels each call to c()
source('extrema_c2.R')
Rprof(tmp <- tempfile())
temp = extrema_c2(values)
Rprof()
summaryRprof(tmp)
#same time as above, but now we see that it spends more time in
certain calls to c() than others
unlink(tmp)

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Help optimizing EMD::extrema()

Thread (5 messages)