How to get rid of loop?

Uwe Ligges <ligges at statistik.tu-dortmund.de> writes:
Ken-JP wrote:
The code below shows what I'm trying to get rid of.
If there is no way to get rid of the loop, I will try to use
package( inline
).
I'm just curious as to whether there is a "vector way" of doing this
algorithm.

I don't see a vector way. If speed is an issue, I'd suggest to port
Parsing the code, it seems like a run of 1's starts with a value
greater than .75, and continues until a value less than .5. So this...

    x1 <- x > .75
    c(x1[[1]], x1[-1] & !x1[-n])

flags the start of each run, and this

    y1 <- lapply(split(x > .5, cumsum(c(x1[[1]], x1[-1] & !x1[-n]))),
                 cumprod)

splits candidate runs x > .5 into subsets beginning at each start
point, and then processes the subset to extend the run as long as the
values are greater than .5. My 'vectorized' version, with an imperfect
solution to the first run and glossing over the point x==.5 in the
original code, is

encode.2 <- function(x) {
    n <- length(x)
    x0 <- x < .25
    y0 <- lapply(split(x <= .5, cumsum(c(x0[[1]], x0[-1] & !x0[-n]))),
                 cumprod)
    x1 <- x > .75
    y1 <- lapply(split(x > .5, cumsum(c(x1[[1]], x1[-1] & !x1[-n]))),
                 cumprod)
    as.vector((cumsum(abs(x-.5) > .25) != 0) *
              (-unlist(y0, use.names=FALSE) + unlist(y1, use.names=FALSE)))
}

this seems to be 7-20x faster than encode.1

encode.1 <- function(x) {
    n <- length( x )
    y <- rep(NA, n)
    yprev <- 0;
    for ( i in (1:n)) {
        if ( x[i]>0.75 ) {
            y[i] <- 1;
        } else if ( x[i]<0.25 ) {
            y[i] <- -1;
        } else
        if ( yprev==1 & x[i]<0.5) {
            y[i] <- 0;
        } else if ( yprev==-1 & x[i]>0.5) {
            y[i] <- 0;
        } else {
            y[i] <- yprev
        }
        yprev
        <-
            y[i];
    }
    y
}
set.seed(1)
mean(replicate(10, system.time(encode.1(runif(100000)), TRUE)[[3]]))
[1] 0.7256
set.seed(1)
mean(replicate(10, system.time(encode.2(runif(100000)), TRUE)[[3]]))
[1] 0.0983
mean(replicate(10, system.time(encode.1(integer(100000)), TRUE)[[3]]))
[1] 0.6288
mean(replicate(10, system.time(encode.2(integer(100000)), TRUE)[[3]]))
[1] 0.0338

and, modulo x==.5, even seems to produce the same result ;)
set.seed(1); res.1 <- replicate(10, encode.1(runif(1000)))
set.seed(1); res.2 <- replicate(10, encode.2(runif(1000)))
identical(res.1, res.2)
[1] TRUE

Algorithm f7 of

   https://stat.ethz.ch/pipermail/r-devel/2008-April/049111.html

might point to fast (comparable to C) R implementations.

Martin
this small part to C, it's very simple and may yield a considerable
performance boost (untested). It can probably be used as a textbook
example for code where porting to C make sense.

Best,
Uwe Ligges

#
-----------------------------------------------------------------------------------------
set.seed(1)
x <- runif(100)
n <- length( x )
y <- rep(NA, n)
yprev <- 0;
for ( i in (1:n)) {
	if ( x[i]>0.75 ) {
		y[i] <- 1;
	} else if ( x[i]<0.25 ) {
		y[i] <- -1;
	} else if ( yprev==1 & x[i]<0.5) {
		y[i] <- 0;
	} else if ( yprev==-1 & x[i]>0.5) {
		y[i] <- 0;
	} else {
		y[i] <- yprev
	}
	yprev <- y[i];
}

y
  [1]  0  0  0  1 -1  1  1  1  1 -1 -1 -1  0  0  1  0  0  1  0  1  1 -1  0
-1 -1
 [26] -1 -1 -1  1  0  0  0  0 -1  1  1  1 -1  0  0  1  1  1  1  1  1
-1 -1 
0  0
 [51]  0  1  0 -1 -1 -1 -1  0  0  0  1  0  0  0  0  0  0  1 -1  1  0
1  0 0  0
 [76]  1  1  0  1  1  0  0  0  0  1 -1  0 -1 -1 -1 -1 -1  0  1  1  1
0  0 1  1

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

How to get rid of loop?

Thread (8 messages)