How to get rid of loop? - R-help

Ken-JP

Fri, Apr 24, 2009 5:16 PM #

set.seed(1)
x <- runif(100)

# I want to calculate y such that:
#
# 1. if x>0.75, y <- 1
# 2. else if x<0.25, y <- -1
# 3. else if y_prev==1 && x<0.5, y <- 0
# 4. else if y_prev==-1 && x>0.5, y <- 0
# 5. else y <- y_prev
#
# 1. and 2. are directly doable without looping.
#
# How do I do 3.-5. without looping?  The problem is, I need to run this
algorithm over gigs of data, so I
# need to avoid looping, if at all possible...
#
# - Ken

View this message in context: http://www.nabble.com/How-to-get-rid-of-loop--tp23226779p23226779.html
Sent from the R help mailing list archive at Nabble.com.

Uwe Ligges

Sun, Apr 26, 2009 7:45 AM #

Ken-JP wrote:

If y_prev is meant to be from a former iteration of a loop, you probably 
can't get rid of it. Original working code might have helped to 
udnertsand your problem better.
Anyway, perhaps you can imnprove your loop in other ways, but again, 
we'd need to see at least some code ....

Uwe Ligges

Peter Alspach

Sun, Apr 26, 2009 1:50 PM #

Tena koe Ken

Would something along the following lines do what you require:

set.seed(1)
x <- runif(100)
y <- rep(NA, length(x))
y[x<0.25] <- -1
y[x>0.75] <- 1
y[-1][y[-length(y)]%in%1 & (x[-1]>=0.25 & x[-1]<0.5)] <- 0
# etc

HTH ....

Peter Alspach

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

The contents of this e-mail are confidential and may be subject to legal privilege.
 If you are not the intended recipient you must not use, disseminate, distribute or
 reproduce all or any part of this e-mail or attachments.  If you have received this
 e-mail in error, please notify the sender and delete all material pertaining to this
 e-mail.  Any opinion or views expressed in this e-mail are those of the individual
 sender and may not represent those of The New Zealand Institute for Plant and
 Food Research Limited.

Ken-JP

Sun, Apr 26, 2009 5:10 PM #

The code below shows what I'm trying to get rid of.

If there is no way to get rid of the loop, I will try to use package( inline
).
I'm just curious as to whether there is a "vector way" of doing this
algorithm.

#
-----------------------------------------------------------------------------------------

set.seed(1)
x <- runif(100)
n <- length( x )
y <- rep(NA, n)
yprev <- 0;
for ( i in (1:n)) {
	if ( x[i]>0.75 ) {
		y[i] <- 1;
	} else if ( x[i]<0.25 ) {
		y[i] <- -1;
	} else if ( yprev==1 & x[i]<0.5) {
		y[i] <- 0;
	} else if ( yprev==-1 & x[i]>0.5) {
		y[i] <- 0;
	} else {
		y[i] <- yprev
	}
	yprev <- y[i];
}

[1]  0  0  0  1 -1  1  1  1  1 -1 -1 -1  0  0  1  0  0  1  0  1  1 -1  0
-1 -1
 [26] -1 -1 -1  1  0  0  0  0 -1  1  1  1 -1  0  0  1  1  1  1  1  1 -1 -1 
0  0
 [51]  0  1  0 -1 -1 -1 -1  0  0  0  1  0  0  0  0  0  0  1 -1  1  0  1  0 
0  0
 [76]  1  1  0  1  1  0  0  0  0  1 -1  0 -1 -1 -1 -1 -1  0  1  1  1  0  0 
1  1

View this message in context: http://www.nabble.com/How-to-get-rid-of-loop--tp23226779p23248206.html
Sent from the R help mailing list archive at Nabble.com.

Uwe Ligges

Mon, Apr 27, 2009 1:53 AM #

Ken-JP wrote:

I don't see a vector way. If speed is an issue, I'd suggest to port this 
small part to C, it's very simple and may yield a considerable 
performance boost (untested). It can probably be used as a textbook 
example for code where porting to C make sense.

Best,
Uwe Ligges

Ken-JP

Mon, Apr 27, 2009 3:45 AM #

Thanks, Uwe, Peter, and Ray, for taking the time to look into this.

Just to wrap up this thread, and so that others may benefit,
I tried writing both a R code version and an inline C version.

Tested on a 8GB Ubuntu 64amd box, R 2.81, the speed difference was:
104secs vs 0.534secs, or the C version was about 200 times faster.

#
-------------------------------------------------------------------------------------

require( inline )
require( RUnit )

set.seed(1)
x <- runif(1e+7)

.c.code <-
                " SEXP res, inp;
                int cnt = 0, i;
                PROTECT(inp = AS_NUMERIC(p));     cnt++;
                PROTECT(res = Rf_duplicate(inp)); cnt++;
                int nx = INTEGER(GET_DIM(inp))[0], ny =
INTEGER(GET_DIM(inp))[1];
                double* pdata = REAL(AS_NUMERIC(inp));
                double* rdata = REAL(res); double last;
                for (int y = 0; y < ny; y++) {
                        last = 0.0;
                        for (int x = 0; x < nx; x++) {
                                i = x + y*nx;
                                if ( pdata[i]>0.75 ) {
                                        rdata[i] = 1.0;
                                } else if ( pdata[i]<0.25 ) {
                                        rdata[i] = -1.0;
                                } else if ( last==1 && pdata[i]<0.5 ) {
                                        rdata[i] = 0.0;
                                } else if ( last==-1 && pdata[i]>0.5 ) {
                                        rdata[i] = 0.0;
                                } else {
                                        rdata[i] = last;
                                }
                                last = rdata[i];
                        }
                }

                UNPROTECT(cnt);
                return res;";
.c.code.raw <- cfunction(signature(p="matrix"), .c.code);
# NOTE: I converted to matrix because I actually want to do many columns,
one at a time
ccode <- function( x ) { as.vector( .c.code.raw( p=as.matrix( x ) )); }

rcode <- function( x ) {
        n <- length( x )
        y <- rep(NA, n)
        yprev <- 0;
        for ( i in (1:n)) {
                if ( x[i]>0.75 ) {
                        y[i] <- 1;
                } else if ( x[i]<0.25 ) {
                        y[i] <- -1;
                } else if ( yprev==1 & x[i]<0.5) {
                        y[i] <- 0;
                } else if ( yprev==-1 & x[i]>0.5) {
                        y[i] <- 0;
                } else {
                        y[i] <- yprev
                }
                yprev <- y[i];
        }
        y;
}

system.time( r.result <- rcode( x ));
system.time( c.result <- ccode( x ));
checkEquals( r.result, c.result );

View this message in context: http://www.nabble.com/How-to-get-rid-of-loop--tp23226779p23253747.html
Sent from the R help mailing list archive at Nabble.com.

Martin Morgan

Mon, Apr 27, 2009 11:53 AM #

Uwe Ligges <ligges at statistik.tu-dortmund.de> writes:

Parsing the code, it seems like a run of 1's starts with a value
greater than .75, and continues until a value less than .5. So this...

    x1 <- x > .75
    c(x1[[1]], x1[-1] & !x1[-n])

flags the start of each run, and this
   
    y1 <- lapply(split(x > .5, cumsum(c(x1[[1]], x1[-1] & !x1[-n]))),
                 cumprod)
                         
splits candidate runs x > .5 into subsets beginning at each start
point, and then processes the subset to extend the run as long as the
values are greater than .5. My 'vectorized' version, with an imperfect
solution to the first run and glossing over the point x==.5 in the
original code, is

encode.2 <- function(x) {
    n <- length(x)
    x0 <- x < .25
    y0 <- lapply(split(x <= .5, cumsum(c(x0[[1]], x0[-1] & !x0[-n]))),
                 cumprod)
    x1 <- x > .75
    y1 <- lapply(split(x > .5, cumsum(c(x1[[1]], x1[-1] & !x1[-n]))),
                 cumprod)
    as.vector((cumsum(abs(x-.5) > .25) != 0) *
              (-unlist(y0, use.names=FALSE) + unlist(y1, use.names=FALSE)))
}

this seems to be 7-20x faster than encode.1

encode.1 <- function(x) {
    n <- length( x )
    y <- rep(NA, n)
    yprev <- 0;
    for ( i in (1:n)) {
        if ( x[i]>0.75 ) {
            y[i] <- 1;
        } else if ( x[i]<0.25 ) {
            y[i] <- -1;
        } else
        if ( yprev==1 & x[i]<0.5) {
            y[i] <- 0;
        } else if ( yprev==-1 & x[i]>0.5) {
            y[i] <- 0;
        } else {
            y[i] <- yprev
        }
        yprev
        <-
            y[i];
    }
    y
}

[1] 0.7256

[1] 0.0983

[1] 0.6288

[1] 0.0338
 
and, modulo x==.5, even seems to produce the same result ;)

[1] TRUE

Algorithm f7 of

   https://stat.ethz.ch/pipermail/r-devel/2008-April/049111.html

might point to fast (comparable to C) R implementations.

Martin

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

Bart Joosen

Tue, Apr 28, 2009 2:25 AM #

Hi,

how about this:
x <- runif(100) 
n <- length( x ) 

y2 <- rep(0,n)
y2[x > 0.75] <- 1
y2[x < 0.25] <- -1
cx <- cumsum(abs(y2) )
m <- match(cx, cx)
y2[y2==0] <- 2
y2[x<0.5 & y2[m]==1] <- 0
y2[x>0.5 & y2[m]==-1] <- 0
y3 <- y2
y3[y3==0] <- 1
y3[y3==2] <- 0
cx <- cumsum(abs(y3))
m <- match(cx, cx)
y2 <- y2[m]


Best regards

Bart

View this message in context: http://www.nabble.com/How-to-get-rid-of-loop--tp23226779p23273405.html
Sent from the R help mailing list archive at Nabble.com.