Speeding up a loop

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120720/a66ada63/attachment.pl>
That is faster than what I was doing, and cutting 15% of my iterations is
still very helpful.

Next question.

I need to multiply each row x[i,] of the matrix x by another matrix A.
Specifically

for(i in 1:n)
{
  # Remove row i from x if, when multiplied by A, it does not meet the
  # criteria (>=.5, >=42, <=150).
  if (x[i,]%*%A[,1]<.5 || x[i,]%*%A[,2]<42 || x[i,]%*%A[,3]>150)
  {
    x<-x[-i,]
    n<-n-1
  }
}
Is there a better way than using a for loop for this or x<-x[-i,] for that
matter? I assume building a new matrix would be worse. 
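
One vectorized alternative, as a minimal sketch only: compute all the products with a single matrix multiplication and keep the passing rows with a logical index, so no row is deleted inside a loop. The data below (x, A and their sizes) are made up for illustration; the thresholds are the ones from the loop above. Deleting rows inside a loop over 1:n also shifts the remaining rows up, so the row following a deleted one is never tested.

```r
set.seed(42)
n <- 8
x <- matrix(runif(n * 3, 0, 200), nrow = n)  # hypothetical data
A <- diag(3)                                 # hypothetical 3x3 matrix

# One matrix product gives every row's three scores at once.
p <- x %*% A

# TRUE for the rows that meet all three criteria.
keep <- p[, 1] >= 0.5 & p[, 2] >= 42 & p[, 3] <= 150

# drop = FALSE keeps the result a matrix even if only one row survives.
x_kept <- x[keep, , drop = FALSE]
```

Building the logical vector first and subsetting once avoids copying the matrix on every deletion.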

Ideally I also want to exclude some other rows x[i,], for example if x[1,] is
better than x[2,] in all three categories, i.e. bigger, bigger, and smaller
than x[2,] when multiplied by A, then I want to exclude x[2,] as well. Any
suggestions on whether it is better to do this all at once or in stages?

Thanks for helping!

--
View this message in context: http://r.789695.n4.nabble.com/Speeding-up-a-loop-tp4637201p4637255.html
Sent from the R help mailing list archive at Nabble.com.
Next for loop question.

I need a loop that removes a row from a matrix if it is worse in positions
1, 2, 3, and 4 than another row in the matrix. Right now my matrix is
503028x26.

Rule defining "worse": position 1 is smaller, position 2 is smaller,
position 3 is higher, and position 4 is smaller.

Example: 

row1: 1, 10, 3, 3
row2: 3, 7, 5, 2

row2 is not worse than row1 since it is "better" in position 1, even though
it is worse in all other positions.

row3: 2, 5, 7, 1
row3 however is worse than row2 and should be removed from the matrix.
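
The rule can be written down as a small pairwise test. This is only a sketch of the rule as stated above; the name is_worse is made up for illustration.

```r
# TRUE if row r is worse than row s in all four positions:
# smaller in 1, smaller in 2, higher in 3, smaller in 4.
is_worse <- function(r, s) {
  r[1] < s[1] && r[2] < s[2] && r[3] > s[3] && r[4] < s[4]
}

row1 <- c(1, 10, 3, 3)
row2 <- c(3, 7, 5, 2)
row3 <- c(2, 5, 7, 1)

is_worse(row2, row1)  # FALSE: row2 is better in position 1
is_worse(row3, row2)  # TRUE: row3 should be removed
```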

Any ideas? Should I break this into pieces or do it all at once? Is there
something faster than a loop? My current loop takes well over 24 hours to
run.

m<-matrix(0,1,24)
for(i in 1:n)
{
  a<-matrix(x[i,1:4],1,4)
  j<-1
  nn<-nrow(m)
  counter<-0
  while(j<=nn)
  {
    if(a[1]>m[j,1] && a[2]>m[j,2] && a[3]>m[j,4] && a[4]<m[j,4])
    {
      m<-m[-j,]
      nn<-length(m[,1])
      counter<-1
    } else j<-j+1
  }
  if(counter==1)
  {
    b<-cbind(a,x)
    m<-rbind(m,b)
  }
  if(counter==0)
  {
    if(a[1]>min(m[,1]) || a[3]>min(m[,3]) || a[4]>min(m[,4]) ||
       a[5]<max(m[,5]))
    {
      b<-cbind(a,x)
      m<-rbind(m,b)
    }
  }
}

Hello,

Maybe it would have been better to start a new thread, if the question 
is different. To show that it's a continuation, the subject line could 
be extended with "part 2" or something like that.

This solution runs in an estimated 3.6 hours, extrapolated from the timings
below.

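to.keep <- function(x){
     # Copy row i of x into the next free row of env$result,
     # growing the result matrix by 'increment' rows when it is full.
     keep <- function(i, env){
         env$ires <- env$ires + 1
         if(env$ires > env$curr.rows){
             env$result <- rbind(env$result, matrix(nrow=increment, ncol=nc))
             env$curr.rows <- env$curr.rows + increment
         }
         env$result[env$ires, ] <- x[i, ]
     }

     # Extract the four comparison columns once, at the beginning.
     a1 <- x[, 1]
     a2 <- x[, 2]
     a3 <- x[, 3]
     a4 <- x[, 4]
     nc <- ncol(x)
     increment <- 1000

     # Environment holding the values that keep() updates.
     e <- new.env()
     e$curr.rows <- increment
     e$result <- matrix(nrow=e$curr.rows, ncol=nc)
     e$ires <- 0

     # Keep row i unless it is worse than some other row in all four positions.
     for(i in seq_len(nrow(x))){
         yes <- x[i, 1] >= a1 | x[i, 2] >= a2 | x[i, 3] <= a3 | x[i, 4] >= a4
         if(all(yes)) keep(i, e)
     }
     # Return only the rows that were actually filled.
     e$result[seq_len(e$ires), 1:nc]
}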

# Now the timing.

set.seed(3971)
nc <- 26
Enes <- seq(from=1e3, to=1e4, by=1e3)
tm <- numeric(length(Enes))
i <- 0
for(n in Enes){
     i <- i + 1
     N <- nc*n
     m <- matrix(sample(0:9, N, TRUE), ncol=nc)
     tm[i] <- system.time(kp <- to.keep(m))[3]
}

plot(Enes, tm) # quadratic behavior
fit <- lm(tm ~ Enes + I(Enes^2))
(secs <- predict(fit, newdata=data.frame(Enes=503028)))
secs/60/60 # 3.6 hours

Hope this helps,

Rui Barradas

On 21-07-2012 13:26, wwreith wrote:

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Any chance I could ask for an idiot's guide to the function to.keep(x)? I
understand how to use it but not what some of the lines are doing. Comments
would be extremely helpful.

Ok, sorry, I should have included some comments.

The function is divided into three parts: 1. intro, 2. decision, 3. keep rows.
Part 3 is the function keep(), internal to to.keep(). Let's start with 1.

1. Set up some variables first.
1.a) The variables 'a'.
If the input object 'x' is a matrix this doesn't give a great speed-up, 
but if 'x' is a data.frame, extraction is time consuming.
So, do this once only, at the beginning.
1.b) The new environment.
This is because my first version would need to change values declared 
outside the internal function.
This can be done with the global assignment operator, <<-, but this 
practice should be avoided; it's easy to mess things up.
Note that all the variables changed inside the internal function are in 
this new environment, 'e'.
In particular note that 'result' is initialized with 1000 rows.
2. The loop.
This is where we decide if we want to keep that row. I have negated the 
condition from an original 'no'.
The 'no' condition:
     a1[i] < a1 & a2[i] < a2 & a3[i] > a3 & a4[i] < a4
Then the test would be:
     if(any(no)) dont_keep else keep.  # pseudo-code
Not in pseudo-code:
     if( all( !no ) ) keep(i, e)
The downside of this is that the original is more readable.
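
The equivalence between the two forms can be checked directly. The following is a small sketch on made-up random data (the matrix x and its size are assumptions for illustration); it compares the 'no' condition with its negation for every row.

```r
set.seed(1)
x <- matrix(sample(0:9, 200, TRUE), ncol = 4)  # hypothetical test data
a1 <- x[, 1]; a2 <- x[, 2]; a3 <- x[, 3]; a4 <- x[, 4]

same <- logical(nrow(x))
for (i in seq_len(nrow(x))) {
  # Row i is worse than the compared row in all four positions.
  no  <- a1[i] <  a1 & a2[i] <  a2 & a3[i] >  a3 & a4[i] <  a4
  # Negation of 'no': each comparison is flipped and & becomes |.
  yes <- a1[i] >= a1 | a2[i] >= a2 | a3[i] <= a3 | a4[i] >= a4
  same[i] <- identical(yes, !no) && (all(yes) == !any(no))
}
all(same)  # the two tests agree for every row
```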

3. The internal function, keep().
Considering the small number of rows I have used for tests, e$result was 
initialized with 1e3 rows.
With 5e5 lines I would increase this number to 1e5.
First, the function updates the [row number] pointer into 'result' and 
checks if we are at a 'result' limit.
If yes, make it bigger by 'increment' [ == 1e3 ] rows.
Then just assign row i from matrix/df 'x' to the appropriate row of 
e$result.
The reason why we need the environment is that on function return, 
everything but the returned value is lost.
We could save the values of ires, curr.rows, and result in a list and 
return the list.
But this would complicate and slow things down. Assign, update and 
reassign. Messy.
Environments can help keep it "simple", in the sense of keeping together 
what is meant to be used together.
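
To see the environment idea in isolation, here is a stripped-down sketch of a buffer that grows in chunks. The names make_buffer and append_row and the tiny increment of 3 are made up for illustration; only the pattern (ires, curr.rows, result stored in an environment) follows the description above.

```r
# Create an environment holding a pre-allocated matrix and its counters.
make_buffer <- function(nc, increment = 3) {
  e <- new.env()
  e$increment <- increment
  e$curr.rows <- increment
  e$result <- matrix(NA_real_, nrow = increment, ncol = nc)
  e$ires <- 0
  e
}

# Store one row; changes to env persist after the function returns
# because environments are not copied when passed to a function.
append_row <- function(env, row) {
  env$ires <- env$ires + 1
  if (env$ires > env$curr.rows) {
    extra <- matrix(NA_real_, nrow = env$increment, ncol = ncol(env$result))
    env$result <- rbind(env$result, extra)
    env$curr.rows <- env$curr.rows + env$increment
  }
  env$result[env$ires, ] <- row
  invisible(NULL)
}

buf <- make_buffer(nc = 2)
for (k in 1:7) append_row(buf, c(k, k^2))

# Drop the unused rows at the end.
out <- buf$result[seq_len(buf$ires), , drop = FALSE]
```

Growing by a chunk at a time means rbind() is called once per 'increment' rows instead of once per kept row.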

And now I hope there is not an overdose of comments :)

Rui Barradas

On 21-07-2012 18:37, wwreith wrote:
> Any chance I could ask for an idiots guide for function to.keep(x). I
> understand how to use it but not what some of the lines are doing. Comments
> would be extremely helpful.

1.15   60   0.553555415   0.574892872
1.15   60   0.563183983   0.564029359

Shouldn't the function rule out the second one, since it is higher in
position 3 and lower in position 4, i.e. it should not all be yes?

Hello,

I think this is a boundary issue. In your original post you said "less", not 
"less than or equal to".
Try using "<=" and ">=" to see what happens; I bet it solves it.
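
As a quick illustration of the boundary effect on the two rows in question, here is a sketch comparing a strict and a non-strict version of the "worse" rule. The function names are made up for illustration and this is not a change to to.keep() itself.

```r
r1 <- c(1.15, 60, 0.553555415, 0.574892872)
r2 <- c(1.15, 60, 0.563183983, 0.564029359)

# Strict rule: smaller, smaller, higher, smaller.
worse_strict <- function(r, s) {
  r[1] <  s[1] && r[2] <  s[2] && r[3] >  s[3] && r[4] <  s[4]
}
# Non-strict rule: ties count as "worse" too.
worse_loose <- function(r, s) {
  r[1] <= s[1] && r[2] <= s[2] && r[3] >= s[3] && r[4] <= s[4]
}

worse_strict(r2, r1)  # FALSE: positions 1 and 2 are tied, so r2 is kept
worse_loose(r2, r1)   # TRUE: with ties allowed, r2 would be removed
worse_loose(r1, r1)   # TRUE: a row "loses" to itself under the loose rule
```

The last line shows why a non-strict rule needs care: every row ties with itself, so a row must not be compared against itself (and exact duplicates need a decision) or everything is removed.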

Rui Barradas

On 23-07-2012 14:43, wwreith wrote:
> 1.15   60   0.553555415   0.574892872
> 1.15   60   0.563183983   0.564029359
>
> Shouldn't the function row out the second one, since it it higher in
> position 3 and lower in position 4 i.e. it should not all be yes?
