Speeding up a loop
10 messages · Jean V Adams, Richard M. Heiberger, wwreith +1 more
That is faster than what I was doing, and cutting 15% of my iterations is
still very helpful.
Next question.
I need to multiply each row x[i,] of the matrix x by another matrix A.
Specifically:
for(i in 1:n)
{
  if (x[i,]%*%A[,1]<.5 || x[i,]%*%A[,2]<42 || x[i,]%*%A[,3]>150)
  {
    x<-x[-i,]
    n<-n-1
  }
}
In other words, remove row i from x if it does not meet the criteria (>=.5,
>=42, <=150) when multiplied by A.
Is there a better way than using a for loop for this, or than x<-x[-i,] for that matter? I assume building a new matrix would be worse.
Ideally I also want to exclude some rows x[i,] by comparison with other rows. For example, if x[1,] is better than x[2,] in all three categories (i.e. bigger, bigger, and smaller than x[2,] when multiplied by A), then I want to exclude x[2,] as well.
Any suggestions on whether it is better to do this all at once or in stages?
Thanks for helping!
--
View this message in context: http://r.789695.n4.nabble.com/Speeding-up-a-loop-tp4637201p4637255.html
Sent from the R help mailing list archive at Nabble.com.
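A minimal sketch of a loop-free version of the filter described above, assuming x is a numeric matrix and A has three columns. The small x and A here are made up for illustration and are not from the thread. All products are computed at once with x %*% A, and a logical index selects the rows to keep. This also avoids a problem with the loop: x<-x[-i,] shifts the remaining rows up, so the row that moves into position i is never tested, and i can run past the end of the shrunken matrix.

```r
# Illustrative data only: 6 candidate rows with 2 columns each.
x <- matrix(c(1,   0,
              0,   1,
              2,   50,
              1,   200,
              0.1, 45,
              1,   42), ncol = 2, byrow = TRUE)

# Illustrative A (filled column by column): the three scores for a row (a, b)
# are a, b and a + b.
A <- matrix(c(1, 0,
              0, 1,
              1, 1), nrow = 2)

# One matrix product gives every row's three scores at once.
scores <- x %*% A

# Keep a row only if it meets all three criteria (>= .5, >= 42, <= 150).
keep <- scores[, 1] >= 0.5 & scores[, 2] >= 42 & scores[, 3] <= 150

# drop = FALSE keeps the result a matrix even if a single row survives.
x_kept <- x[keep, , drop = FALSE]
```

Building the index first and subsetting once means x is copied a single time instead of once per removed row.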
Next for-loop question.
I need a loop that removes a row from a matrix if it is worse in positions
1, 2, 3 and 4 than another row in the matrix. Right now my matrix is 503028x26.
The rule defining "worse": position 1 is smaller, position 2 is smaller,
position 3 is higher, and position 4 is smaller.
Example:
row1: 1, 10, 3, 3
row2: 3, 7, 5, 2
row2 is not worse than row1 since it is "better" in position 1, even though
it is worse in all other positions.
row3: 2, 5, 7, 1
row3, however, is worse than row2 and should be removed from the matrix.
Any ideas? Should I break this into pieces or do it all at once? Is there
something faster than a loop? My current loop takes well over 24 hours to
run.
m<-matrix(0,1,24)
for(i in 1:n)
{
  a<-matrix(x[i,1:4],1,4)
  j=1
  nn<-nrow(m)
  counter<-0
  while(j<=nn)
  {
    if(a[1]>m[j,1] && a[2]>m[j,2] && a[3]>m[j,4] && a[4]<m[j,4])
    {
      m<-m[-j,]
      nn<-length(m[,1])
      counter<-1
    } else j<-j+1
  }
  if(counter==1)
  {
    b<-cbind(a,x)
    m<-rbind(m,b)
  }
  if(counter==0)
  {
    if(a[1]>min(m[,1]) || a[3]>min(m[,3]) || a[4]>min(m[,4]) ||
       a[5]<max(m[,5]))
    {
      b<-cbind(a,x)
      m<-rbind(m,b)
    }
  }
}
Hello,
Maybe it would have been better to start a new thread, if the question
is different. To show that it's a continuation, the subject line could
be extended with "part 2" or something like that.
This solution should run in about 3.6 hours on your full matrix (an
extrapolation from the timings on smaller matrices below).
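to.keep <- function(x){
  # Store row i of x in the next free row of env$result,
  # growing the result matrix by 'increment' rows when it is full.
  keep <- function(i, env){
    env$ires <- env$ires + 1
    if(env$ires > env$curr.rows){
      env$result <- rbind(env$result, matrix(nrow=increment, ncol=nc))
      env$curr.rows <- env$curr.rows + increment
    }
    env$result[env$ires, ] <- x[i, ]
  }

  # Extract the four comparison columns once, outside the loop.
  a1 <- x[, 1]
  a2 <- x[, 2]
  a3 <- x[, 3]
  a4 <- x[, 4]
  nc <- ncol(x)
  increment <- 1000

  # Everything keep() modifies lives in this environment.
  e <- new.env()
  e$curr.rows <- increment
  e$result <- matrix(nrow=e$curr.rows, ncol=nc)
  e$ires <- 0

  for(i in seq_len(nrow(x))){
    # yes[j] is TRUE when row j does not beat row i in all four positions.
    # The whole condition must sit on one line (or end each line with an operator).
    yes <- x[i, 1] >= a1 | x[i, 2] >= a2 | x[i, 3] <= a3 | x[i, 4] >= a4
    if(all(yes)) keep(i, e)
  }
  # Return only the rows that were actually filled.
  e$result[seq_len(e$ires), 1:nc]
}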
# Now the timing.
set.seed(3971)
nc <- 26
Enes <- seq(from=1e3, to=1e4, by=1e3)
tm <- numeric(length(Enes))
i <- 0
for(n in Enes){
  i <- i + 1
  N <- nc*n
  m <- matrix(sample(0:9, N, TRUE), ncol=nc)
tm[i] <- system.time(kp <- to.keep(m))[3]
}
plot(Enes, tm) # quadratic behavior
fit <- lm(tm ~ Enes + I(Enes^2))
(secs <- predict(fit, newdata=data.frame(Enes=503028)))
secs/60/60 # 3.6 hours
Hope this helps,
Rui Barradas
In reply to wwreith's message of 21-07-2012 13:26.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Any chance I could ask for an idiot's guide to the function to.keep(x)? I understand how to use it but not what some of the lines are doing. Comments would be extremely helpful.
Ok, sorry, I should have included some comments.
The function is divided into three parts: 1. intro, 2. decision, 3. keep rows.
Part 3 is the function keep(), internal to to.keep(). Let's start with 1.
1. Set up some variables first.
1.a) The variables 'a'.
If the input object 'x' is a matrix this doesn't give a great speed-up,
but if 'x' is a data.frame, extraction is time consuming.
So, do this only once, at the beginning.
1.b) The new environment.
This is because my first version would need to change values declared
outside the internal function.
This can be done with the global assignment operator, <<-, but this
practice should be avoided; it's easy to mess things up.
Note that all the variables changed inside the internal function are in
this new environment, 'e'.
In particular, note that 'result' is initialized with 1000 rows.
2. The loop.
This is where we decide if we want to keep that row. I have negated the
condition from an original 'no'.
The 'no' condition:
a1[i] < a1 & a2[i] < a2 & a3[i] > a3 & a4[i] < a4
Then the test would be:
if(any(no)) dont_keep else keep # pseudo-code
Not in pseudo-code:
if( all( !no ) ) keep(i, e)
The downside of this is that the original is more readable.
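To make the decision step concrete, here is a small sketch that applies the 'no' condition to the three example rows from the original question. The helper name rows_to_keep is hypothetical and only used for this illustration; it is not part of to.keep().

```r
# The three example rows from the question (positions 1 to 4 only).
x <- rbind(c(1, 10, 3, 3),
           c(3,  7, 5, 2),
           c(2,  5, 7, 1))

# Hypothetical helper: for each row i, test it against every row at once.
rows_to_keep <- function(x) {
  vapply(seq_len(nrow(x)), function(i) {
    # no[j] is TRUE when row i is worse than row j in all four positions.
    no <- x[i, 1] < x[, 1] & x[i, 2] < x[, 2] &
          x[i, 3] > x[, 3] & x[i, 4] < x[, 4]
    all(!no)   # equivalent to !any(no)
  }, logical(1))
}

keep <- rows_to_keep(x)
kept <- x[keep, , drop = FALSE]
```

Here row3 is worse than row2 in all four positions, so only row1 and row2 survive, which matches the expected outcome stated in the question.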
3. The internal function, keep().
Considering the small number of rows I have used for tests, e$result was
initialized with 1e3 rows.
With 5e5 rows I would increase this number to 1e5.
First, the function updates the [row number] pointer into 'result' and
checks if we are at a 'result' limit.
If yes, make it bigger by 'increment' [ == 1e3 ] rows.
Then just assign row i from matrix/df 'x' to the appropriate row of
e$result.
The reason why we need the environment is that on function return,
everything but the returned value is lost.
We could save the values of ires, curr.rows and result in a list and
return the list.
But this would complicate and slow things down. Assign, update and
reassign. Messy.
Environments can help keep it "simple", in the sense of keeping together
what is meant to be used together.
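The growing-buffer idea can be tried in isolation. The following is only a sketch of the same pattern with deliberately tiny sizes so that the growth step triggers; the function name append_row and the three-column rows are made up for the illustration.

```r
# State shared between calls lives in an environment, as in to.keep().
e <- new.env()
e$ires <- 0          # number of rows filled so far
e$curr.rows <- 2     # rows currently allocated (tiny on purpose)
e$result <- matrix(NA_real_, nrow = e$curr.rows, ncol = 3)

# Hypothetical helper: write 'row' into the next free slot, growing if needed.
append_row <- function(row, env, increment = 2) {
  env$ires <- env$ires + 1
  if (env$ires > env$curr.rows) {
    # Out of space: add 'increment' empty rows in one go.
    env$result <- rbind(env$result,
                        matrix(NA_real_, nrow = increment, ncol = ncol(env$result)))
    env$curr.rows <- env$curr.rows + increment
  }
  env$result[env$ires, ] <- row
  invisible(NULL)
}

for (k in 1:5) append_row(c(k, k^2, k^3), e)

# Trim the unused rows at the end.
out <- e$result[seq_len(e$ires), , drop = FALSE]
```

Because environments are passed by reference, the changes made inside append_row() are visible in e afterwards, without <<- and without returning and reassigning a list.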
And now I hope there is not an overdose of comments :)
Rui Barradas
In reply to wwreith's message of 21-07-2012 18:37.
1 day later
1.15 60 0.553555415 0.574892872
1.15 60 0.563183983 0.564029359
Shouldn't the function rule out the second one, since it is higher in position 3 and lower in position 4, i.e. it should not all be yes?
Hello,
I think this is a boundary issue. In your original post you said "less", not "less than or equal to". Try using "<=" and ">=" to see what happens; I bet it solves it.
Rui Barradas
In reply to wwreith's message of 23-07-2012 14:43.
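A sketch of one way to try the non-strict comparisons on the two rows in question. One caveat: with "<=" and ">=" alone in the 'no' condition, every row ties with itself in all four positions and would eliminate itself. The extra "strictly worse in at least one position" term below is an assumption added for this illustration, not something stated in the thread, and the helper name is hypothetical.

```r
# The two rows from the question (positions 1 to 4).
x <- rbind(c(1.15, 60, 0.553555415, 0.574892872),
           c(1.15, 60, 0.563183983, 0.564029359))

# Hypothetical helper: TRUE for rows that no other row beats.
rows_to_keep_weak <- function(x) {
  vapply(seq_len(nrow(x)), function(i) {
    # Worse than or equal to row j in every position ...
    worse_or_equal <- x[i, 1] <= x[, 1] & x[i, 2] <= x[, 2] &
                      x[i, 3] >= x[, 3] & x[i, 4] <= x[, 4]
    # ... and strictly worse in at least one (assumed extra condition,
    # so a row is not removed by itself or by an exact duplicate).
    strictly_worse <- x[i, 1] < x[, 1] | x[i, 2] < x[, 2] |
                      x[i, 3] > x[, 3] | x[i, 4] < x[, 4]
    !any(worse_or_equal & strictly_worse)
  }, logical(1))
}

keep <- rows_to_keep_weak(x)
```

With this rule the second row is dropped: it ties in positions 1 and 2 and is worse in positions 3 and 4. Under the strict rule both rows are kept because of the ties.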