Hello,
I have two nested for loops that I am trying to optimize. I looked into
vectorization and into the apply family of functions, but I really
cannot make it work. I am writing my code with a small data set. At this
size there is no problem, but sometimes I will have hundreds of rows, so it
is really important to optimize the code. Any suggestions will be very
welcome.
library("TSMining")
dataS = data.frame(V1 = sample(c(1,2,3,4), 30, replace = TRUE),
                   V2 = sample(c(1,2,3,4), 30, replace = TRUE),
                   V3 = sample(c(1,2,3,4), 30, replace = TRUE),
                   V4 = sample(c(1,2,3,4), 30, replace = TRUE))
saxM = Func.matrix(5)
colnames(saxM) = 1:5
rownames(saxM) = 1:5
matrixPrepared = matrix(NA, nrow = nrow(dataS), ncol = nrow(dataS))
for (i in 1:(nrow(dataS)-1)) {
  for (j in (1+i):nrow(dataS)) {
    matrixPrepared[i,j] = Func.dist(as.character(dataS[i,]),
                                    as.character(dataS[j,]), saxM, n=60)
  }
}
matrixPrepared
Thank you!
------
Aurora González Vidal
PhD student in Data Analytics for Energy Efficiency
Faculty of Computer Sciences
University of Murcia
@. aurora.gonzalez2 at um.es
T. 868 88 7866
www.um.es/ae
optimize the filling of a diagonal matrix (two for loops)
2 messages · AURORA GONZALEZ VIDAL, Thomas Mailund
The nested for-loops could very easily be moved to Rcpp, which should speed them up. Using apply functions instead of for-loops will not make it faster; they still have to do the same looping.
At least, when I use `outer` to replace the loop I get roughly the same speed for the two versions, although the `outer` solution does iterate over the entire matrix and not just the upper-triangular part.
library(stringdist) # I don't have the TSMining library installed, so I tested with this instead
for_loop_test <- function() {
  matrixPrepared <- matrix(NA, nrow = nrow(dataS), ncol = nrow(dataS))
  for (i in 1:(nrow(dataS)-1)){
    for (j in (1+i):nrow(dataS)){
      matrixPrepared[i, j] <- stringdist(paste0(as.character(dataS[i,]), collapse=""),
                                         paste0(as.character(dataS[j,]), collapse=""))
    }
  }
  matrixPrepared
}
apply_test <- function() {
  get_dist <- function(i, j) {
    if (i <= j) NA
    else stringdist(paste0(as.character(dataS[i,]), collapse=""),
                    paste0(as.character(dataS[j,]), collapse=""))
  }
  get_dist <- Vectorize(get_dist)
  t(outer(1:nrow(dataS), 1:nrow(dataS), get_dist))
}
library(microbenchmark)
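# Element-wise comparison that treats NA vs NA as a match; `&&` and `||` only accept single values
equivalent <- function(x, y) (is.na(x) & is.na(y)) | (!is.na(x) & !is.na(y) & x == y)
check <- function(values) all(equivalent(values[[1]], values[[2]]))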
microbenchmark(for_loop_test(), apply_test(), check = check, times = 5)
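A possible further variation, offered only as a rough sketch: both versions above rebuild the row strings inside the loop, and that part can be done once up front. The sketch also visits only the pairs with i < j by taking the index pairs from `combn`. It uses base R's `adist` (Levenshtein distance) purely as a stand-in for `Func.dist` or `stringdist`, so the distances will not match either of those. The names `row_strings`, `pairs`, `pair_dist`, `upper_only` and `all_at_once` are made up for the illustration, and the seed is an assumption added only for reproducibility.

```r
# Sketch only: adist() (base R, Levenshtein) stands in for Func.dist()/stringdist().
set.seed(42)  # assumed seed, only so the sketch is reproducible
dataS <- data.frame(V1 = sample(1:4, 30, replace = TRUE),
                    V2 = sample(1:4, 30, replace = TRUE),
                    V3 = sample(1:4, 30, replace = TRUE),
                    V4 = sample(1:4, 30, replace = TRUE))

# Build one string per row a single time instead of inside the loops.
row_strings <- do.call(paste0, dataS)
n <- length(row_strings)

# Enumerate only the index pairs with i < j (the upper triangle).
pairs <- t(combn(n, 2))  # two columns: i and j

# Hypothetical helper: any distance between row i and row j could go here.
pair_dist <- function(i, j) drop(adist(row_strings[i], row_strings[j]))

# Fill the upper triangle in one assignment via two-column matrix indexing.
upper_only <- matrix(NA_real_, nrow = n, ncol = n)
upper_only[pairs] <- mapply(pair_dist, pairs[, 1], pairs[, 2])

# If the distance function is itself vectorized (as adist() is), the whole
# matrix can be computed in one call and the unwanted part blanked out.
all_at_once <- adist(row_strings)
all_at_once[!upper.tri(all_at_once)] <- NA
```

Whether either form is faster in practice depends on how costly the distance function is compared with the looping and indexing around it; if the distance call dominates, expect little change.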
Cheers
Thomas
On 18 August 2016 at 17:41:01, AURORA GONZALEZ VIDAL (aurora.gonzalez2 at um.es) wrote:
> [...]
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.