Message-ID: <etPan.57b5e73f.13eb676b.5ef6@birc.au.dk>
Date: 2016-08-18T16:50:07Z
From: Thomas Mailund
Subject: optimize the filling of a diagonal matrix (two for loops)
In-Reply-To: <20160818173849.Horde.msvc0kYaV0o-z3SATb8slQ7@webmail.um.es>
The nested for-loops could very easily be moved to Rcpp which should speed them up. Using apply functions instead of for-loops will not make it faster; they still have to do the same looping.
At least, when I use `outer` to replace the loop I get roughly the same speed for the two versions, although the `outer` solution does iterate over the entire matrix and not just the upper triangle.
library(stringdist) # I don't have the TSMining library installed, so I tested with stringdist instead

for_loop_test <- function() {
  matrixPrepared <- matrix(NA, nrow = nrow(dataS), ncol = nrow(dataS))
  for (i in 1:(nrow(dataS)-1)){
    for (j in (1+i):nrow(dataS)){
      matrixPrepared[i, j] <- stringdist(paste0(as.character(dataS[i,]), collapse=""),
                                         paste0(as.character(dataS[j,]), collapse=""))
    }
  }
  matrixPrepared
}

apply_test <- function() {
  get_dist <- function(i, j) {
    if (i <= j) NA
    else stringdist(paste0(as.character(dataS[i,]), collapse=""),
                    paste0(as.character(dataS[j,]), collapse=""))
  }
  get_dist <- Vectorize(get_dist)
  t(outer(1:nrow(dataS), 1:nrow(dataS), get_dist))
}
library(microbenchmark)
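# Element-wise comparison that treats two NAs as equal; `&` and `|` are used
# because x and y are matrices (`&&` and `||` only handle single values).
equivalent <- function(x, y) (is.na(x) & is.na(y)) | (!is.na(x) & !is.na(y) & x == y)
check <- function(values) all(equivalent(values[[1]], values[[2]]))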
microbenchmark(for_loop_test(), apply_test(), check = check, times = 5)
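
A possible further step, sketched here under assumptions rather than taken from the benchmark above: both versions paste each row into a string again for every (i, j) pair, and that work can be done once up front. The sketch below uses only base R. The name `upper_tri_test`, the helper `dist_fun`, and the use of `adist` (Levenshtein distance) as a stand-in for `stringdist` or `Func.dist` are all illustrative assumptions, and the example data only mimics the shape of `dataS` from the original question.

```r
# Hypothetical example data, same shape as dataS in the original question.
set.seed(1)
dataS <- data.frame(V1 = sample(1:4, 30, replace = TRUE),
                    V2 = sample(1:4, 30, replace = TRUE),
                    V3 = sample(1:4, 30, replace = TRUE),
                    V4 = sample(1:4, 30, replace = TRUE))

# Stand-in distance (an assumption): Levenshtein distance from base R's adist().
dist_fun <- function(a, b) as.numeric(adist(a, b))

# Paste each row into a string once, rather than twice per (i, j) pair.
row_strings <- do.call(paste0, unname(as.list(dataS)))

upper_tri_test <- function() {
  n <- length(row_strings)
  pairs <- combn(n, 2)  # each column is one (i, j) pair with i < j
  d <- mapply(function(i, j) dist_fun(row_strings[i], row_strings[j]),
              pairs[1, ], pairs[2, ])
  m <- matrix(NA_real_, nrow = n, ncol = n)
  m[t(pairs)] <- d      # a two-column index matrix addresses (row, col) cells
  m
}

matrixPrepared <- upper_tri_test()
```

This still loops over the pairs at the R level, so any gain would come from avoiding the repeated `paste0` calls and from visiting only the upper triangle. Whether that matters in practice depends on how expensive the distance function itself is.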
Cheers
Thomas
On 18 August 2016 at 17:41:01, AURORA GONZALEZ VIDAL (aurora.gonzalez2 at um.es) wrote:
> Hello
>
> I have two for loops that I am trying to optimize... I looked for
> vectorization or for using some functions of the apply family but really
> cannot do it. I am writing my code with a small data set. With this
> size there is no problem, but sometimes I will have hundreds of rows, so it
> is really important to optimize the code. Any suggestions would be very
> welcome.
>
> library("TSMining")
> dataS = data.frame(V1 = sample(c(1,2,3,4),30,replace = T),
>                    V2 = sample(c(1,2,3,4),30,replace = T),
>                    V3 = sample(c(1,2,3,4),30,replace = T),
>                    V4 = sample(c(1,2,3,4),30,replace = T))
> saxM = Func.matrix(5)
> colnames(saxM) = 1:5
> rownames(saxM) = 1:5
> matrixPrepared = matrix(NA, nrow = nrow(dataS), ncol = nrow(dataS))
>
> for(i in 1:(nrow(dataS)-1)){
>   for(j in (1+i):nrow(dataS)){
>     matrixPrepared[i,j] = Func.dist(as.character(dataS[i,]),
>                                     as.character(dataS[j,]), saxM, n=60)
>   }
> }
> matrixPrepared
>
> Thank you!
>
>
> ------
> Aurora González Vidal
> Phd student in Data Analytics for Energy Efficiency
>
> Faculty of Computer Sciences
> University of Murcia
>
> @. aurora.gonzalez2 at um.es
> T. 868 88 7866
> www.um.es/ae