How do I make this faster?
3 messages · Hasan Diwan, Andreas Borg, Paul Hiemstra
2 days later
Hi Hasan,

I'd be happy to help you, but I am not able to run your code. You use commandArgs to retrieve the arguments of the R program, but which ones do you actually provide?

Best regards,
Andreas

Hasan Diwan wrote:
I was on vacation last week and wrote some code to compute the correlation
between the Nasdaq tracking stock (QQQ) and 191 currency pairs over 500
days. The initial run took 9 hours(!) and I'd like to make it faster. So
I'm including my code below, in the hope that somebody can figure out how
to speed it up, either through parallelisation or by other changes. I've
marked the places where Rprof showed me it was slowing down:
currencyCorrelation <- function(lagtime = 1) {
  require(quantmod)
  dataTrack <- getSymbols(commandArgs(trailingOnly=TRUE)[1],
                          from='2009-11-21', to='2011-04-03')
  stockData <- get(dataTrack)
  # Keep every currency whose description does not contain 'oz.'
  descriptions <- as.vector(oanda.currencies$oanda.df.1.length.oanda.df...2....1.)
  currencies <- row.names(oanda.currencies)[!grepl('oz.', descriptions, fixed=TRUE)]
  correlations <- vector()
  values <- list()
  # optimise these loops using the apply family
  for (i in currencies) {
    for (j in currencies) {
      if (i == j) next
      fx <- getFX(paste(i, j, sep='/'), from='2009-11-20', to='2011-04-02')
      # Prepare data by getting rates for market days only
      fx <- get(fx)
      fx <- fx[which(index(fx) %in% index(QQQ$QQQ.Close))]
      correlation <- cor(fx, QQQ$QQQ.Close)
      correlations <- c(correlations, correlation)
      string <- paste(paste(i, j, sep='/'), correlation, sep=',')
      values <- c(values, paste(string, '\n', sep=''))
    }
  }
  # Eliminate NA's, then sort by decreasing correlation
  values <- values[!is.na(correlations)]
  correlations <- correlations[!is.na(correlations)]
  values <- values[order(correlations, decreasing=TRUE)]
  write.table(values, file=commandArgs(trailingOnly=TRUE)[2], sep='',
              qmethod=NULL, quote=FALSE, row.names=FALSE, col.names=FALSE)
  rm('currencies', 'correlations', 'values', 'fx', 'string')
  invisible(NULL)
}

lagtime <- as.integer(commandArgs(trailingOnly=TRUE)[3])
if (is.na(lagtime)) lagtime <- 1
print(paste(Sys.time(), '<--- starting', lagtime,
            'day lag currencies correlation with',
            commandArgs(trailingOnly=TRUE)[1],
            'from 2009-11-20 to 2011-04-03'))
currencyCorrelation(lagtime)
print(paste(Sys.time(), '<--- ended, results in',
            commandArgs(trailingOnly=TRUE)[2]))
Andreas Borg
Medizinische Informatik
UNIVERSITÄTSMEDIZIN der Johannes Gutenberg-Universität
Institut für Medizinische Biometrie, Epidemiologie und Informatik
Obere Zahlbacher Straße 69, 55131 Mainz
www.imbei.uni-mainz.de
Phone +49 (0) 6131 175062
E-Mail: borg at imbei.uni-mainz.de
On 04/11/2011 10:28 AM, Andreas Borg wrote:
fx <- fx[which(index(fx) %in% index(QQQ$QQQ.Close))]
correlation <- cor(fx, QQQ$QQQ.Close)
correlations <- c(correlations, correlation)
In this piece of code you append correlation to correlations with c(). Because the vector grows on every iteration, R has to allocate a new, larger block of memory and copy the old contents into it each time. Preallocating the space you need (a rough overestimate is also fine) will make this much faster. Instead of creating zero-length objects for 'correlations' and 'values' before the start of the loop, create them at the desired length and assign into them by index inside the loop rather than concatenating. This could possibly speed up your code by several orders of magnitude.

cheers,
Paul
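The preallocation advice above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the original script: the three currency codes and the helper fake_correlation() are hypothetical stand-ins, so that the example runs without quantmod or any downloads. In the real code the cor() call would take the helper's place.

```r
# Toy stand-ins (assumptions, not from the original script):
# three currency codes and a placeholder for the cor() call.
currencies <- c("EUR", "USD", "JPY")
fake_correlation <- function(i, j) {
  # Deterministic dummy value; the real script would compute cor() here.
  (match(i, currencies) - match(j, currencies)) / length(currencies)
}

# The number of ordered pairs with i != j is known before the loop starts.
n_pairs <- length(currencies) * (length(currencies) - 1)

# Preallocate both result vectors at their final length.
correlations <- numeric(n_pairs)
pairs <- character(n_pairs)

k <- 0
for (i in currencies) {
  for (j in currencies) {
    if (i == j) next
    k <- k + 1
    pairs[k] <- paste(i, j, sep = "/")
    correlations[k] <- fake_correlation(i, j)  # assign by index, no c()
  }
}

# Drop NA results and sort from highest to lowest correlation.
keep <- !is.na(correlations)
result <- data.frame(pair = pairs[keep], correlation = correlations[keep])
result <- result[order(result$correlation, decreasing = TRUE), ]
print(result)
```

Because the loop bounds are fixed in advance, the index k ends exactly at n_pairs and neither vector is ever reallocated inside the loop.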
Paul Hiemstra, MSc Global Climate Division Royal Netherlands Meteorological Institute (KNMI) Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39 P.O. Box 201 | 3730 AE | De Bilt tel: +31 30 2206 494 http://intamap.geo.uu.nl/~paul http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770