
Parallelizing applyStrategy to multiple symbols

Hi Atakan,

I use a batch file to run most of my R programs. That way I just have to get
it right once and then I can run it many times. The following is a simple
batch command, scatter_plot.bat, to run some regressions:

"C:\Program Files\R\R-3.0.2\bin\x64\R.exe" CMD BATCH "Scatter_Plot.txt" "Scatter_Plot.out"

Scatter_plot.txt contains generic R commands that use data in the current
directory. Scatter_Plot.out will contain the output from the commands in the
text file. 

If I'm analyzing SPY data for 2016, I would use a data structure like:

\SPY\2016\01
\SPY\2016\02
\SPY\2016\03
.
.
.

This way I can analyze one month's data and save the output in its own
directory: January data and output go to \SPY\2016\01, and so on. I have 8
execution paths and can run 8 months of data simultaneously. My program is
small and does not use up all available physical memory. I would run the
final 4 months when 4 of the 8 initial months are finished.
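The same "8 execution paths, remaining months queue up" idea can be sketched inside a single R session with the base parallel package (this is an illustration, not Frank's actual scripts; analyze_month() is a hypothetical stand-in for the real regression code):

```r
library(parallel)

# Hypothetical stand-in for the regression script run in each directory.
analyze_month <- function(month_dir) {
  # In the real workflow this would read the month's data from month_dir,
  # run the regressions, and write the output back into that directory.
  sprintf("analyzed %s", month_dir)
}

# One directory per month, mirroring \SPY\2016\01 .. \SPY\2016\12.
months <- file.path("SPY", "2016", sprintf("%02d", 1:12))

cl <- makeCluster(min(8, detectCores()))         # at most 8 simultaneous jobs
results <- parLapply(cl, months, analyze_month)  # extra months wait their turn
stopCluster(cl)
```

parLapply() hands a new month to each worker as it frees up, so the "run the final 4 months when 4 of the 8 are finished" scheduling happens automatically.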

If I run more than 8 data-intensive regressions, what Brian is saying is
that the OS will spend extra time deciding which thread from which process
gets loaded into the next available execution path. If I were to use more
than the available physical memory, a thread swapped out to disk would have
to be loaded back into memory before it could execute, while some other
process in memory would have to be swapped out to the hard drive to make
room. This traffic will slow things down dramatically.

At the end of the batch file, the output is copied up one directory, in this
case to 2016, with the year and month appended to a generic file name. There
is a batch file in 2016 to concatenate all data from the different months
into one file for 2016.
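The concatenation step could also be done in R rather than a batch file. A minimal sketch, with assumed file names (output_2016_01.csv, etc., not Frank's actual generic names): it first writes two small monthly files into a temporary 2016 directory purely for illustration, then stacks them into one yearly file.

```r
# Temporary stand-in for the \SPY\2016 directory.
yr <- file.path(tempdir(), "2016")
dir.create(yr, showWarnings = FALSE)

# Illustrative monthly output files with the year and month in the name.
write.csv(data.frame(month = 1, beta = 0.92),
          file.path(yr, "output_2016_01.csv"), row.names = FALSE)
write.csv(data.frame(month = 2, beta = 1.07),
          file.path(yr, "output_2016_02.csv"), row.names = FALSE)

# Gather the monthly outputs and concatenate them into one file for 2016.
monthly_files <- list.files(yr, pattern = "^output_2016_\\d{2}\\.csv$",
                            full.names = TRUE)
yearly <- do.call(rbind, lapply(monthly_files, read.csv))
write.csv(yearly, file.path(yr, "output_2016.csv"), row.names = FALSE)
```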

Best,

Frank
Chicago
 


-----Original Message-----
From: Atakan Okan [mailto:atakanokan at outlook.com] 
Sent: Monday, March 06, 2017 4:37 PM
To: Frank <frankm60606 at gmail.com>
Cc: Brian G. Peterson <brian at braverock.com>; r-sig-finance at r-project.org
Subject: Re: [R-SIG-Finance] Parallelizing applyStrategy to multiple symbols

Hi Frank,

I just thought of an idea based on your suggestion. Instead of trying to
implement a foreach loop, I will try to split my symbol set across different
R sessions, using the new-session option in RStudio, and then run each
subset in a different session with the default call to applyStrategy. I
think this is what you were suggesting, or I might have understood it
incorrectly.

Hi Brian,

My understanding of parallelization wasn't enough to grasp all of your
reply, but I am not planning on doing rebalancing or testing any strategy
that needs to "talk" to other threads. Each symbol is backtested on its own,
without any input from or output to other symbols' backtests. Would the idea
suggested above work in this case? I think I explained my problem
inadequately: the completion time of a single symbol's backtest is not the
issue; the issue is the sequential computation of each symbol's backtest
and, consequently, the linearly increasing completion time across all
symbols. I just want to divide each symbol's applyStrategy call across the
CPUs my laptop has to speed up the process. Like apply.paramset, but per
symbol instead of per parameter combination. I hope I have explained it
better.
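The per-symbol split could be sketched with base R's parallel package rather than manual RStudio sessions. This is only a skeleton under stated assumptions: run_symbol_backtest() is a hypothetical wrapper, the symbol list and strategy name are illustrative, and in a real quantstrat setup each worker would initialize its own portfolio/account/orders environments before calling applyStrategy() for its one symbol, so no state is shared between backtests.

```r
library(parallel)

symbols <- c("SPY", "QQQ", "IWM", "EEM")  # illustrative symbol set
strategy_name <- "luxor"                  # assumed strategy identifier

# Hypothetical wrapper: in practice this would run initPortf()/initAcct(),
# then applyStrategy(strategy_name, portfolio) for just this symbol.
run_symbol_backtest <- function(symbol) {
  sprintf("ran %s on %s", strategy_name, symbol)
}

cl <- makeCluster(min(length(symbols), detectCores()))
# Copy the strategy identifier to each worker once, rather than
# redeclaring it in every call.
clusterExport(cl, "strategy_name")
out <- parLapply(cl, symbols, run_symbol_backtest)
stopCluster(cl)
```

Because each worker is a separate R process with its own environments, the backtests cannot interact, which matches the "no rebalancing, no cross-symbol talk" case described above.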

Thanks for the help.

Best,

Atakan Okan