
Parallelizing applyStrategy to multiple symbols

Hi Atakan,

I use a batch file to run most of my R programs. That way I just have to get
it right once and then I can run it many times. The following is a simple
batch command, scatter_plot.bat, to run some regressions:

"C:\Program Files\R\R-3.0.2\bin\x64\R.exe" CMD BATCH "Scatter_Plot.txt" "Scatter_Plot.out"

Scatter_plot.txt contains generic R commands that use data in the current
directory. Scatter_Plot.out will contain the output from the commands in the
text file. 

If I'm analyzing SPY data for 2016, I would use a data structure like:

\SPY\2016\01
\SPY\2016\02
\SPY\2016\03
.
.
.

This way I can analyze one month's data and save the output in its own
directory: January data and output go to \SPY\2016\01, and so on. I have 8
execution paths and can run 8 months of data simultaneously. My program is
small and does not use up all available physical memory. I would run the
final 4 months when 4 of the 8 initial months are finished.
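The same "8 execution paths, remaining months queue up" idea can be sketched inside a single R session with the base parallel package (this is an illustration, not Frank's actual scripts; analyze_month() is a hypothetical stand-in for the real regression code):

```r
library(parallel)

# Hypothetical stand-in for the regression script run in each directory.
analyze_month <- function(month_dir) {
  # In the real workflow this would read the month's data from month_dir,
  # run the regressions, and write the output back into that directory.
  sprintf("analyzed %s", month_dir)
}

# One directory per month, mirroring \SPY\2016\01 .. \SPY\2016\12.
months <- file.path("SPY", "2016", sprintf("%02d", 1:12))

cl <- makeCluster(min(8, detectCores()))         # at most 8 simultaneous jobs
results <- parLapply(cl, months, analyze_month)  # extra months wait their turn
stopCluster(cl)
```

parLapply() hands a new month to each worker as it frees up, so the "run the final 4 months when 4 of the 8 are finished" scheduling happens automatically.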

If I run more than 8 data-intensive regressions, what Brian is saying is
that the OS will spend extra time deciding which thread from which process
gets loaded into the next available execution path. If I were to use more
than the available physical memory, a thread swapped out to disk would have
to be loaded back into memory before it could execute, while some other
process in memory would have to be swapped out to the hard drive to make
room. This traffic will slow things down dramatically.

At the end of the batch file, the output is copied up one directory, in this
case to 2016, with the year and month appended to a generic file name. There
is a batch file in 2016 to concatenate all data from the different months
into one file for 2016.
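The concatenation step could also be done in R rather than a batch file. A minimal sketch, with assumed file names (output_2016_01.csv, etc., not Frank's actual generic names): it first writes two small monthly files into a temporary 2016 directory purely for illustration, then stacks them into one yearly file.

```r
# Temporary stand-in for the \SPY\2016 directory.
yr <- file.path(tempdir(), "2016")
dir.create(yr, showWarnings = FALSE)

# Illustrative monthly output files with the year and month in the name.
write.csv(data.frame(month = 1, beta = 0.92),
          file.path(yr, "output_2016_01.csv"), row.names = FALSE)
write.csv(data.frame(month = 2, beta = 1.07),
          file.path(yr, "output_2016_02.csv"), row.names = FALSE)

# Gather the monthly outputs and concatenate them into one file for 2016.
monthly_files <- list.files(yr, pattern = "^output_2016_\\d{2}\\.csv$",
                            full.names = TRUE)
yearly <- do.call(rbind, lapply(monthly_files, read.csv))
write.csv(yearly, file.path(yr, "output_2016.csv"), row.names = FALSE)
```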

Best,

Frank
Chicago
 


-----Original Message-----
From: Atakan Okan [mailto:atakanokan at outlook.com] 
Sent: Monday, March 06, 2017 4:37 PM
To: Frank <frankm60606 at gmail.com>
Cc: Brian G. Peterson <brian at braverock.com>; r-sig-finance at r-project.org
Subject: Re: [R-SIG-Finance] Parallelizing applyStrategy to multiple symbols

Hi Frank,

I just thought of an idea based on your suggestion. Instead of trying to
implement a foreach loop, I will try to split my symbol set across different
R sessions, using the new-session option in RStudio, and then run each
subset in a different session with the default call to applyStrategy. I
think this is what you were suggesting, or I might have understood it
incorrectly.

Hi Brian,

My understanding of parallelization wasn't enough to grasp all of your
reply, but I am not planning on doing rebalancing or testing any strategy
that needs to "talk" to other threads. Each symbol is backtested on its own,
without any input from or output to other symbols' backtests. Would the idea
suggested above work in this case? I think I explained my problem
inadequately: the completion time of a single symbol's backtest is not the
issue; the issue is the sequential computation of each symbol's backtest
and, consequently, the linearly increasing completion time across all
symbols. I just want to divide each symbol's applyStrategy call across the
CPUs my laptop has to speed up the process. Like apply.paramset, but per
symbol instead of per parameter combination. I hope I have explained it
better.
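The per-symbol split could be sketched with base R's parallel package rather than manual RStudio sessions. This is only a skeleton under stated assumptions: run_symbol_backtest() is a hypothetical wrapper, the symbol list and strategy name are illustrative, and in a real quantstrat setup each worker would initialize its own portfolio/account/orders environments before calling applyStrategy() for its one symbol, so no state is shared between backtests.

```r
library(parallel)

symbols <- c("SPY", "QQQ", "IWM", "EEM")  # illustrative symbol set
strategy_name <- "luxor"                  # assumed strategy identifier

# Hypothetical wrapper: in practice this would run initPortf()/initAcct(),
# then applyStrategy(strategy_name, portfolio) for just this symbol.
run_symbol_backtest <- function(symbol) {
  sprintf("ran %s on %s", strategy_name, symbol)
}

cl <- makeCluster(min(length(symbols), detectCores()))
# Copy the strategy identifier to each worker once, rather than
# redeclaring it in every call.
clusterExport(cl, "strategy_name")
out <- parLapply(cl, symbols, run_symbol_backtest)
stopCluster(cl)
```

Because each worker is a separate R process with its own environments, the backtests cannot interact, which matches the "no rebalancing, no cross-symbol talk" case described above.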

Thanks for the help.

Best,

Atakan Okan