I plan to try it out myself, but I wanted to check here whether running applyStrategy in a loop, iterating over different dates, will work. I could not find any examples of this. There are two reasons for wanting to do this. First, one could have a couple of years of tick data, which is too big to fit in memory for each symbol. (Of course, I am assuming that the orders placed by the strategy are sparse enough that the order_book generated by applyStrategy can still fit in memory.) Second, if this loop could moreover be run in parallel, there could potentially be a 500x speed-up for two years of data.
Quantstrat - running applyStrategy in a loop
4 messages · Ilya Kipnis, Brian G. Peterson, James Hirschorn
So I'll let others correct me if I'm wrong, but the way I see it, as long as you can encapsulate your script in a function, you can wrap it inside a foreach loop and either return the results of each iteration or write out various files as part of that function.

On Sun, Aug 19, 2018 at 5:16 PM, James Hirschorn <james.hirschorn at quantitative-technologies.com> wrote:
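A minimal sketch of the function-plus-foreach approach Ilya describes, running one symbol per worker (the symbol list, strategy name, and portfolio naming are hypothetical, and it assumes quantstrat, foreach, and doParallel are installed, with market data for each symbol available on every worker):

```r
library(quantstrat)
library(foreach)
library(doParallel)

registerDoParallel(cores = 4)

# Hypothetical helper: run one symbol in its own portfolio so that
# workers are completely independent, then return the order book
# (assumed sparse enough to fit in memory).
run_one_symbol <- function(symbol) {
  portfolio.name <- paste0("port.", symbol)
  rm.strat(portfolio.name)                      # clear any stale state
  initPortf(portfolio.name, symbols = symbol, currency = "USD")
  initOrders(portfolio = portfolio.name)
  applyStrategy(strategy = "my.strategy",       # hypothetical strategy name
                portfolios = portfolio.name)
  getOrderBook(portfolio.name)
}

results <- foreach(sym = c("AAPL", "MSFT"),
                   .packages = "quantstrat") %dopar% run_one_symbol(sym)
```

Because each worker gets its own portfolio, no state interacts across symbols; the returned order books can then be inspected or combined on the master process.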
R-SIG-Finance at r-project.org mailing list — https://stat.ethz.ch/mailman/listinfo/r-sig-finance — Subscriber-posting only. If you want to post, subscribe first. Also note that this is not the r-help list where general R questions should go.
On Sun, 2018-08-19 at 17:16 -0400, James Hirschorn wrote:
James,

The answer is 'it depends'.

There is a parallel version of applyStrategy in the sandbox on GitHub. I haven't touched it in several years, so I wouldn't trust that code; I mention it as an example of what is theoretically possible. A better example, which is already parallelized and much more heavily used, is apply.paramset().

First, to expand on Ilya's answer, let's talk about what *is* possible.

It is possible to wrap a foreach loop over applyStrategy that separates symbols onto different workers (though your hypothesized 500x speedup would require *at least* 500 worker nodes, spread over several physical machines, using something like doRedis, which we have tested up to around 200 workers). This assumes that each symbol is completely independent, and that there is no interaction among the symbols on things like trade sizing, capital, or risk. The simplest way to do this is to create a separate portfolio per symbol, so that each worker is completely independent. See examples of a different kind of splitting and parallelization in apply.paramset() (which is also used in walk-forward testing).

It is also possible, and we commonly do this, to segment the dates that you run applyStrategy over. As you hypothesized, a simple loop over date ranges, loading different non-conflicting time series, can run each date range successively. This, as you noted, works well when even 64, 128, or 512GB+ of RAM is not enough for all of your data. We've made a number of changes over the years to make quantstrat more memory efficient, but copies are still made when unavoidable, state is kept between the various nested apply* functions, and RAM use basically grows throughout the run of a strategy evaluation. So segmenting the market data by dates can help, though you may need to discard some intermediate results (like portions of the order book) to make everything fit.

In the first case, parallelizing by symbol, RAM is still your most likely bottleneck, since even very large machines rarely have more than about 16GB per core/thread.

You still have some wrinkles here. Again, you need to assess whether there is any interaction. Transactions cannot be added to a portfolio out of order, as the P&L is (potentially) dependent on prior transactions. So you may again need to create multiple portfolios and stitch the per-period P&L together yourself.

So, in the 'don't do that' camp: don't try to apply transactions out of order; the trade blotter won't allow it. In the 'should work' camp are the variations described above for splitting your computational problem so that it is amenable to looping and/or parallelization.

Regards,

Brian
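A sketch of the date-segmentation approach described above (the segment list, symbol, strategy name, and the load_tick_segment() loader are all hypothetical; it assumes quantstrat is installed and each segment's tick data can be loaded independently):

```r
library(quantstrat)

# Hypothetical date segments covering the data one quarter at a time,
# so that each chunk of tick data fits in RAM.
segments <- c("2016-01-01/2016-03-31", "2016-04-01/2016-06-30")

order.books <- list()
for (seg in segments) {
  port <- paste0("port.", gsub("/", ".", seg))  # one portfolio per segment
  rm.strat(port)                                # clear any stale state
  initPortf(port, symbols = "SPY", currency = "USD")
  initOrders(portfolio = port)

  # Load only this segment's market data (loader is hypothetical).
  SPY <- load_tick_segment("SPY", seg)

  applyStrategy(strategy = "my.strategy", portfolios = port)

  # Keep only what is needed, then free the large series before
  # the next iteration.
  order.books[[seg]] <- getOrderBook(port)
  rm(SPY); gc()
}
```

Since transactions cannot be added to a portfolio out of order, each segment gets its own portfolio, and the per-period P&L must then be stitched together across segments by hand.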
Brian G. Peterson http://braverock.com/brian/ Ph: 773-459-4973 IM: bgpbraverock
1 day later
Thanks for the very detailed reply!

I couldn't find a parallel applyStrategy in the sandbox. Is it still there, and if so, what is the filename? In any case, if I understood correctly, it should instead be modelled on apply.paramset().

You mentioned that you commonly segment the dates you run applyStrategy over. If you have an example, could you please point it out? I will report back once I attempt a parallelization.

Yes, I have run into problems with out-of-order transactions in two different situations. One of them was when using the delay argument of ruleSignal, as you had suggested in an SO answer. But that is off topic for this thread...

Regards, James

On Mon, Aug 20, 2018 at 7:40 AM, Brian G. Peterson <brian at braverock.com> wrote: