
Processing time of backtests on a single computer

25 messages · Jersey Fanatic, Joshua Ulrich, david.jessop at ubs.com +5 more

#
The email with the CSV data attached is waiting for moderator's approval.
The reproducible code is below.


2016-04-06 23:17 GMT+03:00 Jersey Fanatic <jerseyfanatic1 at gmail.com>:

  
  
#
You didn't say that you were doing parameter optimization.

The length of time that a brute-force parameter optimization takes
scales roughly linearly with the number of parameter combinations
that you choose to search.

Typically, you should only include parameter distributions for
parameters for which you feel you have a strong economic justification.
Your strategy contains eight parameter distributions and two
constraints.  The choices of these distributions appear arbitrary, and
will result in hundreds of combinations to test.  So, you should expect
your strategy's run time to grow roughly in proportion to the number of
combinations you wish to test.
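To illustrate the multiplicative effect (the parameter names and values here are made up, not from the strategy under discussion), the brute-force grid size is just the product of the distribution lengths:

```r
# Hypothetical parameter distributions (names and values are illustrative):
fast_ma  <- seq(5, 20, by = 5)     # 4 values
slow_ma  <- seq(30, 60, by = 10)   # 4 values
stop_pct <- c(0.01, 0.02, 0.03)    # 3 values

# The brute-force search grid is the cross product of the distributions,
# so every added distribution multiplies the number of backtests to run.
grid <- expand.grid(fast_ma = fast_ma, slow_ma = slow_ma, stop_pct = stop_pct)
nrow(grid)  # 4 * 4 * 3 = 48 combinations, before constraints prune any
```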

Another thing that will make this test take a long time is the inclusion
of trailing stops.  As described in the documentation, trailing stops
require evaluating the strategy at more points in the path-dependent
loop.  The number of observations that need to be evaluated in the
path-dependent rules loop has another linear effect on the time to
evaluate your backtest.  Have you validated that there is a theoretical
justification for a trailing stop?  Does it increase the positive
expectation of your resulting signal process?

Most of the validation of your indicator and signal processes should be
possible long before you get to parameter optimization.

Regards,

Brian
#
I stated that I was doing parameter optimization, although I should have
been clearer:
So the total number of combinations to test is just 144, not several
hundred. That makes the processing time for each combination approx 4.2
minutes. Is that within normal bounds?

I will try running the same code without trailing stops and see what effect
it has on the processing time. I will report back as soon as it is
finished.



2016-04-06 23:48 GMT+03:00 Brian G. Peterson <brian at braverock.com>:

  
  
#
On Wed, 2016-04-06 at 23:58 +0300, Jersey Fanatic wrote:
Running the macd demo code over 10 years of daily data on my machine (no
trailing stops) takes 0.5262365 secs for a single run.
#
10 years of daily data is about 2500 data points. Extrapolating from
that to 58000 data points (assuming the relationship is linear), a
single run with my dataset should take about 12.2 secs. For 144 runs
(the total number of parameter combinations), that works out to about 30
mins. However, when I ran apply.paramset() this morning (without
trailing stops), it took 4.5 hours. The code is the one I sent earlier,
with the trailing stop rules enabled=FALSE'd.
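The back-of-envelope extrapolation above, as a quick R sketch (all numbers taken from this thread):

```r
demo_points <- 2500       # ~10 years of daily data in the macd demo
demo_secs   <- 0.5262365  # single-run time reported for the demo
my_points   <- 58000      # observations in my dataset
n_combos    <- 144        # total parameter combinations

per_run <- demo_secs * my_points / demo_points  # assumes linear scaling
per_run                   # ~12.2 seconds per run
per_run * n_combos / 60   # ~29 minutes for all 144 combinations
```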

Did you run the macd demo code on a single core? If you did some parallel
processing, did you use the doSNOW package or something else? Maybe that is
the reason; I am not sure.

Would deleting trailing stop rules speed things up, instead of defining
them but setting enabled=FALSE?

2016-04-07 0:34 GMT+03:00 Brian G. Peterson <brian at braverock.com>:

  
  
#
On Thu, Apr 7, 2016 at 8:10 AM, Jersey Fanatic <jerseyfanatic1 at gmail.com> wrote:
Number of data points is not necessarily a good estimator for run time
even if the strategies are the same.  What matters more is the number
of timestamps/observations that must be evaluated.  That includes
signals, moving orders, processing fills, etc.
Again, the relationship is not linear.  Different parameter
combinations will produce differing amounts of signals, order
movement, fills, etc.

For example, I ran parameter optimization on ~3 years of 5-second
data.  Some parameter combinations took 1-2 minutes, some took >20
minutes.

  
    
#
Hi

Just to add to Brian's comment on "economic justification", see the sequence of papers by David Bailey and Marcos López de Prado (e.g. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2460551 or http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2308659 ; the latter is probably better) on the perils of overfitting parameters.  Summarising: if you run just 100 backtests and optimise the parameters of a strategy with a true Sharpe ratio of zero, the best one will on average have an apparent Sharpe of 2.5.

Regards

David


#
Thanks for the insight. I did not know variations in processing time of 20
minutes or so could occur between different parameter combinations.

I ran the strategy on the same dataset with random parameters, without the
trailing SL, on a single core, and it took 5.15 minutes. The number of
transactions was 7800. The processing time still seems high compared to
yours, though: 5-sec data over 3 years vs M5 data over just 1 year; 20 min
vs 5 min.

2016-04-07 16:32 GMT+03:00 Joshua Ulrich <josh.m.ulrich at gmail.com>:

  
  
#
Thanks Mr. Jessop for the recommendation of the papers. I am familiar with
backtest overfitting and the work of Mr. Prado and Mr. Bailey. I recently
implemented an idea (Probability of Backtest Overfitting) from one of their
papers into my methodology to prevent overfitting. I am just trying to
figure out whether something in my code is increasing the backtesting time
significantly, or whether the time it takes is just normal with the
resources I've got.

2016-04-07 17:50 GMT+03:00 <david.jessop at ubs.com>:

  
  
#
On Thu, Apr 7, 2016 at 11:30 AM, Jersey Fanatic
<jerseyfanatic1 at gmail.com> wrote:
Again, the number of observations is not a good predictor of the
amount of time it will take.  You have 7800 transactions.  My shortest
(longest) run had 25 (1500) transactions.

Seems reasonable to me that a strategy producing nearly 8000
transactions takes about 5 minutes; that's about 25 transactions a
second.
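The arithmetic behind that rate, for anyone checking:

```r
transactions <- 7800
minutes      <- 5.15
transactions / (minutes * 60)  # ~25 transactions processed per second
```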

  
    
#
So I tried to see what effect the trailing stop and stop-loss rules have on
the processing time. For the same dataset, SL alone took 5 mins, trailing
SL alone took 10 mins, and trailing SL together with the normal SL took 13
mins. I guess it is processing as fast as it should be. Thanks for all the
help.

2016-04-07 20:45 GMT+03:00 Joshua Ulrich <josh.m.ulrich at gmail.com>:

  
  
#
Hello,

The only thing I can think of that might make it take longer, which
affected me in the past, is RAM, especially on Windows.  Is it maxing out at all?
On Thursday, April 7, 2016, Jersey Fanatic <jerseyfanatic1 at gmail.com> wrote:

            

  
    
#
RAM is usually at 80% or so, but CPU is maxing out, I guess that means RAM
is sufficient?

2016-04-07 23:23 GMT+03:00 Erol Biceroglu <erol.biceroglu at alumni.utoronto.ca

  
  
#
So here are my latest results of testing using the same dataset and rules
(no trailing stoploss):
8-core using doSNOW -> 1.07 hours
4-core using doSNOW -> 59.7 minutes
single-core -> 2.11 hours
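For anyone wanting to reproduce this, a minimal doSNOW setup looks something like the following sketch (the worker count is whatever your machine supports; the apply.paramset() call is shown as a placeholder, since it simply uses whichever foreach backend is registered):

```r
library(doSNOW)

# Spin up a 4-worker socket cluster and register it with foreach;
# apply.paramset() will then distribute parameter combinations across it.
cl <- makeCluster(4, type = "SOCK")
registerDoSNOW(cl)

# ... results <- apply.paramset(...) here ...

stopCluster(cl)  # shut the workers down when finished
```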

So yeah, I guess it is normal for a strategy with a high number of
transactions to take this long to backtest.


2016-04-07 20:45 GMT+03:00 Joshua Ulrich <josh.m.ulrich at gmail.com>:

  
  
#
Windows can start using the swap file long before 80% memory utilization. If
speed is of the essence, you might want to max out the memory on your
machine. If that doesn't help, return the memory. 

Frank
Chicago, IL

#
All the info I could find was about setting the swap file size, not about
letting Windows max out memory or disabling paging until around 90% memory
usage. But I will research further to see how it's done. Thanks.

2016-04-08 16:22 GMT+03:00 Frank <frankm60606 at gmail.com>:

  
  
#
I mean buy the maximum amount of memory your system allows, install it and
see if that helps. Memory is cheap. On the expensive side, most quad-core i7
CPU sockets can accept a six-core i7 CPU upgrade. These were $1,000 the last
time I checked.

 

Frank

Chicago, IL

 

#
Will go with the cheap route. I will report back the performance increase
if I remember to do so. Thanks for the recommendation.

2016-04-08 17:32 GMT+03:00 Frank <frankm60606 at gmail.com>:

  
  
#
On Fri, 2016-04-08 at 08:50 +0300, Jersey Fanatic wrote:
I tested a version of this strategy on a 6-core (12 thread) i7-4930K
CPU @ 3.40GHz with 64GB of RAM. [Ref. 1 processingtime_q.ps.R]

4.64 hrs one core (registerDoSEQ)
2.12 hrs six cores using doMC (defaults)
1.77 hrs twelve threads using doMC (defaults)
1.32 hrs six cores using doMC (mc.preschedule=FALSE)
   44 min twelve threads using doMC (mc.preschedule=FALSE)
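For reference, mc.preschedule is the multicore option controlling whether foreach pre-assigns tasks to workers. A minimal sketch of how it is passed (the loop body here is only a stand-in for evaluating one parameter combination):

```r
library(doMC)            # multicore backend; Unix-alikes only
registerDoMC(cores = 6)

# preschedule = FALSE hands tasks out one at a time as workers free up,
# which load-balances better when combinations differ widely in run time.
results <- foreach(i = 1:12,
                   .options.multicore = list(preschedule = FALSE)) %dopar% {
  sqrt(i)  # stand-in for one parameter-combination backtest
}
```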

My script was based on the script originally posted to this thread, and
likely had more rules and tighter parameters than the test reported by
the OP above.

Given a single script and parameter combination verified with the OP,
this machine was about twice as fast: 5.15 min for the OP vs 1.99 min on
this test machine. [Ref. 1 processingtime_q_rsigfinance.R]

Some observations:

RAM:
- all tests consumed more than 8GB of RAM at some point (8.1GB for the
single thread version, 11.8GB for the six thread version, and 18GB for
the 12-thread version)

CPU load:
- the 6-thread test had a load average of about 8, and the 12-thread
  test had a load average below 7, suggesting that the 12-thread test
used resources less efficiently. With prescheduling disabled, load
averages were higher for both tests: about 10 for the 6-thread test
and about 16 for the 12-thread test.

Load Balancing:
- after about an hour, a load-balancing problem was observed: fewer than
half the cores/threads were still executing at 100%.  In the case of the
twelve-thread run, after an hour and a half only one CPU was still
spinning at 100%. A backend designed for load balancing, such as doRedis
or a zmq-based one, or a different prescheduling method in the multicore
or SNOW backends, would probably shorten the execution time, potentially
by a lot.

Economic Justification:
- I observed that the shortest-timeframe indicator and signal processes
are the most aggressive, often scratching trades.  If trade costs were
taken into account when analyzing the signal process (before even
contemplating rules and a backtest), these short-timeframe signals would
have been ruled out before a brute-force parameter search.  These
parameter combinations also take the longest to run, in addition to
likely being economically unfeasible.


References:

[1] The data file and script have been added to quantstrat's sandbox
directory in SVN, for those who are interested.

    /pkg/quantstrat/sandbox/paramtest201604/
1 day later
#
Hi all,

my simple strategy sets long/short ENTER orders and corresponding
long/short EXIT orders.

I don't understand why, if I have an open LONG position, a SHORT enter
order triggers (wrongly closing the open position...).

What I want is that if I already have an OPEN position (long), "opposite"
orders (short enter) do not trigger.

Thanks in advance for your help

Diego


#################################
# MAX POSITION (1 future)
#################################
addPosLimit(portfolio = qs.strategy, symbol = symbol, timestamp = 
initDate, maxpos = 1)

#################################
# LONG STRATEGY
#################################
add.rule(qs.strategy, name = 'ruleSignal',
         arguments = list(sigcol = "upTrend", sigval = TRUE,
                          replace = TRUE,
                          prefer = 'Open',
                          orderside = 'long',
                          ordertype = 'stoplimit',
                          order.price = quote(mktdata$price.enter.long[timestamp]),
                          orderqty = 1,
                          osFUN = 'osMaxPos',
                          time.in.force = quote(timeInForceLong)),
         type = 'enter',
         label = 'LE')
add.rule(qs.strategy, name = 'ruleSignal',
         arguments = list(sigcol = 'exitLong', sigval = TRUE,
                          replace = FALSE,
                          orderside = 'long',
                          ordertype = 'market',
                          orderqty = 'all',
                          prefer = 'Open'),
         type = 'exit',
         label = 'ExitLong',
         enabled = TRUE)

#################################
# SHORT STRATEGY
#################################
add.rule(qs.strategy, name = 'ruleSignal',
         arguments = list(sigcol = "downTrend", sigval = TRUE,
                          replace = TRUE,
                          prefer = 'Open',
                          orderside = 'short',
                          ordertype = 'stoplimit',
                          order.price = quote(mktdata$price.enter.short[timestamp]),
                          orderqty = -1,
                          osFUN = 'osMaxPos',
                          time.in.force = quote(timeInForceShort)),
         type = 'enter',
         label = 'SE')
add.rule(qs.strategy, name = 'ruleSignal',
         arguments = list(sigcol = 'exitShort', sigval = TRUE,
                          replace = FALSE,
                          orderside = 'short',
                          ordertype = 'market',
                          orderqty = 'all',
                          prefer = 'Open'),
         type = 'exit',
         label = 'ExitShort',
         enabled = TRUE)
#
Short orders do not by default cancel long orders. You need to have them in
the same order set.

On Mon, Apr 11, 2016 at 6:39 AM, Diego Peroni <diegoperoni at vodafone.it>
wrote:

  
  
#
Thank you Ilya,

I've placed orderset='ocoall' in both ENTER orders (labelled "LE" and "SE")
as in the following code, but in testing I have the same problem: long
positions are closed by short enter orders...
Can you please tell me what I'm doing wrong?

Diego


#################################
# MAX POSITION (1 future)
#################################
addPosLimit(portfolio = qs.strategy, symbol = symbol, timestamp = 
initDate, maxpos = 1)

#################################
# LONG STRATEGY
#################################
add.rule(qs.strategy, name = 'ruleSignal',
         arguments = list(sigcol = "upTrend", sigval = TRUE,
                          replace = TRUE,
                          prefer = 'Open',
                          orderside = 'long',
                          ordertype = 'stoplimit',
                          order.price = quote(mktdata$price.enter.long[timestamp]),
                          orderqty = 1,
                          osFUN = 'osMaxPos',
                          orderset = 'ocoall',
                          time.in.force = quote(timeInForceLong)),
         type = 'enter',
         label = 'LE')

add.rule(qs.strategy, name = 'ruleSignal',
         arguments = list(sigcol = 'exitLong', sigval = TRUE,
                          replace = FALSE,
                          orderside = 'long',
                          ordertype = 'market',
                          orderqty = 'all',
                          prefer = 'Open'),
         type = 'exit',
         label = 'ExitLong',
         enabled = TRUE)

#################################
# SHORT STRATEGY
#################################
add.rule(qs.strategy, name = 'ruleSignal',
         arguments = list(sigcol = "downTrend", sigval = TRUE,
                          replace = TRUE,
                          prefer = 'Open',
                          orderside = 'short',
                          ordertype = 'stoplimit',
                          order.price = quote(mktdata$price.enter.short[timestamp]),
                          orderqty = -1,
                          osFUN = 'osMaxPos',
                          orderset = 'ocoall',
                          time.in.force = quote(timeInForceShort)),
         type = 'enter',
         label = 'SE')

add.rule(qs.strategy, name = 'ruleSignal',
         arguments = list(sigcol = 'exitShort', sigval = TRUE,
                          replace = FALSE,
                          orderside = 'short',
                          ordertype = 'market',
                          orderqty = 'all',
                          prefer = 'Open'),
         type = 'exit',
         label = 'ExitShort',
         enabled = TRUE)
On 11/04/2016 15:05, Ilya Kipnis wrote:

  
  
#
I see your question. If you want short orders not to fire while you have a
long position, you should create a custom order-sizing function that
returns 0 whenever the current position is nonzero, for both the long and
short sides.

On Mon, Apr 11, 2016 at 11:07 AM, Diego Peroni <diegoperoni at vodafone.it>
wrote:

  
  
#
Now it works,
this is my custom function:

osMaxPosLS <- function(data, timestamp, orderqty, ordertype, orderside,
                       portfolio, symbol, ruletype, ...) {
  # Start from the standard position-limit sizing
  maxpos <- osMaxPos(data, timestamp, orderqty, ordertype, orderside,
                     portfolio, symbol, ruletype, ...)
  curpos <- getPosQty(portfolio, symbol, timestamp)
  # Refuse to enter against an open opposite position
  if (orderside == 'long'  && curpos < 0) maxpos <- 0
  if (orderside == 'short' && curpos > 0) maxpos <- 0
  return(maxpos)
}
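For completeness, wiring the custom function in just means swapping the osFUN argument in each entry rule; a sketch based on the LE rule posted earlier in this thread:

```r
add.rule(qs.strategy, name = 'ruleSignal',
         arguments = list(sigcol = "upTrend", sigval = TRUE,
                          replace = TRUE,
                          prefer = 'Open',
                          orderside = 'long',
                          ordertype = 'stoplimit',
                          order.price = quote(mktdata$price.enter.long[timestamp]),
                          orderqty = 1,
                          osFUN = 'osMaxPosLS',  # custom sizing: 0 qty against an open opposite position
                          time.in.force = quote(timeInForceLong)),
         type = 'enter',
         label = 'LE')
```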

Thank you for your help!

Diego
On 11/04/2016 17:13, Ilya Kipnis wrote:

  
  
#
Glad to be of help!

On Mon, Apr 11, 2016 at 11:42 AM, Diego Peroni <diegoperoni at vodafone.it>
wrote: