Number of data points required for Cointegration - R-SIG-Finance

Fri, Jan 23, 2015 1:24 AM #

Hi

I need help in figuring out the length of historical data that I should
use. I took stock prices (daily close) for two tickers from Yahoo (200
days). I tried finding the regression coefficient using PCA, with 150
points for the PCA. This gives a coefficient Beta.

Now, to see whether the spread is mean reverting or not, I use the ADF
test. If I use a 150-point spread, it comes out as nonstationary. If I use
200 points, the outcome is stationary.

I again used 200 points to do the PCA and find the regression. The spread
comes out as nonstationary. From all these observations *I think* that
this is not a stable relationship.

So my questions are as follows:

   - Is there a way to decide the length of historical data to use?
   - Some relationships may be more stable than others. Is there a way to
   quantify it?

Any other insight in this regard will be appreciated (time frame, pairs vs
basket). I have attached the plot and the script that was used to generate
the plot.

Regards
Amol

If all the seas were ink,
And all the reeds were pens,
And all the skies were parchment,
And all the men could write,
These would not suffice
To write down all the red tape
Of this Government.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Rplot.png
Type: image/png
Size: 4475 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-finance/attachments/20150123/a67a0136/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test3.R
Type: application/octet-stream
Size: 2242 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-finance/attachments/20150123/a67a0136/attachment.obj>

Paul Teetor

Mon, Jan 26, 2015 3:59 PM #

Amol,
I don't have a formula or a guideline for determining the number of data points. But I can share two experiences.
First, when I traded mean-reverting spreads, I used 3 to 5 years of daily data. That's 750 to 1,250 data points. Less data did not work well for my spreads.
Second, in my experience, the ADF test was quite unstable. That is, it might fail to reject for a while, then start rejecting the null hypothesis when the market showed some trending back to the mean. Then it would fail to reject again as the market wandered away.
Perhaps smarter people than I have had better luck trading with the ADF, but for me, it did not provide a complete answer to the question of mean-reversion.
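
To illustrate the instability described above, here is a minimal Python sketch (not code from this thread) that computes a simple Dickey-Fuller t-statistic over rolling windows of a simulated spread. The AR(1) coefficient 0.97, the window length, and the step are arbitrary assumptions. The cutoff -2.86 is the approximate asymptotic 5% critical value for the constant-only Dickey-Fuller test; with an estimated hedge ratio, the more negative Engle-Granger critical values would apply instead.

```python
import math
import random


def df_tstat(x):
    """Dickey-Fuller t-statistic (constant, no lagged differences).

    Regresses x[t] - x[t-1] on a constant and x[t-1] by OLS and returns
    the t-ratio of the slope. Large negative values are evidence against
    a unit root.
    """
    lag = x[:-1]
    dx = [b - a for a, b in zip(x[:-1], x[1:])]
    n = len(dx)
    mean_lag = sum(lag) / n
    mean_dx = sum(dx) / n
    sxx = sum((v - mean_lag) ** 2 for v in lag)
    sxy = sum((v - mean_lag) * (d - mean_dx) for v, d in zip(lag, dx))
    slope = sxy / sxx
    intercept = mean_dx - slope * mean_lag
    rss = sum((d - intercept - slope * v) ** 2 for v, d in zip(lag, dx))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope / se


def rolling_df(x, window, step):
    """Dickey-Fuller statistic on each window of length `window`."""
    return [df_tstat(x[i:i + window])
            for i in range(0, len(x) - window + 1, step)]


# Simulated mean-reverting spread (assumed AR(1) with coefficient 0.97).
rng = random.Random(1)
spread = [0.0]
for _ in range(999):
    spread.append(0.97 * spread[-1] + rng.gauss(0.0, 1.0))

stats = rolling_df(spread, window=200, step=50)
rejections = [s < -2.86 for s in stats]  # True where the unit root is rejected
print(sum(rejections), "of", len(stats), "windows reject the unit root")
```

Even though the simulated spread is stationary by construction, the verdict can differ from window to window, which is one way to see why a single test result on a short sample is fragile.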
Paul

Paul Teetor, Elgin, IL USA
http://quantdevel.com/public

_______________________________________________
R-SIG-Finance at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.

  

John C Frain

Tue, Jan 27, 2015 5:50 AM #

On 23 January 2015 at 09:24, amol gupta <amolgupta87 at gmail.com> wrote:

> Is there a way to decide length of historical data to use?

I don't think so. My involvement in cointegration was in the
macroeconometrics area. There we liked to use at least 30 years of
data. For annual or quarterly data, series such as spreads were
generally stationary about a constant level. The idea of an equilibrium
trend in a spread does not make sense to me. In such cases one would
draw a graph and look at the number of times the series crossed a
measure of the equilibrium spread. The more times the series crosses
the equilibrium, the easier it is to assess stationarity. If there are
few crossings, then either the series is non-stationary or you have not
got enough data. You can also think about how long shocks will take
to work through the system. The length of your series should be a
multiple of that. Increasing the frequency of the data may lengthen
the series, but it does not increase the time span covered by the
series.
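
The crossing count described above can be sketched in a few lines of Python. This is only an illustration: the function name `count_crossings` is hypothetical, and using the sample mean as the measure of the equilibrium spread is an assumption (any other equilibrium level can be passed in).

```python
def count_crossings(spread, level=None):
    """Count how many times `spread` crosses `level`.

    If `level` is None, the sample mean is used as the equilibrium
    (an assumption). Observations exactly equal to the level are ignored.
    """
    if level is None:
        level = sum(spread) / len(spread)
    signs = [1 if v > level else -1 for v in spread if v != level]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


# A series that oscillates around zero crosses it often ...
print(count_crossings([1.0, -1.0, 1.0, -1.0], level=0.0))
# ... while a steadily drifting series crosses its own mean only once.
print(count_crossings([1.0, 2.0, 3.0, 4.0]))
```

A small count relative to the sample length suggests either a non-stationary spread or too short a history to tell.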

> Some relationship may be more stable than others. Is there a way to
> quantify it?

Yes. If you fit an error correction mechanism by Engle-Granger or
Johansen, the coefficient on the error correction term can be
used to get a half-life for deviations from equilibrium. A small
coefficient (in absolute value) means that the equilibrium is restored
slowly.
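
As a rough illustration of the half-life idea, the Python sketch below treats the spread itself as an AR(1) process, ds[t] = c + alpha * s[t-1] + e[t], which is a simplifying assumption rather than a full Engle-Granger or Johansen error correction model. Under that assumption a deviation shrinks by the factor (1 + alpha) each period, so the half-life h solves (1 + alpha)^h = 1/2. The function names are hypothetical.

```python
import math


def ecm_coefficient(spread):
    """OLS slope alpha in ds[t] = c + alpha * s[t-1] + e[t]."""
    lag = spread[:-1]
    ds = [b - a for a, b in zip(spread[:-1], spread[1:])]
    n = len(ds)
    mean_lag = sum(lag) / n
    mean_ds = sum(ds) / n
    sxx = sum((v - mean_lag) ** 2 for v in lag)
    sxy = sum((v - mean_lag) * (d - mean_ds) for v, d in zip(lag, ds))
    return sxy / sxx


def half_life(alpha):
    """Periods needed for a deviation to halve, given -1 < alpha < 0."""
    if not -1.0 < alpha < 0.0:
        raise ValueError("half-life is only defined for -1 < alpha < 0")
    return -math.log(2.0) / math.log(1.0 + alpha)


# A deviation that halves every period has alpha = -0.5 and half-life 1.
toy_spread = [8.0, 4.0, 2.0, 1.0, 0.5]
print(half_life(ecm_coefficient(toy_spread)))
# A small coefficient means slow adjustment.
print(half_life(-0.05))
```

With daily data the half-life is in trading days, which also gives a feel for how long a history is needed to observe several full reversions.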

John C Frain, Ph.D.

Economics Department             3 Aranleigh Park
Trinity College Dublin                 Rathfarnham
College Green                           Dublin 14
Dublin 2                                    Ireland
Ireland
www.tcd.ie/Economics/staff/frainj/home.htm
mailto:frainj at tcd.ie
mailto:frainj at gmail.com

amol gupta

Tue, Jan 27, 2015 10:09 AM #

Paul

You say that the ADF test is not really stable. I agree. Other options to
explore are:

   - Use other unit root and stationarity tests.
   - Use other cointegration tests, such as the Johansen test.
   - Run a PCA, choose one of the lower-variance portfolios, and test it
   for stationarity. (I need to understand PCA more.)

I will take some time and test these ideas. Have you tried any of these?
If yes, please share your experiences.

Thank you for your insights.

Regards
Amol


Eric Zivot

Tue, Jan 27, 2015 10:41 AM #

Some quick comments on this issue. From a statistical point of view, the
phrase "how many data points are required for cointegration" is not well
defined. Technically speaking, if two series are cointegrated then they are
cointegrated for any number of observations. The issue is really about the
size (probability of rejecting the null hypothesis when the null is true)
and power (probability of rejecting the null when the alternative is true)
of tests for cointegration for a given sample size.

In intermediate statistics textbooks, the chapters on hypothesis testing
usually have some discussion of the relationship between sample size and
power. In simple toy examples you can work out the number of observations
required to have power equal to some specified value (e.g. 0.90). In that
case, if the alternative is true, you can say that you will reject the null
at the 5% level with probability 0.90 if the sample size is n=75 (say).

Unfortunately, this exercise is extremely difficult to do with tests for
cointegration (usually the null is no cointegration, so rejecting the null
is evidence of cointegration). Why? There are only general asymptotic
results (as the sample size goes to infinity) for tests for no
cointegration (e.g. Engle-Granger two-step, Johansen rank tests). There are
no general finite-sample results (for fixed sample sizes) for power
functions. Hence, you cannot analytically compute a sample size that will
give you a certain power.

What to do? You can try to set up some Monte Carlo experiments with fixed
sample sizes to approximate power functions. The problem with this is that
the results are not general. They will depend on the parameters used for
the Monte Carlo setup (e.g. parameters for serial correlation, volatility,
etc.). The best you can do is to carefully characterize the distributions
of the series in question and try out some Monte Carlo experiments for
these data. My guess is that you will have the best results when you use
the class of tests that have been found to be asymptotically optimal (where
the asymptotic power curve is tangent to the infeasible power curve of the
optimal test at a set power). These tests have been developed by Graham
Elliott at UCSD and Michael Jansson at UCB.
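
A minimal sketch of such a Monte Carlo experiment in Python follows. It makes strong simplifying assumptions that are not from this thread: the hedge ratio is treated as known, the spread is a Gaussian AR(1) with coefficient `phi`, and the test is a plain Dickey-Fuller regression with a constant and no lagged differences, compared against the approximate asymptotic 5% critical value of -2.86. With an estimated hedge ratio, Engle-Granger critical values would be needed instead, and the numbers would change.

```python
import math
import random


def df_tstat(x):
    """Dickey-Fuller t-statistic (constant, no lagged differences)."""
    lag = x[:-1]
    dx = [b - a for a, b in zip(x[:-1], x[1:])]
    n = len(dx)
    mean_lag = sum(lag) / n
    mean_dx = sum(dx) / n
    sxx = sum((v - mean_lag) ** 2 for v in lag)
    sxy = sum((v - mean_lag) * (d - mean_dx) for v, d in zip(lag, dx))
    slope = sxy / sxx
    intercept = mean_dx - slope * mean_lag
    rss = sum((d - intercept - slope * v) ** 2 for v, d in zip(lag, dx))
    return slope / math.sqrt(rss / (n - 2) / sxx)


def simulate_spread(n, phi, rng):
    """AR(1) spread with unit-variance Gaussian shocks, started at zero."""
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def rejection_rate(n, phi, reps, crit=-2.86, seed=0):
    """Share of simulated samples in which the unit root is rejected.

    With phi < 1 this approximates power; with phi = 1 it approximates size.
    """
    rng = random.Random(seed)
    hits = sum(df_tstat(simulate_spread(n, phi, rng)) < crit
               for _ in range(reps))
    return hits / reps


for n in (50, 200, 1000):
    print(n, rejection_rate(n, phi=0.95, reps=200))
```

Changing `phi`, the shock distribution, or the lag structure changes the estimated power, which is exactly the lack of generality noted above; the sketch is only useful once those settings are matched to the data in question.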



Mark Leeds

Tue, Jan 27, 2015 10:53 AM #

Hi Eric: Thanks for the educational and thorough explanation. But I think
it's worse than that. Any econometric test, whether asymptotic or
finite-sample, depends on a certain underlying DGP that often just doesn't
hold. So, even in the asymptotic case, cointegration tests will break down
due to mergers, buybacks, bankruptcies, etc. There is no concept of
cointegration or any DGP that can take these things into account.

I'm not trying to start a flame war, and I use econometrics in finance, so
I don't think it's bogus. Rather, I'm just pointing out that, particularly
with respect to cointegration testing, things can go haywire in a hurry
because the underlying DGP assumption is just not true. So, any type of
test is, to some extent, useless.



Mark


Eric Zivot

Tue, Jan 27, 2015 10:58 AM #

Mark

I completely agree with you. My comments were oriented to the "best case
scenario". There are obviously many real-world considerations that make the
issue very difficult, as you point out. And, of course, one has to consider
the dreaded "multiple testing issue" if you are searching for the "best"
cointegrated pair of assets.
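
The scale of the multiple testing issue can be illustrated with simple arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, the universe of 100 assets is an arbitrary assumption, and the Bonferroni adjustment is just one conservative correction (tests across overlapping pairs are not independent).

```python
def n_pairs(n_assets):
    """Number of distinct unordered pairs among n_assets."""
    return n_assets * (n_assets - 1) // 2


def expected_false_positives(n_assets, alpha=0.05):
    """Expected rejections at level alpha if no pair is truly cointegrated."""
    return n_pairs(n_assets) * alpha


def bonferroni_level(n_assets, alpha=0.05):
    """Per-test level that keeps the family-wise error rate at most alpha."""
    return alpha / n_pairs(n_assets)


print(n_pairs(100))                   # 4950 candidate pairs
print(expected_false_positives(100))  # about 247.5 spurious "cointegrated" pairs
print(bonferroni_level(100))          # per-test level of roughly 1e-05
```

Screening every pair at the 5% level and keeping the "best" ones would therefore be expected to produce many spurious pairs even if no pair were truly cointegrated.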

 


Mark Leeds

Tue, Jan 27, 2015 11:05 AM #

Hi Eric: yes, multiple testing and data mining are another problem. The
whole pairs thing is a messy undertaking that I never cracked. I want to go
back to it someday. Paul has a paper that discusses a different statistical
methodology that looks interesting, but I think that, however one
approaches it statistically, it also needs "heuristic-y" techniques to add
robustness. All the best, and thanks again for your comments.


Mark

On Tue, Jan 27, 2015 at 1:58 PM, Eric Zivot <ezivot at u.washington.edu> wrote:

Mark

I completely agree with you. My comments were oriented to the "best case
scenario" . There are obviously many real world considerations that make
the issue very difficult as you point out. And, of course, you has to
consider the dreaded "multiple testing issue" if you are searching for the
"best" cointegrated pair of assets.



*From:* Mark Leeds [mailto:markleeds2 at gmail.com]
*Sent:* Tuesday, January 27, 2015 10:54 AM
*To:* Eric Zivot
*Cc:* amol gupta; Paul Teetor; r-sig-finance at r-project.org

*Subject:* Re: [R-SIG-Finance] Number of data points required for
Cointigration



Hi Eric: Thanks for the educational and thorough explanation. But I think
it's worse than that. Any econometric test, whether asymptotic or finite,
depends on a certain underlying data-generating process (DGP) that often
just doesn't hold. So, even in the asymptotic case, cointegration tests
will break down due to mergers, buybacks, bankruptcies, etc. There is no
concept of cointegration or any DGP that can take these things into account.

I'm not trying to start a flame war, and I use econometrics in finance, so
I don't think it's bogus. I'm rather just pointing out that, particularly
with respect to cointegration testing, things can go haywire in a hurry
because the underlying DGP assumption is just not true. So, any type of
test is, to some extent, useless.


Mark
On Tue, Jan 27, 2015 at 1:41 PM, Eric Zivot <ezivot at u.washington.edu>
wrote:

Some quick comments on this issue. From a statistical point of view, the
phrase "how many data points are required for cointegration" is not well
defined. Technically speaking, if two series are cointegrated then they are
cointegrated for any number of observations. This issue is really about the
size (probability of rejecting the null hypothesis when the null is true)
and power (probability of rejecting the null when the alternative is true)
of tests for cointegration for a given sample size.

In intermediate statistics textbooks, the chapters on hypothesis testing
usually have some discussion of the relationship between sample size and
power. In simple toy examples you can work out the number of observations
required to have power equal to some specified value (e.g. 0.90). In this
case, if the alternative is true, then you can say you can reject the null
at the 5% level with probability 0.90 if the sample size is n=75 (say).

Unfortunately, this exercise is extremely difficult to do with tests for
cointegration (usually, the null is no cointegration, so rejecting the null
is evidence of cointegration). Why? Well, there are only general asymptotic
results (as sample size goes to infinity) for tests for no cointegration
(e.g. Engle-Granger two-step, Johansen rank tests). There are no general
finite-sample results (for fixed sample sizes) for power functions. Hence,
you cannot analytically compute a sample size that will give you a certain
power.

What to do? Well, you can try to set up some Monte Carlo experiments with
fixed sample sizes to approximate power functions. The problem with this is
that the results are not general. They will depend on the parameters used
for the Monte Carlo setup (e.g. parameters for serial correlation,
volatility, etc.). The best you can do is to try to carefully characterize
the distributions of the series in question and try out some Monte Carlo
experiments for these data. My guess is that you will have the best results
when you use the class of tests that have been found to be optimal
asymptotic tests (where the asymptotic power curve is tangent to the
infeasible power curve of the optimal test at a set power). These tests
have been developed by Graham Elliott at UCSD and Michael Jansson at UCB.
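
A Monte Carlo experiment of the kind described above might be sketched as
follows. This is only an illustration in plain Python, not anything posted
in the thread: the data-generating process (a random walk plus an AR(1)
spread with coefficient phi), the function names, and the critical value of
about -3.34 (an approximate asymptotic 5% value for the Engle-Granger
residual test with two series and a constant) are all assumptions. As noted
above, the rejection rates it produces hold only for this particular setup.

```python
import math
import random

# Assumed approximate asymptotic 5% critical value for the Engle-Granger
# residual-based test (two series, constant in the first-step regression).
EG_CRIT_5PCT = -3.34


def ols_residuals(y, x):
    """Step 1: residuals from regressing y on a constant and x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    return [b - alpha - beta * a for a, b in zip(x, y)]


def df_tstat(e):
    """Step 2: t-statistic on rho in diff(e)_t = rho * e_{t-1} + u_t (no lags)."""
    lag = e[:-1]
    diff = [b - a for a, b in zip(e[:-1], e[1:])]
    sll = sum(v * v for v in lag)
    rho = sum(l * d for l, d in zip(lag, diff)) / sll
    ssr = sum((d - rho * l) ** 2 for l, d in zip(lag, diff))
    s2 = ssr / (len(diff) - 1)
    return rho / math.sqrt(s2 / sll)


def simulate_pair(n, phi, rng):
    """x is a random walk; y = x + spread, with an AR(1) spread.

    phi < 1 gives a cointegrated pair; phi = 1 gives no cointegration.
    """
    x, y = [], []
    level, spread = 0.0, 0.0
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        spread = phi * spread + rng.gauss(0.0, 1.0)
        x.append(level)
        y.append(level + spread)
    return x, y


def rejection_rate(n, phi, n_sims=200, seed=1):
    """Share of simulated samples in which 'no cointegration' is rejected."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        x, y = simulate_pair(n, phi, rng)
        if df_tstat(ols_residuals(y, x)) < EG_CRIT_5PCT:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    # phi = 0.9 approximates power; phi = 1.0 approximates size.
    for n in (50, 100, 200, 400):
        print(n, rejection_rate(n, phi=0.9), rejection_rate(n, phi=1.0))
```

Changing phi, the innovation variances, or the lag structure changes the
estimated power curve, which is why the parameters would need to be matched
to the series actually being traded.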


-----Original Message-----
From: R-SIG-Finance [mailto:r-sig-finance-bounces at r-project.org] On Behalf
Of amol gupta
Sent: Tuesday, January 27, 2015 10:09 AM
To: Paul Teetor
Cc: r-sig-finance at r-project.org
Subject: Re: [R-SIG-Finance] Number of data points required for
Cointigration

Paul

You say that ADF is not really stable. I agree. Other options to explore
are

   - Use other unit root and stationarity tests.
   - Use other cointegration tests, such as the Johansen test.
   - Run PCA, choose one of the lower-variance portfolios, and test it
   for stationarity. (I need to understand PCA better.)
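
The PCA option in the list above could look roughly like this for two
series. It is a minimal sketch in plain Python under stated assumptions:
the function names are hypothetical, the inputs are two equal-length price
(or log-price) lists, and the "lower variance portfolio" is taken to be the
eigenvector of the 2x2 sample covariance matrix with the smaller
eigenvalue. The resulting spread would still have to be passed to a
stationarity test.

```python
import math


def covariance_2x2(x, y):
    """Sample variances and covariance of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return sxx, sxy, syy


def smallest_component(sxx, sxy, syy):
    """Smallest eigenvalue and a unit eigenvector of [[sxx, sxy], [sxy, syy]]."""
    mean = 0.5 * (sxx + syy)
    radius = math.hypot(0.5 * (sxx - syy), sxy)
    lam = mean - radius
    # (A - lam*I) v = 0 is solved by v = (sxy, lam - sxx) unless that is zero.
    vx, vy = sxy, lam - sxx
    norm = math.hypot(vx, vy)
    if norm == 0.0:
        return lam, (1.0, 0.0)
    return lam, (vx / norm, vy / norm)


def pca_hedge_ratio(x, y):
    """Hedge ratio beta and spread y - beta * x from the low-variance component."""
    sxx, sxy, syy = covariance_2x2(x, y)
    _, (vx, vy) = smallest_component(sxx, sxy, syy)
    if vy == 0.0:
        raise ValueError("low-variance component has no weight on y")
    beta = -vx / vy
    spread = [b - beta * a for a, b in zip(x, y)]
    return beta, spread
```

Unlike an ordinary regression of y on x, this treats both series
symmetrically, so swapping x and y gives the reciprocal hedge ratio.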

I will take some time and test these ideas. Have you tried any of these?
If yes, please share your experiences.

Thank you for your insights.


Paul Teetor

Wed, Jan 28, 2015 4:41 AM #

Many thanks to John, Eric, and Mark for their comments. This is a very useful discussion.

Amol, you asked about other methods for testing for mean reversion. At the risk of starting a flame war, I no longer believe in trading "mean reverting" spreads. My thoughts are based on John Bollinger's comments at the R/Finance conference.

In order to be profitably traded, the spread must alternate between moving towards the mean and moving away from the mean. Think about it. If it were always reverting to the mean, you'd have no trading opportunities. The opportunity arises when the market has wandered away from its usual, expected value. We profit by riding it back.

We want oscillating markets; that is, markets that alternate between "mean aversion" and mean reversion. Unfortunately, the ADF test is only for detecting mean reversion. It gets confused by markets that periodically "mean avert". Therefore, the test is not useful for trading purposes.
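
One way to see how the verdict can flip from window to window is to
recompute a Dickey-Fuller statistic on rolling windows of a spread. The
sketch below is an illustration in plain Python, not the procedure used by
anyone in this thread. It assumes the simplest Dickey-Fuller regression
(constant, no lagged differences), an asymptotic 5% critical value of about
-2.86, and a simulated AR(1) spread; all function names are hypothetical.

```python
import math
import random

# Assumed asymptotic 5% critical value for the Dickey-Fuller test with a
# constant and no trend.
DF_CRIT_5PCT = -2.86


def df_tstat_const(s):
    """t-statistic on rho in diff(s)_t = a + rho * s_{t-1} + u_t."""
    lag = s[:-1]
    diff = [b - a for a, b in zip(s[:-1], s[1:])]
    n = len(diff)
    ml, md = sum(lag) / n, sum(diff) / n
    sll = sum((l - ml) ** 2 for l in lag)
    rho = sum((l - ml) * (d - md) for l, d in zip(lag, diff)) / sll
    a = md - rho * ml
    ssr = sum((d - a - rho * l) ** 2 for l, d in zip(lag, diff))
    s2 = ssr / (n - 2)
    return rho / math.sqrt(s2 / sll)


def rolling_df(spread, window, step=10):
    """(window end index, DF t-statistic) for each rolling window."""
    out = []
    for end in range(window, len(spread) + 1, step):
        out.append((end, df_tstat_const(spread[end - window:end])))
    return out


def simulate_ar1(n, phi, seed=0):
    """AR(1) series s_t = phi * s_{t-1} + e_t with standard normal shocks."""
    rng = random.Random(seed)
    s, out = 0.0, []
    for _ in range(n):
        s = phi * s + rng.gauss(0.0, 1.0)
        out.append(s)
    return out


if __name__ == "__main__":
    # A slowly mean-reverting spread examined through 150-point windows.
    spread = simulate_ar1(1000, phi=0.97, seed=42)
    stats = rolling_df(spread, window=150)
    rejected = sum(1 for _, t in stats if t < DF_CRIT_5PCT)
    print(f"{rejected} of {len(stats)} windows reject a unit root")
```

With a persistent but stationary spread and short windows, the share of
windows that reject is often well below one, which is one possible reading
of the instability described above.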
If someone knows a good statistical test to identify oscillating markets, I'd love to hear about it.

Paul

Paul Teetor, Elgin, IL USA
http://quantdevel.com/public

Anil Bishnoie

Wed, Jan 28, 2015 5:20 AM #

Hi,

If profitable trading is the goal, I guess a hidden Markov model (HMM) fitted with the Baum-Welch algorithm is a more accurate and profitable alternative for trading oscillating markets, and I am sure somebody must have tried it on one of the major market indexes.

Thanks to all for the excellent inputs,
Anil Bishnoie


amol gupta

Wed, Jan 28, 2015 8:21 AM #

Paul, Mark, Eric, Anil, John

Thank you all for your valuable inputs. This has been an enriching
discussion.

On Tue, Jan 27, 2015 at 1:41 PM, Eric Zivot <ezivot at u.washington.edu>
wrote:

Some quick comments on this issue. From a statistical point of view, the
phrase "how many data points are required for cointegration" is not well
defined. Technically speaking, if two series are cointegrated then they are
cointegrated for any number of observations. This issue is really about the
size (probability of rejecting the null hypothesis when the null is true)
and power (probability of rejecting the null when the alternative is true)
of tests for cointegration for a given sample size.

In intermediate statistics textbooks, the chapters on hypothesis testing
usually have some discussion of the relationship between sample size and
power. In simple toy examples you can work out the number of observations
required to have power equal to some specified value (e.g. 0.90). In this
case, if the alternative is true then you can say you can reject the null
at the 5% level with probability 0.90 if the sample size is n=75 (say).

Unfortunately, this exercise is extremely difficult to do with tests for
cointegration (usually, the null is no cointegration, so rejecting the null
is evidence of cointegration). Why? Well, there are only general asymptotic
results (as sample size goes to infinity) for tests for no cointegration
(e.g. Engle-Granger two-step, Johansen rank tests). There are no general
finite-sample results (for fixed sample sizes) for power functions. Hence,
you cannot analytically compute a sample size that will give you a certain
power.

What to do? Well, you can try to set up some Monte Carlo experiments with
fixed sample sizes to approximate power functions. The problem with this is
that the results are not general. They will depend on the parameters used
for the Monte Carlo setup (e.g. parameters for serial correlation,
volatility, etc.). The best you can do is to try to carefully characterize
the distributions of the series in question and try out some Monte Carlo
experiments for these data. My guess is that you will have the best results
when you use the class of tests that have been found to be optimal
asymptotic tests (where the asymptotic power curve is tangent to the
infeasible power curve of the optimal test at a set power). These tests
have been developed by Graham Elliott at UCSD and Michael Jansson at UC
Berkeley.
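
A rough sketch of the kind of Monte Carlo experiment described above, written
in plain Python with no external packages. Everything in it is an assumption
made for illustration: the data-generating process (a Gaussian random walk x,
and y = beta*x + u with an AR(1) error u), the function names, and the use of
a no-lag Dickey-Fuller regression on the Engle-Granger residuals. The 5%
critical value is itself simulated under the null (rho = 1) for the given
sample size, so the reported power is size-adjusted for this toy setup only.

```python
import random


def simulate_pair(n, rho, beta=1.0, rng=random):
    """x is a random walk; y = beta*x + u, where u is AR(1) with coefficient rho.
    rho = 1 gives no cointegration; |rho| < 1 gives a cointegrated pair."""
    x, u = 0.0, 0.0
    xs, ys = [], []
    for _ in range(n):
        x += rng.gauss(0.0, 1.0)
        u = rho * u + rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(beta * x + u)
    return xs, ys


def eg_tstat(xs, ys):
    """Step 1: OLS of y on x with intercept. Step 2: Dickey-Fuller
    t-statistic (no constant, no lags) on the step-1 residuals."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    e = [b - intercept - slope * a for a, b in zip(xs, ys)]
    lag = e[:-1]
    diff = [e[t] - e[t - 1] for t in range(1, n)]
    sll = sum(l * l for l in lag)
    gamma = sum(l * d for l, d in zip(lag, diff)) / sll
    rss = sum((d - gamma * l) ** 2 for l, d in zip(lag, diff))
    se = (rss / (len(diff) - 1) / sll) ** 0.5
    return gamma / se


def power(n, rho, reps=500, level=0.05, seed=1):
    """Estimated rejection rate at sample size n when the true AR(1)
    coefficient of the spread is rho, using a simulated critical value."""
    rng = random.Random(seed)
    null_stats = sorted(eg_tstat(*simulate_pair(n, 1.0, rng=rng))
                        for _ in range(reps))
    crit = null_stats[int(level * reps)]  # simulated lower-tail critical value
    alt_stats = [eg_tstat(*simulate_pair(n, rho, rng=rng)) for _ in range(reps)]
    return sum(t < crit for t in alt_stats) / reps


if __name__ == "__main__":
    for n in (50, 100, 200, 400):
        print(n, power(n, rho=0.9, reps=200))
```

As noted above, the numbers are only as general as the chosen parameters:
changing rho, the error volatility, or the serial correlation structure
changes the estimated power curve.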


-----Original Message-----
From: R-SIG-Finance [mailto:r-sig-finance-bounces at r-project.org] On Behalf
Of amol gupta
Sent: Tuesday, January 27, 2015 10:09 AM
To: Paul Teetor
Cc: r-sig-finance at r-project.org
Subject: Re: [R-SIG-Finance] Number of data points required for
Cointigration

Paul

You say that ADF is not really stable. I agree. Other options to explore
are

   - Use other unit root and stationarity tests.
   - Use other cointegration tests, like the Johansen test.
   - Run PCA, choose one of the lower-variance portfolios, and test it
   for stationarity. (I need to understand PCA more.)
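
For two series, one way to read the PCA option is as a total least squares
fit: the first principal component of the 2x2 covariance matrix gives the
hedge ratio, and the second (lower-variance) component is the spread
y - beta*x. The sketch below assumes that reading; it is not taken from the
attached script, and the function names are made up for illustration. Whether
to feed it prices or log prices is left open.

```python
import math


def pca_hedge_ratio(xs, ys):
    """Slope of the first principal component of (x, y), i.e. the total
    least squares slope. Derived from the leading eigenvector of the
    2x2 covariance matrix [[sxx, sxy], [sxy, syy]]."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    if sxy == 0:
        raise ValueError("series are uncorrelated; slope is not defined")
    d = syy - sxx
    return (d + math.sqrt(d * d + 4.0 * sxy * sxy)) / (2.0 * sxy)


def pca_spread(xs, ys):
    """Lower-variance portfolio: y minus beta times x, with beta from PCA."""
    beta = pca_hedge_ratio(xs, ys)
    return [b - beta * a for a, b in zip(xs, ys)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.0 * v + 1.0 for v in x]
    print(pca_hedge_ratio(x, y))
```

Unlike an ordinary regression of y on x, this slope treats both series
symmetrically, so swapping x and y gives the reciprocal hedge ratio.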

I will take some time and test these ideas. Have you tried any of these?
If yes, please share your experiences.

Thank you for your insights.

On Tue, Jan 27, 2015 at 5:29 AM, Paul Teetor <paulteetor at yahoo.com> wrote:

Amol,

I don't have a formula or a guideline for determining the number of
data points. But I can share two experiences.

First, when I traded mean-reverting spreads, I used 3 to 5 years of
daily data. That's 750 to 1,250 data points. Less data did not work
well for my spreads.

Second, in my experience, the ADF test was quite unstable. That is, it
might fail to reject for a while, then start rejecting the null
hypothesis when the market showed some trending back to the mean. Then
it would fail to reject again as the market wandered away.
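
The instability described above can be made visible by recomputing the test
statistic over a rolling window. This is a minimal sketch in plain Python
under stated assumptions: a Dickey-Fuller regression with a constant and no
lagged differences (so not a full ADF), a fixed window length, and made-up
function names. It returns only the t-statistics; more negative values are
stronger evidence against a unit root, and the appropriate critical value
depends on the window length and on whether the spread uses an estimated
hedge ratio.

```python
import math
import random


def df_tstat(series):
    """Dickey-Fuller t-statistic for one window: regress the first
    difference on a constant and the lagged level, no extra lags."""
    lag = series[:-1]
    diff = [b - a for a, b in zip(series[:-1], series[1:])]
    m = len(diff)
    ml, md = sum(lag) / m, sum(diff) / m
    sll = sum((l - ml) ** 2 for l in lag)
    sld = sum((l - ml) * (d - md) for l, d in zip(lag, diff))
    gamma = sld / sll
    alpha = md - gamma * ml
    rss = sum((d - alpha - gamma * l) ** 2 for l, d in zip(lag, diff))
    se = math.sqrt(rss / (m - 2) / sll)
    return gamma / se


def rolling_df(spread, window):
    """One t-statistic per window position, oldest window first."""
    return [df_tstat(spread[i:i + window])
            for i in range(len(spread) - window + 1)]


if __name__ == "__main__":
    # Simulated AR(1) spread, purely for demonstration.
    rng = random.Random(0)
    level, spread = 0.0, []
    for _ in range(500):
        level = 0.95 * level + rng.gauss(0.0, 1.0)
        spread.append(level)
    stats = rolling_df(spread, 150)
    print(min(stats), max(stats))
```

Even for a series that is stationary by construction, the statistic can drift
across any fixed threshold from one window to the next, which is consistent
with the reject / fail-to-reject flipping described above.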

Perhaps smarter people than I have had better luck trading with the
ADF, but for me, it did not provide a complete answer to the question
of mean-reversion.

Paul

Paul Teetor, Elgin, IL USA
http://quantdevel.com/public <http://quanttrader.info/public>

  ------------------------------
 *From:* amol gupta <amolgupta87 at gmail.com>
*To:* "r-sig-finance at r-project.org" <r-sig-finance at r-project.org>
*Sent:* Friday, January 23, 2015 3:24 AM
*Subject:* [R-SIG-Finance] Number of data points required for
Cointigration

Hi

I need help in figuring out the length of historical data that I
should use. I took daily closing stock prices for two tickers from
Yahoo (200 days). I tried finding the regression coefficient using PCA,
and I used 150 points for the PCA. I found a coefficient beta.

Now, to see whether the spread is mean reverting or not, I use ADF. If I
use a 150-point spread, it comes out to be nonstationary. If I use 200
points of data, the outcome is stationary.

I again used 200 points to do the PCA and find the regression. The spread
comes out to be nonstationary. From all these observations *I think*
that this is not a stable relationship.

So the following are my questions:

   - Is there a way to decide the length of historical data to use?
   - Some relationships may be more stable than others. Is there a way to
   quantify it?

Any other insight in this regard will be appreciated (time frame, pairs
vs. basket). I have attached the plot and the script that was used to
generate the plot.


--
Regards
Amol


_______________________________________________
R-SIG-Finance at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R
questions should go.

Regards
Amol
