correlation between two stock market indices

Many thanks, Bernhard!

What do you think about the suggestion, made by another list member,
that I can just compute the correlation for the differenced data
between the two stock market index series, with no control for
autocorrelation etc., since according to the efficient market
hypothesis stock market index series don't show autocorrelation at all?

well, here you are imposing the validity of a hypothesis that
should be checked first. By using differenced data you are almost
always on the *safe side*, but you are giving up the information
content of the series in levels. This can be circumvented by
specifying an error correction model (ECM). Furthermore, you might
want to use log data, i.e. a transformation that stabilises the
variance. As a side effect, if both sides are in logs, the lm()
estimated coefficients can be interpreted as elasticities, i.e. the
percentage change in your lhs-variable for a one percent change in
your rhs-variable (in levels).
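
A minimal sketch of the log-log point, on simulated data only. The
series x and y, the seed and the "true" elasticity of 0.8 are all
assumptions made up for illustration; nothing here comes from real
index data.

```r
set.seed(1)
n <- 500

# Hypothetical positive "index level": exponential of a random walk
x <- 100 * exp(cumsum(rnorm(n, sd = 0.01)))

# y is constructed so that its elasticity with respect to x is 0.8
y <- 50 * x^0.8 * exp(rnorm(n, sd = 0.005))

# With both sides in logs, the slope estimates the elasticity
fit <- lm(log(y) ~ log(x))
elasticity <- unname(coef(fit)[2])
print(elasticity)
```

The slope should come out close to the 0.8 that was built into the
simulation. With real index levels the residuals of such a regression
would still need to be checked before reading anything into the slope.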

I think I will just check whether there is any autocorrelation,
using acf and pacf, as you suggested. Thanks a lot.

yes, and this tells you the order to specify for arma(),
given a stationary series:

ar(p): slowly decaying acf (or damped and alternating in case
of a negative ar coefficient) and a pacf that cuts off after a
spike at lag p.

ma(q): just like ar(p), but the shapes of acf and pacf are
reversed, i.e. the acf cuts off after a spike at lag q and the
pacf decays slowly (or damped and alternating in case of a
negative ma coefficient).
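
The acf/pacf signatures described above can be seen on simulated
series. This is only a sketch: the coefficients (0.7), the sample size
and the seed are arbitrary assumptions, and arima.sim() from the stats
package is used to generate the data.

```r
set.seed(42)
n <- 2000

# Simulated AR(1) and MA(1) series with arbitrary coefficient 0.7
ar1 <- arima.sim(model = list(ar = 0.7), n = n)
ma1 <- arima.sim(model = list(ma = 0.7), n = n)

# acf() reports lag 0 first, so drop it; pacf() starts at lag 1
acf_ar  <- acf(ar1,  lag.max = 10, plot = FALSE)$acf[-1]
pacf_ar <- pacf(ar1, lag.max = 10, plot = FALSE)$acf
acf_ma  <- acf(ma1,  lag.max = 10, plot = FALSE)$acf[-1]
pacf_ma <- pacf(ma1, lag.max = 10, plot = FALSE)$acf

# Lags 1 to 3 side by side
print(round(rbind(acf_ar  = acf_ar[1:3],  pacf_ar = pacf_ar[1:3],
                  acf_ma  = acf_ma[1:3],  pacf_ma = pacf_ma[1:3]), 2))
```

For the AR(1) series the acf decays gradually while the pacf is close
to zero beyond lag 1; for the MA(1) series it is the acf that drops to
roughly zero after lag 1 while the pacf dies out slowly.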

HTH,
Bernhard

Cheers

Christoph

Pfaff, Bernhard wrote:
Dear finance professionals

As I was asked by a friend, whether we can compute the 
correlation 
between two stock market indices (e.g. NASDAQ index and Dow Jones 
index), and I am unfortunately NOT an expert in finance:

Hello Christoph,

you can almost always compute correlations; whether these
calculations make sense and are meaningful is a different matter :-)

(1) What model would you recommend for this kind of question?

something like:

library(ts)
arima(x, order=???, xreg=y)

sure, you can do this and choose the appropriate order as it is
outlined by Box-Jenkins (i.e. check the acf and pacf of the residuals
combined with diagnostic tests for serial uncorrelatedness). Most
likely you want/have to work with differenced data, due to the
*trending* character of the ts in question. The snag is that level
information is lost. Hence, you might want to specify an ECM / VECM
and prior to this check the order of integration of the series
involved. Relevant packages to accomplish this would be ts, tseries,
dse and urca, to my knowledge (check
http://www.mayin.org/ajayshah/KB/R/R_for_economists.html
for an overview).
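
A rough sketch of the Box-Jenkins style check on the residuals, using
only arima() and Box.test() from the stats package. The data are
simulated and stationary by construction (so no differencing is
needed here); the coefficients, sample size and seed are assumptions
for illustration. With trending index levels one would difference
first or use a non-zero d in order=.

```r
set.seed(7)
n <- 400

# Hypothetical stationary regressor and AR(1) error term
x <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))
u <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))
y <- 1 + 2 * x + u

# Step 1: regression with white-noise errors, then inspect the residuals
fit0 <- arima(y, order = c(0, 0, 0), xreg = cbind(x = x))
lb0  <- Box.test(residuals(fit0), lag = 10, type = "Ljung-Box")

# Step 2: the residual acf/pacf suggest AR(1), so refit and re-check
fit1 <- arima(y, order = c(1, 0, 0), xreg = cbind(x = x))
lb1  <- Box.test(residuals(fit1), lag = 10, type = "Ljung-Box", fitdf = 1)

print(lb0$p.value)   # small: residual autocorrelation left in fit0
print(coef(fit1))    # ar1, intercept and the coefficient on x
```

The first Ljung-Box test should reject serial uncorrelatedness of the
residuals, and the test statistic should drop markedly once the AR(1)
term is included.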

library(nlme)
gls(x~y,correlation=corARMA(p=?,q=?))

what would you recommend, and what about the "?" :)

this would apply if the *error term* is not nicely behaved and would
follow as a second step, i.e. after checking the residuals from a
simple lm() or arima(), as is described in ?gls:

Description:

     This function fits a linear model using generalized least squares.
     The errors are allowed to be correlated and/or have unequal
     variances.
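
A sketch of that two-step route, assuming the nlme package is
installed (it ships with standard R distributions). The data are
simulated; the AR(1) error structure, the coefficients and the time
index column t are assumptions for illustration, and the choice
p = 1, q = 0 is read off the residual autocorrelation rather than
being a general recommendation.

```r
library(nlme)

set.seed(11)
n <- 300
x <- rnorm(n)
u <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))  # AR(1) errors
d <- data.frame(y = 1 + 2 * x + u, x = x, t = seq_len(n))

# Step 1: plain lm() and a look at the residual autocorrelation
ols <- lm(y ~ x, data = d)
r1  <- acf(residuals(ols), plot = FALSE)$acf[2]   # lag-1 autocorrelation

# Step 2: residuals are clearly autocorrelated, so fit by GLS with
# an ARMA(1,0) correlation structure along the time index t
fit <- gls(y ~ x, data = d,
           correlation = corARMA(p = 1, q = 0, form = ~ t))
phi <- coef(fit$modelStruct$corStruct, unconstrained = FALSE)

print(r1)
print(phi)
print(coef(fit))
```

The estimated AR parameter should be near the 0.6 used in the
simulation, and the slope near 2.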

As a side note, in econometrics it is common notation that the
response is named 'y' and the predictor 'x' and not vice versa.

(2) Furthermore, searching the web, I found that (sorry, you experts
certainly know this, but I have no experience with financial data)
usually the time series are uncorrelated, but show strong "ARCH
effects", i.e. are not independent.

ARCH (autoregressive conditional heteroskedasticity) refers to the
behaviour of the variance of the error term. Again, check the
residuals first to see whether ARCH effects are present, and only
then estimate an ARCH, GARCH etc. type of model. Note that
uncorrelatedness and independence are equivalent only in the case of
joint normality: uncorrelatedness implies independence only if the
series are jointly normally distributed. In the other direction, if
two series are independent then these series are also uncorrelated.
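
To make the "uncorrelated but not independent" point concrete, here is
a sketch that simulates an ARCH(1) series by hand in base R. The
parameters a0 and a1, the sample size and the seed are arbitrary
assumptions. The series itself has essentially no autocorrelation,
while its squares are clearly autocorrelated.

```r
set.seed(123)
n  <- 5000
a0 <- 0.2   # hypothetical ARCH(1) intercept
a1 <- 0.3   # hypothetical ARCH(1) coefficient

z <- rnorm(n)
r <- numeric(n)
h <- a0 / (1 - a1)            # start at the unconditional variance
for (t in seq_len(n)) {
  r[t] <- sqrt(h) * z[t]      # return with conditional variance h
  h <- a0 + a1 * r[t]^2       # conditional variance for the next step
}

rho1_level  <- acf(r,   plot = FALSE)$acf[2]   # lag-1 autocorrelation of r
rho1_square <- acf(r^2, plot = FALSE)$acf[2]   # lag-1 autocorrelation of r^2

# Ljung-Box test applied to the squared series
lb_square <- Box.test(r^2, lag = 10, type = "Ljung-Box")

print(c(level = rho1_level, squared = rho1_square))
print(lb_square$p.value)
```

On real data the same check would be applied to the residuals of the
mean model, as suggested above, before moving to an ARCH or GARCH fit.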

A last side note: ask yourself what the model's aim is. What should
the model explain? What's its purpose? After having answered these
questions, you can pick one of the methods rather than blindly
applying any of them.

HTH,
Bernhard

Does this mean that any kind of correlation analysis with stock
market indices is senseless, since we may not get a significant
correlation, but this doesn't mean that the series are independent?

Many thanks for your help

Chris

_______________________________________________
R-sig-finance@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance

You can go far with short R code fragments, such as:

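     library(its)
     # Daily closing prices from 1998 onwards for the FTSE and the DAX
     x1 <- priceIts(instrument=c("^ftse"), start="1998-01-01", quote = "Close")
     x2 <- priceIts(instrument=c("^gdax"), start="1998-01-01", quote = "Close")
     # Keep only the dates on which both series have a price
     prices <- intersect(x1,x2)
     names(prices) <- c("FTSE","DAX")
     # Daily log returns, in percent
     returns <- 100*diff(log(prices))
     cor(returns)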

which gives 

          FTSE       DAX
FTSE 1.0000000 0.7455468
DAX  0.7455468 1.0000000

so you see that returns on the FTSE and the DAX have had a correlation
of 0.7455 in the post-1998 period.

I think the above serves as a great demo of R and 'its' in action! :-)
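
For readers without 'its' or a data connection, the same steps
(intersect the dates, then take log returns, then correlate) can be
sketched in base R on simulated prices. Everything below is made up
for illustration: the two price series A and B, the shared driver
that makes them correlated, and the "holiday" rows that are dropped.

```r
set.seed(5)
days <- seq(as.Date("2020-01-01"), by = "day", length.out = 300)

# A shared driver makes the two hypothetical markets correlated
common <- cumsum(rnorm(300, sd = 0.01))
a <- data.frame(date = days,
                A = 100 * exp(common + cumsum(rnorm(300, sd = 0.005))))
b <- data.frame(date = days,
                B = 200 * exp(common + cumsum(rnorm(300, sd = 0.005))))

# Pretend each market was closed on different days
a <- a[-c(50, 120), ]
b <- b[-c(80, 200, 201), ]

# Intersect first: keep only dates present in both series
prices <- merge(a, b, by = "date")
prices <- prices[order(prices$date), ]

# Then compute percentage log returns and their correlation matrix
returns <- 100 * diff(log(as.matrix(prices[, c("A", "B")])))
rho <- cor(returns)
print(rho)
```

Computing returns before intersecting would pair up returns that
span different calendar intervals, which is why the order matters.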

The pitfalls:

  * Timezones matter greatly. If one market closes before another,
    then it will look like one is causing the other, if viewed through
    daily returns.

    So you have to either go 'in' and pick intervals when both markets
    are contemporaneously trading, or you have to zoom 'out' and pick
    fat intervals where the overlaps matter less.

    E.g. India and the US have completely non-overlapping trading
    hours. I find it hard to meaningfully interpret correlations of
    contemporaneous or lagged daily returns. It makes more sense to
    discuss correlations of weekly returns.

  * Be careful to intersect and then compute returns, as done
    above. Be careful to make returns as 100*diff(log(prices)), as
    done above.

  * R TODO:
    The code fragment above works for indexes but not for individual
    stocks, since priceIts does not know how to give us "adjusted
    closing prices". That is, when splits take place, the price gets
    clobbered, and the series generated by priceIts is useless. I
    would be very happy if people on this list are able to propose a
    solution. I think priceIts is fabulous but this is a major gap in
    functionality. Raw closing prices are next to useless since most
    finance starts with returns, and in order to make returns, we need
    adjusted closing prices.

  * Markets are quite efficient for individual products (e.g. index
    futures or stocks) and you don't generally have problems with
    time-series structure. But when you get to indexes, the process of
    making linear combinations of things that trade with different
    timestamps is known to induce spurious autocorrelations. Andy Lo
    and others have papers on this. This problem is particularly acute
    when an index contains illiquid products.

  * Even if you dealt with individual traded products, like stocks or
    index futures, you'd have the problem of time-varying
    correlations.  You can do rolling window correlations and they'll
    help.

    R TODO: 
    There isn't yet a bivariate ARCH implementation in CRAN or in the
    Debian packages.

    R TODO:
    There isn't yet an abstract engine doing rolling window estimation
    in R, where you get to define the estimator (e.g. as is the case
    with 'by' where you get to specify FUN).
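
A small sketch of what such a rolling window engine could look like in
base R, with the estimator passed in as FUN in the spirit of 'by'. The
helper name roll_apply is hypothetical (it is not an existing
function), and the two return series are simulated with a correlation
that shifts halfway through the sample.

```r
# Hypothetical helper: apply FUN to every trailing window of `width` rows
roll_apply <- function(x, width, FUN, ...) {
  x <- as.matrix(x)
  n <- nrow(x)
  stopifnot(width >= 1, width <= n)
  ends <- width:n
  sapply(ends, function(e) FUN(x[(e - width + 1):e, , drop = FALSE], ...))
}

set.seed(99)
n <- 600
f <- rnorm(n)                                  # common factor
loading <- rep(c(0.3, 0.9), each = n / 2)      # regime shift at mid-sample
r1 <- loading * f + sqrt(1 - loading^2) * rnorm(n)
r2 <- loading * f + sqrt(1 - loading^2) * rnorm(n)

# The estimator is user-defined: here, the correlation of two columns
rc <- roll_apply(cbind(r1, r2), width = 100,
                 FUN = function(w) cor(w[, 1], w[, 2]))

print(round(range(rc), 2))
```

Any other estimator that takes a window of rows and returns a single
number (a regression slope, a variance, etc.) can be passed as FUN in
the same way.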
Ajay Shah                                                   Consultant
ajayshah@mayin.org                      Department of Economic Affairs
http://www.mayin.org/ajayshah           Ministry of Finance, New Delhi
    The code fragment above works for indexes but not for individual
    stocks, since priceIts does not know how to give us "adjusted
    closing prices". That is, when splits take place, the price gets
get.hist.quote() in tseries was written before Yahoo! added that fifth
column of adjusted values. priceIts() is a pretty straight copy of
get.hist.quote().

If you submit a decent patch to either one, including an update to the .Rd
file, I am sure that it will get integrated into the package in due course.

Dirk