ccf (cross correlation function) problems

Tue, Jan 29, 2013 2:04 PM

Your question and your English are just fine!

If I were you, I would not mess around with the ccf() function but
would attack the question "directly" using the cor.test() function, with
sub-vectors of your x vector. Personally I find the notion of "lag" in acf()
and ccf() highly confusing and I always make "parity errors" --- i.e. I get
things backwards!

Moreover, the ccf() function is throwing information away; it truncates
the x vector to have the same length as y, i.e. 21, and so never uses
x[22:29] --- which have useful content in respect of lags less than 8.
You haven't a lot of data, so it is prudent not to be wasteful.
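Here is a minimal sketch of the truncation described above. It uses stand-in series whose values are their own calendar years, not the real data, so the pairing can be read off directly. In the stats sources, `ccf()` lines its two arguments up with `ts.intersect()` before computing anything, so only time points present in both series survive. The sketch calls `ts.intersect()` directly to show which points those are.

```r
# Stand-in series whose values equal their calendar years (illustration only)
x_years <- 1982:2010   # 29 values, like x
y_years <- 1990:2010   # 21 values, like y

# As in the original script: ts() without 'start', so both series use times 1, 2, ...
X_idx <- ts.intersect(x = ts(x_years), y = ts(y_years))
nrow(X_idx)                                      # 21: x[22:29] are dropped
unique(as.numeric(X_idx[, "x"] - X_idx[, "y"]))  # -8: x for 1982 sits next to y for 1990

# With calendar start dates the overlap is 1990-2010 instead
X_cal <- ts.intersect(x = ts(x_years, start = 1982), y = ts(y_years, start = 1990))
start(X_cal)                                     # 1990 1
all(X_cal[, "x"] == X_cal[, "y"])                # TRUE: x for 1982-1989 is never used
```

Either way, 8 of the 29 x values never reach the correlation computation.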

What I would do:

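OP <- par(mfrow = c(3, 3))   # 3 x 3 grid of scatterplots, one per lag
for (i in 1:9) {
  # x[i:(20+i)] covers the years 1981+i to 2001+i and y covers 1990 to 2010,
  # so x leads y by 9-i years (lag = 9-i, from 8 down to 0)
  CT <- cor.test(x[i:(20 + i)], y, alternative = "less")  # H1: negative correlation
  PV <- CT$p.value
  cat("lag =", 9 - i, "p-value =", PV, "\n")
  COR <- sprintf("%1.3f", CT$estimate)
  plot(x[i:(20 + i)], y, xlab = "x", main = paste("lag =", 9 - i, "corr =", COR))
}
par(OP)   # restore the previous plotting layout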
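If a table is easier to scan than nine plots, the same `cor.test()` calls can be collected into one data frame. This is a sketch and not part of the original reply. It assumes `x` and `y` are the plain numeric vectors from the question below, and it keeps the indexing of the loop above: `x[i:(20+i)]` covers the years 1981 + i to 2001 + i and is paired with y for 1990 to 2010.

```r
x <- c(105.3381, 126.2792, 121.7298, 110.35, 133.1647, 140.5724, 183.8853,
       177.0154, 181.2147, 186.4154, 209.6958, 205.0292, 184.9683, 222.9683,
       219.8538, 268.1029, 249.1545, 228.942, 198.2119, 171.0913, 146.346,
       166.3192, 163.5747, 173.3394, 180.7952, 176.8276, 159.7074, 150.6029,
       110.9653)                                           # 1982-2010
y <- c(32.93415, 45.75486, 29.36993, 23.70824, 21.30857, 19.78977, 16.88913,
       22.25963, 19.32558, 19.73704, 22.62746, 28.90173, 27.66794, 26.23163,
       28.69109, 22.04674, 26.47496, 33.03602, 41.62231, 28.96627, 31.80892)  # 1990-2010

# One row per lag (0 to 8 years), using the same one-sided test as the loop above
res <- do.call(rbind, lapply(1:9, function(i) {
  ct <- cor.test(x[i:(20 + i)], y, alternative = "less")  # H1: negative correlation
  data.frame(lag = 9 - i, r = unname(ct$estimate), p.value = ct$p.value)
}))
res <- res[order(res$lag), ]
print(res, digits = 3)
```

`unname()` only drops the "cor" label that `cor.test()` attaches to its estimate.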

HTH

cheers,

Rolf Turner

On 01/29/2013 11:26 PM, Larissa Modica wrote:

Hello everybody,

I am sorry if my questions are too simple or not easily understandable. I'm
not a native English speaker and this is my first analysis using this
function.

I have a problem with the cross-correlation function and would like to
understand how to perform it correctly in R.

I have yearly data for an independent variable (x) from 1982 to 2010, and
yearly data for a variable (y) from 1990 to 2010.

I think y could be influenced by x with a delay of 6 years.

When I plot the data of x from 1986 to 2006 against the data of y from 1990
to 2010, the two series show opposite trends, i.e. when x was high in 1986,
y was low in 1990, and so on until the end of the time series.

Consequently I expect the two time series to be negatively correlated.

Namely:

y_{year} = f(x_{year - lag}),

with a negative correlation.
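One way to express y_{year} = f(x_{year - 6}) in R without counting indices by hand is to give both series their calendar start years and shift x with `stats::lag()`. It is written with the `stats::` prefix because packages such as dplyr define their own `lag()`. Note the sign convention documented in `?lag`: a positive k makes the series start earlier, so a 6-year delay needs k = -6. This is exactly the kind of parity trap mentioned in the reply above. The sketch uses stand-in values equal to the calendar years, an illustrative assumption, so the paired years are visible.

```r
# Stand-in series whose values equal their calendar years (illustration only)
x_cal <- ts(1982:2010, start = 1982)
y_cal <- ts(1990:2010, start = 1990)

# stats::lag(x, k = -6) moves every x value 6 years later on the time axis,
# so at year t the shifted series holds x from year t - 6
pairs <- ts.intersect(x_lag6 = stats::lag(x_cal, k = -6), y = y_cal)
pairs[1:3, ]                # y for 1990 next to x for 1984, and so on
range(pairs[, "x_lag6"])    # 1984 2004: the same years as x[3:23]
```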

Here is the script I ran in R.

a)



x <- c(105.3381, 126.2792, 121.7298, 110.35, 133.1647, 140.5724, 183.8853,
       177.0154, 181.2147, 186.4154, 209.6958, 205.0292, 184.9683, 222.9683,
       219.8538, 268.1029, 249.1545, 228.942, 198.2119, 171.0913, 146.346,
       166.3192, 163.5747, 173.3394, 180.7952, 176.8276, 159.7074, 150.6029,
       110.9653)   # 1982-2010

y <- c(32.93415, 45.75486, 29.36993, 23.70824, 21.30857, 19.78977, 16.88913,
       22.25963, 19.32558, 19.73704, 22.62746, 28.90173, 27.66794, 26.23163,
       28.69109, 22.04674, 26.47496, 33.03602, 41.62231, 28.96627, 31.80892)   # 1990-2010

# Note: ts() without 'start' indexes both series 1, 2, ..., so ccf() pairs
# x for 1982 with y for 1990 and drops x[22:29]
x <- ts(x)
y <- ts(y)

dumb <- ccf(x, y, ylab = "cross-correlation", xlab = "Time lag",
            main = "y influenced by x")
dumb



Autocorrelations of series 'X', by lag

   -10     -9     -8     -7     -6     -5     -4     -3     -2     -1
 0.083  0.133  0.253  0.323  0.386  0.515  0.544  0.609  0.448  0.118
     0      1      2      3      4      5      6      7      8      9
-0.154 -0.283 -0.416 -0.326 -0.265 -0.217 -0.285 -0.340 -0.315 -0.254
    10
-0.188



My questions are:

Is the script correct for the question I need to answer?

Do x and y have to have the same length (i.e. do I have to consider the
same number of years)?

What does this result mean?

My interpretation is that the highest correlation is at a lag of -3 years.

Does this mean that what happened to 'x' in 1987 influenced 'y' in 1990?
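This interpretation can be checked against the convention documented in `?ccf`: the lag k value of `ccf(x, y)` estimates the correlation between `x[t+k]` and `y[t]`. The sketch below uses random stand-in data, an assumption made only to keep it self-contained. It recomputes the lag -3 value by hand, then works out which calendar years lag -3 pairs in script a). In that script `ts()` was called without `start`, so x for 1982 and y for 1990 share time point 1. Under that alignment, lag -3 pairs x with y eleven years later (for example x for 1982 with y for 1993), not 1987 with 1990.

```r
set.seed(1)
xs <- rnorm(21)   # stand-in for x[1:21], the part of x that ccf() keeps in script a)
ys <- rnorm(21)   # stand-in for y
cc <- ccf(xs, ys, plot = FALSE)

# Lag -3 by hand: pair xs[tt - 3] with ys[tt], using full-series means and
# sums of squares, as acf()/ccf() do
k <- -3
tt <- (1 - k):21                     # tt = 4, ..., 21
dx <- xs - mean(xs)
dy <- ys - mean(ys)
r_manual <- sum(dx[tt + k] * dy[tt]) / sqrt(sum(dx^2) * sum(dy^2))
r_ccf <- cc$acf[cc$lag == k]
all.equal(r_manual, r_ccf)           # TRUE

# Calendar years behind lag -3 in script a): index tt of y is year 1989 + tt,
# index tt - 3 of x is year 1981 + (tt - 3)
x_year <- 1981 + (tt + k)
y_year <- 1989 + tt
unique(y_year - x_year)              # 11, e.g. x for 1982 with y for 1993
```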





Also, if that was not correct, is it correct to write:

b)

x <- c(105.3381, 126.2792, 121.7298, 110.35, 133.1647, 140.5724, 183.8853,
       177.0154, 181.2147, 186.4154, 209.6958, 205.0292, 184.9683, 222.9683,
       219.8538, 268.1029, 249.1545, 228.942, 198.2119, 171.0913, 146.346,
       166.3192, 163.5747, 173.3394, 180.7952, 176.8276, 159.7074, 150.6029,
       110.9653)   # 1982-2010

y <- c(32.93415, 45.75486, 29.36993, 23.70824, 21.30857, 19.78977, 16.88913,
       22.25963, 19.32558, 19.73704, 22.62746, 28.90173, 27.66794, 26.23163,
       28.69109, 22.04674, 26.47496, 33.03602, 41.62231, 28.96627, 31.80892)   # 1990-2010

x <- ts(x)
y <- ts(y)

# x[3:23] covers 1984-2004 and y covers 1990-2010, so lag 0 means a 6-year delay
dumb <- ccf(x[3:23], y, ylab = "cross-correlation", xlab = "Time lag",
            main = "y influenced by x")
dumb



Autocorrelations of series 'X', by lag

   -10     -9     -8     -7     -6     -5     -4     -3     -2     -1
 0.104  0.221  0.257  0.393  0.478  0.601  0.517  0.406  0.087 -0.270
     0      1      2      3      4      5      6      7      8      9
-0.481 -0.397 -0.344 -0.241 -0.284 -0.349 -0.337 -0.265 -0.198 -0.161
    10
 0.044



As I understand it, these results mean that the strongest negative
correlation (-0.481) is observed at lag 0. That corresponds to the 6-year
difference I set up when I wrote x[3:23], which simply means working with
the years 1984 to 2004.
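At lag 0, the value printed by `ccf()` is the Pearson correlation of the two aligned vectors. So the -0.481 from script b) should equal both `cor(x[3:23], y)` and the estimate from the `cor.test()` loop at i = 3, that is, lag 6. The sketch below checks this, assuming `x` and `y` are the vectors from the message above.

```r
x <- c(105.3381, 126.2792, 121.7298, 110.35, 133.1647, 140.5724, 183.8853,
       177.0154, 181.2147, 186.4154, 209.6958, 205.0292, 184.9683, 222.9683,
       219.8538, 268.1029, 249.1545, 228.942, 198.2119, 171.0913, 146.346,
       166.3192, 163.5747, 173.3394, 180.7952, 176.8276, 159.7074, 150.6029,
       110.9653)                                           # 1982-2010
y <- c(32.93415, 45.75486, 29.36993, 23.70824, 21.30857, 19.78977, 16.88913,
       22.25963, 19.32558, 19.73704, 22.62746, 28.90173, 27.66794, 26.23163,
       28.69109, 22.04674, 26.47496, 33.03602, 41.62231, 28.96627, 31.80892)  # 1990-2010

cc_b <- ccf(x[3:23], y, plot = FALSE)              # script b): x for 1984-2004
r_lag0 <- cc_b$acf[cc_b$lag == 0]
r_cor <- cor(x[3:23], y)                           # Pearson correlation, same years
ct <- cor.test(x[3:23], y, alternative = "less")   # the loop's i = 3, lag = 6
all.equal(r_lag0, r_cor)                           # TRUE
all.equal(r_cor, unname(ct$estimate))              # TRUE
ct$p.value                                         # one-sided p-value for this lag
```

What `cor.test()` adds is a p-value for that single lag. The alternative "less" matches the expectation of a negative correlation.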



In summary, I would like to know:

1) whether the analysis is correct in way a) or in way b);

2) whether there is another way to demonstrate that variable x has an
influence on variable y with a delay of 6 years.



Thank you very much to anybody who can help me.
