ccf (cross correlation function) problems
Your question and your English are just fine!
If I were you, I would not mess around with the ccf() function but
would attack the question "directly" using the cor.test() function, with
sub-vectors of your x vector. Personally I find the notion of "lag" in acf()
and ccf() highly confusing and I always make "parity errors" --- i.e. I get
things backwards!
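One way to guard against getting the sign backwards is to run ccf() on made-up series with a known delay and see where the peak lands. The sketch below is only an illustration: the series a2 and b and the delay d are invented here, not taken from the data in this thread. According to ?ccf, the lag k value returned by ccf(x, y) estimates the correlation between x[t+k] and y[t], so if y follows x by d steps the peak should show up at lag -d.

```r
# Synthetic check of the lag convention used by ccf().
# All series here are made up for illustration; they are not the poster's data.
set.seed(1)
n <- 200
d <- 3                              # known delay: b[t] equals a2[t - d]
a_full <- rnorm(n + d)
b  <- a_full[1:n]                   # the "response" series
a2 <- a_full[(d + 1):(n + d)]       # the "driver" series, d steps ahead of b

cc   <- ccf(a2, b, lag.max = 10, plot = FALSE)
peak <- cc$lag[which.max(abs(cc$acf))]   # lag with the largest |correlation|
print(peak)
```

If the peak comes out at -d, then a strong correlation at a negative lag in ccf(x, y) is the pattern to look for when x is thought to lead y.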
Moreover, the ccf() function is throwing information away; it truncates
the x vector to have the same length as y, i.e. 21, and so never uses
x[22:29] --- which have useful content in respect of lags less than 8.
You haven't a lot of data, so it is prudent not to be wasteful.
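The truncation can be seen without calling ccf() at all. As far as I can tell from its source, ccf() lines the two series up with ts.intersect(), which keeps only the time points present in both. The sketch below uses placeholder series (xx, yy, xx2, yy2 are names invented here, filled with arbitrary numbers) that have the same lengths as x and y in this thread.

```r
# Placeholder series with the same lengths as in the post (29 and 21 values);
# the numbers themselves are arbitrary.
xx <- ts(seq_len(29))
yy <- ts(seq_len(21))

both <- ts.intersect(xx, yy)    # keeps only the times present in both series
print(nrow(both))               # 21
print(range(time(both)))        # 1 21, so xx[22:29] is dropped

# Supplying the calendar years changes which x values survive the alignment.
xx2 <- ts(seq_len(29), start = 1982)
yy2 <- ts(seq_len(21), start = 1990)
both2 <- ts.intersect(xx2, yy2)
print(range(time(both2)))       # 1990 2010, i.e. x[9:29] is kept instead
```

Either way only 21 of the 29 x values take part, which is the waste described above.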
What I would do:
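OP <- par(mfrow=c(3,3))    # 3 x 3 grid, one panel per lag; keep the old settings
for(i in 1:9) {
    # x[i:(20+i)] is a 21-year window of x:
    # i = 1 gives 1982-2002 (lag 8), i = 9 gives 1990-2010 (lag 0).
    # alternative="less" tests for a negative correlation.
    CT <- cor.test(x[i:(20+i)],y,alternative="less")
    PV <- CT$p.value
    cat("lag =",9-i,"p-value =",PV,"\n")
    COR <- sprintf("%1.3f",CT$estimate)
    plot(x[i:(20+i)],y,xlab="x",main=paste("lag =",9-i,"corr =",COR))
}
par(OP)                    # restore the previous graphics settings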
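If a table is easier to read than the printed lines, the same windows can be gathered into a data frame. This is only a sketch: lag_table is a helper name made up here (it is not part of any package), and it assumes, as in the loop above, that x and y end in the same year so that the last window of x corresponds to lag 0. The demo data are synthetic.

```r
# Same sliding windows as the loop above, with results collected in a data frame.
# `lag_table` is a hypothetical helper, not part of any package.
# Assumption: x and y end in the same year, so the last window of x is lag 0.
lag_table <- function(x, y, alternative = "less") {
  x <- as.numeric(x)
  y <- as.numeric(y)
  stopifnot(length(x) >= length(y))
  n <- length(y)
  max_shift <- length(x) - n             # 8 for 29 and 21 observations
  rows <- lapply(0:max_shift, function(s) {
    xs <- x[(1 + s):(n + s)]             # window of x with the same length as y
    ct <- cor.test(xs, y, alternative = alternative)
    data.frame(lag     = max_shift - s,
               corr    = unname(ct$estimate),
               p.value = ct$p.value)
  })
  do.call(rbind, rows)
}

# Demo on made-up data in which y0 mirrors x0 six steps earlier.
set.seed(42)
x0  <- rnorm(29)
y0  <- -x0[3:23] + rnorm(21, sd = 0.1)
tab <- lag_table(x0, y0)
print(tab)
```

With the data from this thread the call would simply be lag_table(x, y). Bear in mind that nine tests are being run, so the individual p-values should be read with some caution.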
HTH
cheers,
Rolf Turner
On 01/29/2013 11:26 PM, Larissa Modica wrote:
Hello everybody,
I am sorry if my questions are too simple or not easily understandable. I'm
not a native English speaker and this is my first analysis using this
function.
I have a problem with a cross correlation function and I would like to
understand how I have to perform it in R.
I have yearly data of an independent variable (x) from 1982 to 2010, and I
also have yearly data of a variable (y) from 1990 to 2010.
I think y could be influenced by the variable (x) with a delay of 6 years.
When I plot the data of x from 1986 to 2006 against the data of y from 1990
to 2010, the graph shows an opposite trend, i.e. when the variable x was
high in 1986, the variable y was low in 1990, and so on until the end of
the time series.
Consequently I expect that the two time series are correlated with a
negative correlation value.
Namely:
y[year] = f(x[year - lag]),
and the correlation has a negative value.
Here is the script I ran in R.
a)
x<-c(105.3381,126.2792,121.7298,110.35,133.1647,140.5724,183.8853,177.0154,
     181.2147,186.4154,209.6958,205.0292,184.9683,222.9683,219.8538,268.1029,
     249.1545,228.942,198.2119,171.0913,146.346,166.3192,163.5747,173.3394,
     180.7952,176.8276,159.7074,150.6029,110.9653)
y<-c(32.93415,45.75486,29.36993,23.70824,21.30857,19.78977,16.88913,22.25963,
     19.32558,19.73704,22.62746,28.90173,27.66794,26.23163,28.69109,22.04674,
     26.47496,33.03602,41.62231,28.96627,31.80892)
x<-ts(x)
y<-ts(y)
dumb<-ccf( x,y, ylab = "cross-correlation", xlab = "Time lag",
           main = "y influenced by x")
dumb
Autocorrelations of series 'X', by lag
   -10     -9     -8     -7     -6     -5     -4     -3     -2     -1
 0.083  0.133  0.253  0.323  0.386  0.515  0.544  0.609  0.448  0.118
     0      1      2      3      4      5      6      7      8      9
-0.154 -0.283 -0.416 -0.326 -0.265 -0.217 -0.285 -0.340 -0.315 -0.254
    10
-0.188
My question is:
Is the script correct for the question I need to answer?
Do x and y have to have the same length (i.e. do I have to consider the same
number of years)?
What does this result mean?
My interpretation is: the highest correlation was at a lag of -3 years.
Does it mean that what happened to the 'x' variable in 1987 influenced 'y' in 1990?
Also, if that was not correct, is it correct to write:
b)
x<-c(105.3381,126.2792,121.7298,110.35,133.1647,140.5724,183.8853,177.0154,
     181.2147,186.4154,209.6958,205.0292,184.9683,222.9683,219.8538,268.1029,
     249.1545,228.942,198.2119,171.0913,146.346,166.3192,163.5747,173.3394,
     180.7952,176.8276,159.7074,150.6029,110.9653)
y<-c(32.93415,45.75486,29.36993,23.70824,21.30857,19.78977,16.88913,22.25963,
     19.32558,19.73704,22.62746,28.90173,27.66794,26.23163,28.69109,22.04674,
     26.47496,33.03602,41.62231,28.96627,31.80892)
x<-ts(x)
y<-ts(y)
dumb<-ccf( x[3:23],y, ylab = "cross-correlation", xlab = "Time lag",
           main = "y influenced by x")
dumb
Autocorrelations of series 'X', by lag
   -10     -9     -8     -7     -6     -5     -4     -3     -2     -1
 0.104  0.221  0.257  0.393  0.478  0.601  0.517  0.406  0.087 -0.270
     0      1      2      3      4      5      6      7      8      9
-0.481 -0.397 -0.344 -0.241 -0.284 -0.349 -0.337 -0.265 -0.198 -0.161
    10
 0.044
As I understand it, these results mean that the strongest negative correlation
is observed when the lag = 0. That corresponds to the difference of 6 years that
I set up when I wrote x[3:23], which simply means working with the years from
1984 to 2004.
In summary I would like to know:
1) whether the analysis is correct as done in a) or as done in b)
2) whether there is another way to demonstrate that the variable x has an
influence on the variable y with a delay of 6 years.
Thank you very much to anybody who can help me.