Suppose we observe N individuals, for each of which we have a
time-series. How do we correctly create a lagged value of the
time-series variable?
As an example, suppose I create:
A <- data.frame(year=rep(c(1980:1984),3),
                person= factor(sort(rep(1:3,5))),
                wage=c(rnorm(15)))
> A
year person wage
1 1980 1 0.17923212
2 1981 1 0.25610292
3 1982 1 0.50833655
4 1983 1 -0.42448395
5 1984 1 0.49233532
6 1980 2 -0.49928025
7 1981 2 0.06842660
8 1982 2 0.65677575
9 1983 2 0.15947390
10 1984 2 -0.46585116
11 1980 3 -0.29052635
12 1981 3 -0.27109203
13 1982 3 -0.76168164
14 1983 3 0.02294361
15 1984 3 2.22828032
What I'd like to do is to make a lagged wage for each person, i.e., I
should get an additional variable A$wage.lag1:
> A
year person wage wage.lag1
1 1980 1 0.17923212 NA
2 1981 1 0.25610292 0.17923212
3 1982 1 0.50833655 0.25610292
4 1983 1 -0.42448395 0.50833655
5 1984 1 0.49233532 -0.42448395
6 1980 2 -0.49928025 NA
7 1981 2 0.06842660 -0.49928025
8 1982 2 0.65677575 0.06842660
9 1983 2 0.15947390 0.65677575
10 1984 2 -0.46585116 0.15947390
11 1980 3 -0.29052635 NA
12 1981 3 -0.27109203 -0.29052635
13 1982 3 -0.76168164 -0.27109203
14 1983 3 0.02294361 -0.76168164
15 1984 3 2.22828032 0.02294361
I could write code which does this "by hand", but it strikes me as a
fundamental requirement when dealing with panel data, so perhaps
there is high-level support for such a task?
I have been trying to learn groupedData objects and the tools that go
with them, but I didn't get a hint about how I would address such a
task.
-Ila
How to make a lagged variable in panel data?
2 messages · Ila Patnaik, Gabor Grothendieck
On 8/13/05, Ila Patnaik <ila at mayin.org> wrote:
Suppose we observe N individuals, for each of which we have a
time-series. How do we correctly create a lagged value of the
time-series variable?
As an example, suppose I create:
A <- data.frame(year=rep(c(1980:1984),3),
                person= factor(sort(rep(1:3,5))),
                wage=c(rnorm(15)))
> A
year person wage
1 1980 1 0.17923212
2 1981 1 0.25610292
3 1982 1 0.50833655
4 1983 1 -0.42448395
5 1984 1 0.49233532
6 1980 2 -0.49928025
7 1981 2 0.06842660
8 1982 2 0.65677575
9 1983 2 0.15947390
10 1984 2 -0.46585116
11 1980 3 -0.29052635
12 1981 3 -0.27109203
13 1982 3 -0.76168164
14 1983 3 0.02294361
15 1984 3 2.22828032
What I'd like to do is to make a lagged wage for each person, i.e., I
should get an additional variable A$wage.lag1:
> A
year person wage wage.lag1
1 1980 1 0.17923212 NA
2 1981 1 0.25610292 0.17923212
3 1982 1 0.50833655 0.25610292
4 1983 1 -0.42448395 0.50833655
5 1984 1 0.49233532 -0.42448395
6 1980 2 -0.49928025 NA
7 1981 2 0.06842660 -0.49928025
8 1982 2 0.65677575 0.06842660
9 1983 2 0.15947390 0.65677575
10 1984 2 -0.46585116 0.15947390
11 1980 3 -0.29052635 NA
12 1981 3 -0.27109203 -0.29052635
13 1982 3 -0.76168164 -0.27109203
14 1983 3 0.02294361 -0.76168164
15 1984 3 2.22828032 0.02294361
We can use 'by' to split data frame A by person and apply the
function f to each subset of rows. Function f converts the portion
of wage that belongs to a single person into a ts class time series
so that we can use lag with it, and then cbinds wage together with
its lag. Because the lagged series extends one period past the
original, the cbind'ed result covers an extra year, so we keep only
the rows whose times correspond to the original series, which is all
the example output includes. Note that this extraction has the side
effect of turning wages into a plain matrix rather than a time
series. We then put everything together using cbind(...) once
again, and once the 'by' is complete we rbind all the rows together.
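f <- function(x) {
  # Treat one person's wages as an annual series starting in their first year
  # (assumes rows are in year order with no gaps)
  wage <- ts(x$wage, start = x$year[1])
  idx <- seq_along(wage)
  # cbind on ts objects aligns by time; keep only the original years
  wages <- cbind(wage = wage, wage.lag1 = stats::lag(wage, -1))[idx, ]
  cbind(x[, 1:2], wages)
}
# Apply f to each person's rows and stack the pieces into one data frame
result <- do.call("rbind", by(A, A$person, f))
result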
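
As a cross-check on the approach above, here is a minimal base-R
sketch that builds the same lag without going through ts objects. It
assumes the rows of A are already sorted by year within each person
and that no years are missing. The helper name lag1 and the call to
set.seed are illustrative additions, not part of the original posts.

```r
# Recreate the example data (set.seed is an addition so the run is reproducible)
set.seed(1)
A <- data.frame(year   = rep(1980:1984, 3),
                person = factor(rep(1:3, each = 5)),
                wage   = rnorm(15))

# Hypothetical helper: shift a vector down by one position, padding with NA.
# Assumes the vector is ordered by time with no missing years.
lag1 <- function(v) c(NA, v[-length(v)])

# ave() applies lag1 separately within each level of person and
# returns the results in the original row order
A$wage.lag1 <- ave(A$wage, A$person, FUN = lag1)
A
```

Since ave() keeps the original row order, the new column can be
assigned straight back into A. If the panel has gaps or is unsorted,
this shortcut would silently pair the wrong years, so sorting with
order(A$person, A$year) first would be a sensible precaution.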