Skip to content
Prev 170130 / 398506 Next

How to count number of year per firm in panel data?

on 02/11/2009 10:43 AM Johannes Habel wrote:
You don't need to add the extra column. You can just create a frequency
table of the unique 'id' values, get the subset of values that meet your
count criteria and then use those values in subset().

Let's create a little larger dataset:

set.seed(1)
id <- sample(letters[1:4], 8, replace = TRUE)
[1] "b" "b" "c" "d" "a" "d" "d" "c"


years <- unlist(lapply(split(id, id),
                       function(i) 2000:(2000 + length(i) - 1)))
a   b1   b2   c1   c2   d1   d2   d3
2000 2000 2001 2000 2001 2000 2001 2002


DF <- data.frame(id = sort(id), year = years)
id year
a   a 2000
b1  b 2000
b2  b 2001
c1  c 2000
c2  c 2001
d1  d 2000
d2  d 2001
d3  d 2002
id year
d1  d 2000
d2  d 2001
d3  d 2002


Step by step:
a b c d
1 2 2 3
a     b     c     d
FALSE FALSE FALSE  TRUE
d
4
[1] "d"


Then use subset() as above, filtering only those id's that are in the
names from the table. If we change the requirement to >= 2:
id year
b1  b 2000
b2  b 2001
c1  c 2000
c2  c 2001
d1  d 2000
d2  d 2001
d3  d 2002


See ?table, ?names, ?which and ?"%in%"


If you really need to add the column, you could use aggregate() to get a
count of years for each id as a data frame, then use merge() to add the
column to DF:
id x
1  a 1
2  b 2
3  c 2
4  d 3
id year x
1  a 2000 1
2  b 2000 2
3  b 2001 2
4  c 2000 2
5  c 2001 2
6  d 2000 3
7  d 2001 3
8  d 2002 3

Then use subset() as you initially considered.

See ?aggregate and ?merge

HTH,

Marc Schwartz