How to count number of year per firm in panel data?

ave is also handy here:

DF$number <- ave(DF$year, DF$id, FUN = length)

On Wed, Feb 11, 2009 at 12:38 PM, Marc Schwartz
on 02/11/2009 10:43 AM Johannes Habel wrote:
Hello,

I have an unbalanced panel dataset and would like to exclude all objects
that don't appear at least x times.

Therefore, I would like to include a column indicating for every line how
many periods are available, e.g.

id, year, number
1, 2000, 3
1, 2001, 3
1, 2002, 3
2, 2001, 1
3, ..., ...

This would allow me to exclude companies by setting "subset=number>=x".
However, I don't know how to create this column, i.e. how to count the years
for each object and include the numbers into the dataset.

Could anybody help me, please?

Alternatively, is there an easier way to achieve my goal?

Thank you very much.

Johannes Habel
You don't need to add the extra column. You can just create a frequency
table of the unique 'id' values, get the subset of values that meet your
count criteria and then use those values in subset().

Let's create a little larger dataset:

set.seed(1)
id <- sample(letters[1:4], 8, replace = TRUE)

id
[1] "b" "b" "c" "d" "a" "d" "d" "c"

years <- unlist(lapply(split(id, id),
                      function(i) 2000:(2000 + length(i) - 1)))

years
  a   b1   b2   c1   c2   d1   d2   d3
2000 2000 2001 2000 2001 2000 2001 2002

DF <- data.frame(id = sort(id), year = years)

DF
  id year
a   a 2000
b1  b 2000
b2  b 2001
c1  c 2000
c2  c 2001
d1  d 2000
d2  d 2001
d3  d 2002

subset(DF, id %in% names(which(table(DF$id) >= 3)))
  id year
d1  d 2000
d2  d 2001
d3  d 2002

Step by step:

table(DF$id)
a b c d
1 2 2 3

table(DF$id) >= 3
   a     b     c     d
FALSE FALSE FALSE  TRUE

which(table(DF$id) >= 3)
d
4

names(which(table(DF$id) >= 3))
[1] "d"

Then use subset() as above, filtering only those id's that are in the
names from the table. If we change the requirement to >= 2:

subset(DF, id %in% names(which(table(DF$id) >= 2)))
  id year
b1  b 2000
b2  b 2001
c1  c 2000
c2  c 2001
d1  d 2000
d2  d 2001
d3  d 2002

See ?table, ?names, ?which and ?"%in%"

If you really need to add the column, you could use aggregate() to get a
count of years for each id as a data frame, then use merge() to add the
column to DF:

aggregate(DF$year, list(id = DF$id), length)
 id x
1  a 1
2  b 2
3  c 2
4  d 3

merge(DF, aggregate(DF$year, list(id = DF$id), length), by = "id")
 id year x
1  a 2000 1
2  b 2000 2
3  b 2001 2
4  c 2000 2
5  c 2001 2
6  d 2000 3
7  d 2001 3
8  d 2002 3

Then use subset() as you initially considered.

See ?aggregate and ?merge

HTH,

Marc Schwartz

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

How to count number of year per firm in panel data?

Thread (3 messages)