How to count number of year per firm in panel data?
ave is also handy here: DF$number <- ave(DF$year, DF$id, FUN = length) On Wed, Feb 11, 2009 at 12:38 PM, Marc Schwartz
<marc_schwartz at comcast.net> wrote:
on 02/11/2009 10:43 AM Johannes Habel wrote:
Hello, I have an unbalanced panel dataset and would like to exclude all objects that don't appear at least x times. Therefore, I would like to include a column indicating for every line how many periods are available, e.g. id, year, number 1, 2000, 3 1, 2001, 3 1, 2002, 3 2, 2001, 1 3, ..., ... This would allow me to exclude companies by setting "subset=number>=x". However, I don't know how to create this column, i.e. how to count the years for each object and include the numbers into the dataset. Could anybody help me, please? Alternatively, is there an easier way to achieve my goal? Thank you very much. Johannes Habel
You don't need to add the extra column. You can just create a frequency table of the unique 'id' values, get the subset of values that meet your count criteria and then use those values in subset(). Let's create a little larger dataset: set.seed(1) id <- sample(letters[1:4], 8, replace = TRUE)
id
[1] "b" "b" "c" "d" "a" "d" "d" "c"
years <- unlist(lapply(split(id, id),
function(i) 2000:(2000 + length(i) - 1)))
years
a b1 b2 c1 c2 d1 d2 d3 2000 2000 2001 2000 2001 2000 2001 2002 DF <- data.frame(id = sort(id), year = years)
DF
id year a a 2000 b1 b 2000 b2 b 2001 c1 c 2000 c2 c 2001 d1 d 2000 d2 d 2001 d3 d 2002
subset(DF, id %in% names(which(table(DF$id) >= 3)))
id year d1 d 2000 d2 d 2001 d3 d 2002 Step by step:
table(DF$id)
a b c d 1 2 2 3
table(DF$id) >= 3
a b c d FALSE FALSE FALSE TRUE
which(table(DF$id) >= 3)
d 4
names(which(table(DF$id) >= 3))
[1] "d" Then use subset() as above, filtering only those id's that are in the names from the table. If we change the requirement to >= 2:
subset(DF, id %in% names(which(table(DF$id) >= 2)))
id year b1 b 2000 b2 b 2001 c1 c 2000 c2 c 2001 d1 d 2000 d2 d 2001 d3 d 2002 See ?table, ?names, ?which and ?"%in%" If you really need to add the column, you could use aggregate() to get a count of years for each id as a data frame, then use merge() to add the column to DF:
aggregate(DF$year, list(id = DF$id), length)
id x 1 a 1 2 b 2 3 c 2 4 d 3
merge(DF, aggregate(DF$year, list(id = DF$id), length), by = "id")
id year x 1 a 2000 1 2 b 2000 2 3 b 2001 2 4 c 2000 2 5 c 2001 2 6 d 2000 3 7 d 2001 3 8 d 2002 3 Then use subset() as you initially considered. See ?aggregate and ?merge HTH, Marc Schwartz
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.