matched samples, dataframe, panel data
2 messages · Cecilia Carmo, arun
Hi,
Maybe this helps:
# one data frame per year.industry combination
lst1<-split(final3,list(final3$year,final3$industry))
# drop the empty combinations (e.g. 2000.30)
lst2<-lst1[sapply(lst1,nrow)>0]
# for each firm (dimension y), the rows of its cell whose dimension d
# satisfies 0.9*d < y < 1.1*d
lst3<-lapply(lst2,function(x) lapply(x$dimension,function(y) x[(y< (x$dimension+x$dimension*0.1)) & (y> (x$dimension-x$dimension*0.1)),]))
# keep only the groups with exactly two firms
lst4<-lapply(lst3,function(x) x[sapply(x,nrow)==2])
# a pair found from both of its firms appears twice; keep one copy
lst5<-lapply(lst4,function(x) x[!duplicated(x)])
# drop the cells with no pairs left
lst6<-lst5[sapply(lst5,length)>0]
names(lst6)
# [1] "2000.20" "2001.20" "2002.20" "2003.20" "2004.20" "2001.30" "2002.30"
# [8] "2001.40" "2002.40" "2003.40" "2004.40"
lst6["2000.20"]
#$`2000.20`
#$`2000.20`[[1]]
#   firm year industry dummy dimension
#1     1 2000       20     0      2120
#21    5 2000       20     1      2189
#
#$`2000.20`[[2]]
#   firm year industry dummy dimension
#16    4 2000       20     0      3178
#31    7 2000       20     1      3245
#
#$`2000.20`[[3]]
#   firm year industry dummy dimension
#11    3 2000       20     1      4532
#6     2 2000       20     0      4890
A.K.
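A caveat on the code above: it keeps every group of exactly two firms with similar dimensions, but it never checks that one firm has dummy=0 and the other dummy=1. A minimal sketch of an extra filter, assuming `lst6` has the structure printed above (a named list of year.industry cells, each a list of two-row data frames). `keep_mixed_pairs` is a hypothetical helper, and the toy input only imitates that structure; its same-dummy pairs are made up.

```r
# Hypothetical helper: keep only the two-row data frames in which one firm
# has dummy == 0 and the other has dummy == 1.
keep_mixed_pairs <- function(cells) {
  # inside each year.industry cell, drop pairs that share the same dummy value
  filtered <- lapply(cells, function(pairs) {
    Filter(function(p) setequal(p$dummy, c(0, 1)), pairs)
  })
  # drop cells that end up with no pairs at all
  filtered[vapply(filtered, length, integer(1)) > 0]
}

# Toy input imitating lst6; firms 96-99 are made up
toy <- list(
  a = list(
    data.frame(firm = c(1, 5), dummy = c(0, 1), dimension = c(2120, 2189)),
    data.frame(firm = c(98, 99), dummy = c(1, 1), dimension = c(500, 520))
  ),
  b = list(
    data.frame(firm = c(96, 97), dummy = c(0, 0), dimension = c(800, 810))
  )
)
res <- keep_mixed_pairs(toy)
```

On the real data this would be called as `keep_mixed_pairs(lst6)`.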
From: Cecilia Carmo <cecilia.carmo at ua.pt>
To: "r-help at r-project.org" <r-help at r-project.org>
Cc: "smartpink111 at yahoo.com" <smartpink111 at yahoo.com>
Sent: Friday, June 7, 2013 9:56 AM
Subject: Re: [R] matched samples, dataframe, panel data
Here is my problem again, better explained.
#I have a panel dataset of thousands of firms, by year and industry, with
#one dummy variable that identifies one kind of firm (1 if the firm has an auditor; 0 if not)
#and another variable that represents the firm dimension (total assets in thousands of euros).
#I need to create two separate samples with the same number of firms, where
#each firm in the first sample has a corresponding firm in the second with the same
#year, industry and dimension (the dimension doesn't need to be exactly the
#same; it could vary within an interval of +/- 10%, for example).
#My reproducible example
firm1<-rep(1:10,each=5)
year1<-rep(2000:2004,10)
industry1<-rep(20,50)
dummy1<-c(0,0,1,1,0,0,1,1,0,1,1,1,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,1,0,1,0,1,1,1,1,1,0,0,1,0,0,0,0,0,1,1,1)
dimension1<-c(2120,345,2341,5678,10900,4890,2789,3412,9500,8765,4532,6593,12900,123,2345,3178,2678,6666,647,23789,
2189,4289,8543,637,23456,781,35489,2345,5754,8976,3245,1234,25,1200,2345,2765,389,23456,2367,3892,5438,37824,
23,2897,3456,7690,6022,3678,9431,2890)
data1<-data.frame(firm1,year1,industry1,dummy1,dimension1)
data1
colnames(data1)<-c("firm","year","industry","dummy","dimension")
firm2<-rep(11:15,each=3)
year2<-rep(2001:2003,5)
industry2<-rep(30,15)
dummy2<-c(0,0,0,0,0,0,1,1,1,1,1,1,1,0,1)
dimension2<-c(12456,781,32489,2345,5754,8976,3245,2120,345,2341,5678,10900,12900,123,2345)
data2<-data.frame(firm2,year2,industry2,dummy2,dimension2)
data2
colnames(data2)<-c("firm","year","industry","dummy","dimension")
firm3<-rep(16:20,each=4)
year3<-rep(2001:2004,5)
industry3<-rep(40,20)
dummy3<-c(0,0,1,0,1,0,1,0,1,1,1,1,1,0,0,0,0,1,0,0)
dimension3<-c(23456,1181,32489,2345,6754,8976,3245,1234,1288,1200,2345,2765,389,23456,2367,3892,6438,24824,
23,2897)
data3<-data.frame(firm3,year3,industry3,dummy3,dimension3)
data3
colnames(data3)<-c("firm","year","industry","dummy","dimension")
final1<-rbind(data1,data2)
final2<-rbind(final1,data3)
final2
final3<-final2[order(final2$year,final2$industry,final2$dimension),]
final3
#So my data, final3, looks like this:
   firm year industry dummy dimension
26    6 2000       20     0       781
1     1 2000       20     0      2120
21    5 2000       20     1      2189
36    8 2000       20     1      2765
16    4 2000       20     0      3178
31    7 2000       20     1      3245
11    3 2000       20     1      4532
6     2 2000       20     0      4890
41    9 2000       20     0      5438
46   10 2000       20     0      7690
2     1 2001       20     0       345
37    8 2001       20     1       389
32    7 2001       20     0      1234
17    4 2001       20     0      2678
7     2 2001       20     1      2789
22    5 2001       20     1      4289
47   10 2001       20     0      6022
12    3 2001       20     1      6593
27    6 2001       20     0     35489
42    9 2001       20     1     37824
60   14 2001       30     1      2341
54   12 2001       30     0      2345
57   13 2001       30     1      3245
51   11 2001       30     0     12456
63   15 2001       30     1     12900
78   19 2001       40     1       389
74   18 2001       40     1      1288
82   20 2001       40     0      6438
70   17 2001       40     1      6754
66   16 2001       40     0     23456
43    9 2002       20     0        23
33    7 2002       20     1        25
3     1 2002       20     1      2341
28    6 2002       20     0      2345
8     2 2002       20     1      3412
48   10 2002       20     1      3678
18    4 2002       20     0      6666
23    5 2002       20     0      8543
13    3 2002       20     0     12900
38    8 2002       20     1     23456
64   15 2002       30     0       123
52   11 2002       30     0       781
58   13 2002       30     1      2120
61   14 2002       30     1      5678
55   12 2002       30     0      5754
67   16 2002       40     0      1181
75   18 2002       40     1      1200
71   17 2002       40     0      8976
79   19 2002       40     0     23456
83   20 2002       40     1     24824
14    3 2003       20     0       123
24    5 2003       20     0       637
19    4 2003       20     1       647
34    7 2003       20     0      1200
39    8 2003       20     1      2367
44    9 2003       20     0      2897
4     1 2003       20     1      5678
29    6 2003       20     0      5754
49   10 2003       20     1      9431
9     2 2003       20     0      9500
59   13 2003       30     1       345
65   15 2003       30     1      2345
56   12 2003       30     0      8976
62   14 2003       30     1     10900
53   11 2003       30     0     32489
84   20 2003       40     0        23
76   18 2003       40     1      2345
80   19 2003       40     0      2367
72   17 2003       40     1      3245
68   16 2003       40     1     32489
15    3 2004       20     0      2345
35    7 2004       20     1      2345
50   10 2004       20     1      2890
45    9 2004       20     0      3456
40    8 2004       20     0      3892
10    2 2004       20     1      8765
30    6 2004       20     0      8976
5     1 2004       20     0     10900
25    5 2004       20     0     23456
20    4 2004       20     1     23789
73   17 2004       40     0      1234
69   16 2004       40     0      2345
77   18 2004       40     1      2765
85   20 2004       40     0      2897
81   19 2004       40     0      3892
I want to keep pairs of firms, one with dummy=1 and the other with dummy=0, that match in year, industry and dimension.
But the dimension doesn't need to be exactly the same; that is why I mentioned an interval of +/- 10%.
For example, firm 1 matches firm 5 because they have the same year and industry and a similar dimension (10% x 2120 = 212, and 2189-2120 = 69 < 212),
and firm 1 has dummy=0 while firm 5 has dummy=1.
So I want to delete firm 6, because it doesn't match any firm, and keep firms 1 and 5.
   firm year industry dummy dimension
26    6 2000       20     0       781
1     1 2000       20     0      2120
21    5 2000       20     1      2189
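The 10% rule in the firm 1 / firm 5 example can be written as a small predicate. This is a minimal sketch, assuming the band is measured relative to the first firm's dimension, as in 10% x 2120 = 212; `within_tol` is a hypothetical name, not something from the thread.

```r
# Hypothetical helper: TRUE if dimension b lies within +/- tol (a fraction)
# of the reference dimension a; tol = 0.1 gives the +/-10% band.
within_tol <- function(a, b, tol = 0.1) {
  abs(b - a) <= tol * a
}

within_tol(2120, 2189)  # TRUE: 69 <= 212, so firms 1 and 5 can be paired
within_tol(2765, 3178)  # FALSE: 413 > 276.5, so firms 8 and 4 cannot
```

Because the band is relative to `a`, `within_tol(a, b)` and `within_tol(b, a)` can disagree for dimensions near the edge of the band; the message does not say which firm should be the reference.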
Next, I can match firm 4 with firm 7 and delete firm 8.
36    8 2000       20     1      2765
16    4 2000       20     0      3178
31    7 2000       20     1      3245
And so on...
At the end I want to keep only pairs of firms, matched by year, industry and dimension.
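The walk-through above (drop firm 6, pair firms 1 and 5, drop firm 8, pair firms 4 and 7) reads like a single pass over one year-industry cell sorted by dimension. A minimal sketch of that reading, assuming each firm is compared only with its neighbour in the sorted order and the 10% band is taken relative to the smaller dimension. `pair_cell` and the `pair` column are hypothetical names; this greedy pass is one possible interpretation of the request, not a method from the thread, and it does not guarantee the largest possible number of pairs.

```r
# Hypothetical helper: greedy pass over one year-industry cell.
# Rows are sorted by dimension; a row is paired with the next row when the
# dummies differ and the gap is within tol of the smaller dimension,
# otherwise the row is dropped. Returns the kept rows with a pair id.
pair_cell <- function(cell, tol = 0.1) {
  cell <- cell[order(cell$dimension), ]
  keep <- integer(0)  # row positions that end up in a pair
  pair <- integer(0)  # pair id for each kept row
  i <- 1
  id <- 0
  while (i < nrow(cell)) {
    a <- cell[i, ]
    b <- cell[i + 1, ]
    if (a$dummy != b$dummy && b$dimension - a$dimension <= tol * a$dimension) {
      id <- id + 1
      keep <- c(keep, i, i + 1)
      pair <- c(pair, id, id)
      i <- i + 2  # both firms are used, move past them
    } else {
      i <- i + 1  # no match for row i, drop it
    }
  }
  out <- cell[keep, ]
  out$pair <- pair
  out
}

# The year 2000, industry 20 cell of final3
cell <- data.frame(
  firm = c(6, 1, 5, 8, 4, 7, 3, 2, 9, 10),
  year = 2000, industry = 20,
  dummy = c(0, 0, 1, 1, 0, 1, 1, 0, 0, 0),
  dimension = c(781, 2120, 2189, 2765, 3178, 3245, 4532, 4890, 5438, 7690)
)
res <- pair_cell(cell)
res
```

On this cell it keeps firms 1 and 5, 4 and 7, and 3 and 2, the same three pairs as the `lst6["2000.20"]` output in the reply. For the whole panel it could be applied to each element of `split(final3, list(final3$year, final3$industry), drop = TRUE)` and the results combined with `do.call(rbind, ...)`; the pair ids would then need a cell prefix to stay unique.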
If I separate the firms with dummy=1 from the firms with dummy=0 into two separate data frames, I have two matched samples
with the same number of observations. That's what I want.
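Once every kept row carries a pair identifier, the final step is a plain split on the dummy. A minimal sketch, assuming a data frame `matched` with a hypothetical `pair` column that numbers the couples uniquely; the small input below is built from the firm 1/5 and firm 4/7 couples described above.

```r
# Matched rows: two couples, each with one dummy = 0 and one dummy = 1
matched <- data.frame(
  firm = c(1, 5, 4, 7),
  dummy = c(0, 1, 0, 1),
  dimension = c(2120, 2189, 3178, 3245),
  pair = c(1, 1, 2, 2)
)

# split() returns a list named by dummy value ("0" and "1");
# ordering both samples by pair aligns row k of one with its match in the other
samples <- split(matched, matched$dummy)
with_auditor <- samples[["1"]][order(samples[["1"]]$pair), ]
without_auditor <- samples[["0"]][order(samples[["0"]]$pair), ]

# both samples have the same number of firms
nrow(with_auditor) == nrow(without_auditor)
```

Keeping `pair` in both data frames also makes it easy to re-join the two samples later with `merge(..., by = "pair")`.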
Thank you,
Cecília Carmo
Universidade de Aveiro - Portugal