Skip to content
Prev 146390 / 398500 Next

Improving data processing efficiency

Ok, sorry about the zip, then. :) Thanks for taking the trouble to clue 
me in as to the best posting procedure!

well, here's a dput-ed version of the small data subset you can use for 
testing. below that, an updated version of the function, with extra 
explanatory comments, and producing an extra column showing exactly what 
is matched to what.

to test, just run the function, with the dataset as sole argument.

Thanks again; i'd appreciate any input on this.

=========== begin dataset dput representation ============

structure(list(PERMNO = c(10001L, 10001L, 10298L, 10298L, 10484L,
10484L, 10515L, 10515L, 10634L, 10634L, 11048L, 11048L, 11237L,
11294L, 11294L, 11338L, 11338L, 11404L, 11404L, 11587L, 11587L,
11591L, 11591L, 11737L, 11737L, 11791L, 11809L, 11809L, 11858L,
11858L, 11955L, 11955L, 12003L, 12003L, 12016L, 12016L, 12223L,
12223L, 12758L, 12758L, 13688L, 13688L, 16117L, 16117L, 17770L,
17770L, 21514L, 21514L, 21792L, 21792L, 21821L, 21821L, 22437L,
22437L, 22947L, 22947L, 23027L, 23027L, 23182L, 23182L, 23536L,
23536L, 23712L, 23712L, 24053L, 24053L, 24117L, 24117L, 24256L,
24256L, 24299L, 24299L, 24352L, 24352L, 24379L, 24379L, 24467L,
24467L, 24679L, 24679L, 24870L, 24870L, 25056L, 25056L, 25208L,
25208L, 25232L, 25232L, 25241L, 25590L, 25590L, 26463L, 26463L,
26470L, 26470L, 26614L, 26614L, 27385L, 27385L, 29196L, 29196L,
30411L, 30411L, 32943L, 32943L, 38893L, 38893L, 40708L, 40708L,
41005L, 41005L, 42817L, 42817L, 42833L, 42833L, 43668L, 43668L,
45947L, 45947L, 46017L, 46017L, 48274L, 48274L, 49971L, 49971L,
53786L, 53786L, 53859L, 53859L, 54199L, 54199L, 56371L, 56952L,
56952L, 57277L, 57277L, 57381L, 57381L, 58202L, 58202L, 59395L,
59395L, 59935L, 60169L, 60169L, 61188L, 61188L, 61444L, 61444L,
62690L, 62690L, 62842L, 62842L, 64290L, 64290L, 64418L, 64418L,
64450L, 64450L, 64477L, 64477L, 64557L, 64557L, 64646L, 64646L,
64902L, 64902L, 67774L, 67774L, 68910L, 68910L, 70471L, 70471L,
74406L, 74406L, 75091L, 75091L, 75304L, 75304L, 75743L, 75964L,
75964L, 76026L, 76026L, 76162L, 76170L, 76173L, 78530L, 78530L,
78682L, 78682L, 81569L, 81569L, 82502L, 82502L, 83337L, 83337L,
83919L, 83919L, 88242L, 88242L, 90852L, 90852L, 91353L, 91353L
), DATE = c(19900331, 19900630, 19900630, 19900331, 19900331,
19900630, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630,
19900630, 19900630, 19900331, 19900331, 19900630, 19900331, 19900630,
19900331, 19900630, 19900630, 19900331, 19900630, 19900331, 19900630,
19900630, 19900331, 19900630, 19900331, 19900331, 19900630, 19900630,
19900331, 19900630, 19900331, 19900331, 19900630, 19900331, 19900630,
19900331, 19900630, 19900331, 19900630, 19900331, 19900630, 19900630,
19900331, 19900630, 19900331, 19900630, 19900331, 19900630, 19900331,
19900331, 19900630, 19900630, 19900331, 19900331, 19900630, 19900630,
19900331, 19900331, 19900630, 19900331, 19900630, 19900630, 19900331,
19900331, 19900630, 19900331, 19900630, 19900630, 19900331, 19900331,
19900630, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630,
19900331, 19900630, 19900331, 19900630, 19900630, 19900331, 19900331,
19900331, 19900630, 19900630, 19900331, 19900331, 19900630, 19900331,
19900630, 19900630, 19900331, 19900331, 19900630, 19900331, 19900630,
19900630, 19900331, 19900630, 19900331, 19900630, 19900331, 19900331,
19900630, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630,
19900331, 19900630, 19900331, 19900630, 19900630, 19900331, 19900331,
19900630, 19900331, 19900630, 19900630, 19900331, 19900331, 19900630,
19900630, 19900331, 19900630, 19900630, 19900331, 19900630, 19900331,
19900630, 19900331, 19900331, 19900630, 19900331, 19900331, 19900630,
19900331, 19900630, 19900331, 19900630, 19900630, 19900331, 19900331,
19900630, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630,
19900331, 19900630, 19900630, 19900331, 19900331, 19900630, 19900331,
19900630, 19900630, 19900331, 19900630, 19900331, 19900331, 19900630,
19900630, 19900331, 19900331, 19900630, 19900331, 19900630, 19900630,
19900331, 19900630, 19900331, 19900630, 19900630, 19900630, 19900630,
19900331, 19900630, 19900331, 19900630, 19900331, 19900630, 19900630,
19900331, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630,
19900331, 19900630, 19900331, 19900630), Shares.Owned = c(50100,
50100, 250000, 293500, 3656629, 3827119, 4132439, 3566591, 2631193,
2500301, 775879, 816879, 38700, 1041600, 1070300, 533768, 558815,
61384492, 60466567, 194595, 196979, 359946, 314446, 106770, 107070,
20242, 1935098, 2099403, 1902125, 1766750, 41991, 41991, 34490,
36290, 589400, 596700, 1549395, 1759440, 854473, 762903, 156366785,
98780287, 2486389, 2635718, 122264, 122292, 25455916, 25458658,
71645490, 71855722, 30969596, 30409838, 2738576, 2814490, 20846605,
20930233, 1148299, 505415, 396388, 385714, 25239923, 24117950,
73465526, 73084616, 8096614, 7595742, 3937930, 3820215, 20884821,
19456342, 2127331, 2188276, 2334515, 2813347, 8267264, 8544084,
783277, 810742, 742048, 512956, 9659658, 9436873, 40107717, 41234384,
9111755, 9708782, 12815719, 13144148, 1146100, 8292392, 8271030,
282650, 281273, 4196126, 4273758, 2489363, 2734182, 1579681,
1369192, 51947585, 51941430, 54673, 52585, 317601, 314876, 62626258,
63341772, 8977553, 8940106, 4478872, 4315631, 1246339, 1227442,
68484747, 68041081, 22679902, 21775270, 927147, 936881, 2626449,
2245552, 14029366, 14304855, 2434123, 2184358, 77479654, 81754241,
333070, 282967, 241146, 256146, 11419, 819092, 798490, 1403179,
1326018, 238974451, 237684105, 1889699, 2317096, 4887641, 5972387,
3567239, 1024595, 993627, 5254732, 5459404, 413146, 432697, 5307595,
4813261, 7717872, 8689444, 2431341, 2372096, 909359, 868068,
2110670, 2055349, 23774859, 23573345, 4234466, 4143534, 1192314,
1255105, 3052000, 2605700, 5566270, 5972761, 1470173, 1448403,
28065345, 32961737, 1844441, 2247991, 651758, 655658, 65864806,
82392617, 1942906, 14800, 14657, 6600, 5534, 394064, 163000,
2499320, 1123624, 1227987, 198000, 241000, 3681688, 3409586,
2416988, 2407798, 55081, 48091, 480000, 785710, 1040147, 1171854,
1363994, 1555229, 199237, 192637), Shares.Outstanding.13f = c(1,
1, 7, 7, 8, 8, 8, 8, 6, 6, 8, 8, 4, 4, 4, 18, 19, 228, 228, 2,
2, 3, 3, 5, 5, 7, 9, 9, 6, 6, 2, 2, 3, 3, 7, 7, 14, 15, 3, 3,
429, 429, 17, 16, 2, 2, 43, 41, 127, 126, 86, 86, 15, 15, 51,
51, 7, 7, 3, 3, 67, 67, 211, 211, 35, 35, 14, 14, 49, 49, 12,
12, 22, 22, 31, 31, 4, 4, 4, 5, 34, 34, 64, 64, 56, 56, 27, 27,
47, 28, 28, 2, 2, 10, 10, 8, 8, 13, 13, 87, 87, 1, 1, 3, 3, 101,
101, 38, 36, 49, 56, 22, 22, 245, 247, 36, 35, 6, 6, 22, 22,
30, 30, 11, 11, 151, 151, 2, 2, 3, 3, 4, 4, 4, 10, 10, 468, 459,
10, 10, 16, 16, 27, 8, 8, 19, 19, 3, 3, 7, 7, 15, 15, 6, 6, 6,
6, 13, 13, 60, 60, 11, 11, 10, 10, 8, 8, 153, 152, 7, 7, 206,
206, 5, 5, 4, 4, 246, 299, 4, 0, 0, 13, 13, 7, 5, 10, 7, 7, 11,
11, 16, 16, 6, 6, 1, 1, 7, 7, 10, 10, 5, 5, 10, 10), Percent.Inst.Owned 
= c(0.0501,
0.0501, 0.0357142857142857, 0.0419285714285714, 0.457078625,
0.478389875, 0.516554875, 0.445823875, 0.438532166666667, 
0.416716833333333,
0.096984875, 0.102109875, 0.009675, 0.2604, 0.267575, 0.0296537777777778,
0.0294113157894737, 0.269230228070175, 0.26520424122807, 0.0972975,
0.0984895, 0.119982, 0.104815333333333, 0.021354, 0.021414, 
0.00289171428571429,
0.215010888888889, 0.233267, 0.317020833333333, 0.294458333333333,
0.0209955, 0.0209955, 0.0114966666666667, 0.0120966666666667,
0.0842, 0.0852428571428571, 0.110671071428571, 0.117296, 0.284824333333333,
0.254301, 0.36449134032634, 0.230257079254079, 0.146258176470588,
0.164732375, 0.061132, 0.061146, 0.591998046511628, 0.62094287804878,
0.564137716535433, 0.570283507936508, 0.360111581395349, 0.353602767441860,
0.182571733333333, 0.187632666666667, 0.408756960784314, 0.410396725490196,
0.164042714285714, 0.0722021428571429, 0.132129333333333, 
0.128571333333333,
0.376715268656716, 0.359969402985075, 0.348177848341232, 0.346372587677725,
0.231331828571429, 0.2170212, 0.281280714285714, 0.2728725, 
0.426220836734694,
0.397068204081633, 0.177277583333333, 0.182356333333333, 0.106114318181818,
0.127879409090909, 0.266685935483871, 0.275615612903226, 0.19581925,
0.2026855, 0.185512, 0.1025912, 0.284107588235294, 0.277555088235294,
0.626683078125, 0.64428725, 0.162709910714286, 0.173371107142857,
0.474656259259259, 0.486820296296296, 0.0243851063829787, 
0.296156857142857,
0.295393928571429, 0.141325, 0.1406365, 0.4196126, 0.4273758,
0.311170375, 0.34177275, 0.121513923076923, 0.105322461538462,
0.59709867816092, 0.597027931034483, 0.054673, 0.052585, 0.105867,
0.104958666666667, 0.62006196039604, 0.627146257425743, 0.236251394736842,
0.248336277777778, 0.0914055510204082, 0.0770648392857143, 
0.0566517727272727,
0.0557928181818182, 0.279529579591837, 0.275469963562753, 
0.629997277777778,
0.622150571428571, 0.1545245, 0.156146833333333, 0.119384045454545,
0.102070545454545, 0.467645533333333, 0.4768285, 0.221283909090909,
0.198578, 0.513110291390729, 0.541418814569536, 0.166535, 0.1414835,
0.080382, 0.085382, 0.00285475, 0.204773, 0.1996225, 0.1403179,
0.1326018, 0.510629168803419, 0.517830294117647, 0.1889699, 0.2317096,
0.3054775625, 0.3732741875, 0.132119962962963, 0.128074375, 0.124203375,
0.276564842105263, 0.287337052631579, 0.137715333333333, 0.144232333333333,
0.758227857142857, 0.687608714285714, 0.5145248, 0.579296266666667,
0.4052235, 0.395349333333333, 0.151559833333333, 0.144678, 
0.162359230769231,
0.158103769230769, 0.39624765, 0.392889083333333, 0.384951454545455,
0.376684909090909, 0.1192314, 0.1255105, 0.3815, 0.3257125, 
0.0363808496732026,
0.0392944802631579, 0.210024714285714, 0.206914714285714, 
0.136239538834951,
0.160008432038835, 0.3688882, 0.4495982, 0.1629395, 0.1639145,
0.267743113821138, 0.275560591973244, 0.4857265, Inf, Inf, 
0.000507692307692308,
0.000425692307692308, 0.0562948571428571, 0.0326, 0.249932, 
0.160517714285714,
0.175426714285714, 0.018, 0.0219090909090909, 0.2301055, 0.213099125,
0.402831333333333, 0.401299666666667, 0.055081, 0.048091, 
0.0685714285714286,
0.112244285714286, 0.1040147, 0.1171854, 0.2727988, 0.3110458,
0.0199237, 0.0192637), Latest.Issue.Date.ByPERMNO = c(19860108,
19860108, 19600101, 19600101, 19600101, 19600101, 19870728, 19870728,
19870501, 19870501, 19870805, 19870805, 19600101, 19600101, 19600101,
19600101, 19600101, 19730523, 19730523, 19600101, 19600101, 19870811,
19870811, 19870930, 19870930, 19600101, 19880729, 19880729, 19880225,
19880225, 19880602, 19880602, 19860610, 19860610, 19880802, 19880802,
19890629, 19890629, 19600101, 19600101, 19821109, 19821109, 19860619,
19860619, 19871117, 19871117, 19600101, 19600101, 19890308, 19890308,
19900208, 19900208, 19861120, 19861120, 19880803, 19880803, 19600101,
19600101, 19890216, 19890216, 19761202, 19761202, 19890919, 19890919,
19810623, 19810623, 19770615, 19770615, 19831004, 19831004, 19830616,
19830616, 19810519, 19810519, 19850311, 19850311, 19781130, 19781130,
19841016, 19900515, 19800904, 19800904, 19830825, 19830825, 19830601,
19830601, 19811110, 19811110, 19600101, 19890309, 19890309, 19850529,
19850529, 19881122, 19881122, 19840620, 19840620, 19740305, 19740305,
19860718, 19860718, 19600101, 19600101, 19860207, 19860207, 19891003,
19891003, 19870403, 19870403, 19600101, 19600101, 19790403, 19790403,
19850528, 19850528, 19830322, 19830322, 19761202, 19761202, 19841114,
19841114, 19800826, 19800826, 19880517, 19880517, 19860516, 19860516,
19891122, 19891122, 19600101, 19600101, 19600101, 19871119, 19871119,
19760624, 19760624, 19851206, 19851206, 19890615, 19890615, 19860805,
19860805, 19600101, 19890919, 19890919, 19860501, 19860501, 19600101,
19600101, 19890308, 19890308, 19900125, 19900125, 19890714, 19890714,
19880412, 19880412, 19890809, 19890809, 19870306, 19870306, 19751112,
19751112, 19870604, 19870604, 19810625, 19810625, 19600101, 19600101,
19860416, 19860416, 19891027, 19891027, 19890125, 19890125, 19860502,
19860502, 19600101, 19600101, 19900405, 19600101, 19600101, 19600101,
19600101, 19900412, 19900514, 19900518, 19890518, 19890518, 19600101,
19600101, 19900117, 19900117, 19891214, 19891214, 19600101, 19600101,
19600101, 19600101, 19851206, 19851206, 19851211, 19851211, 19600101,
19600101), Quarters.Since.19800101 = c(41L, 42L, 42L, 41L, 41L,
42L, 41L, 42L, 41L, 42L, 41L, 42L, 42L, 42L, 41L, 41L, 42L, 41L,
42L, 41L, 42L, 42L, 41L, 42L, 41L, 42L, 42L, 41L, 42L, 41L, 41L,
42L, 42L, 41L, 42L, 41L, 41L, 42L, 41L, 42L, 41L, 42L, 41L, 42L,
41L, 42L, 42L, 41L, 42L, 41L, 42L, 41L, 42L, 41L, 41L, 42L, 42L,
41L, 41L, 42L, 42L, 41L, 41L, 42L, 41L, 42L, 42L, 41L, 41L, 42L,
41L, 42L, 42L, 41L, 41L, 42L, 41L, 42L, 41L, 42L, 41L, 42L, 41L,
42L, 41L, 42L, 42L, 41L, 41L, 41L, 42L, 42L, 41L, 41L, 42L, 41L,
42L, 42L, 41L, 41L, 42L, 41L, 42L, 42L, 41L, 42L, 41L, 42L, 41L,
41L, 42L, 41L, 42L, 41L, 42L, 41L, 42L, 41L, 42L, 41L, 42L, 42L,
41L, 41L, 42L, 41L, 42L, 42L, 41L, 41L, 42L, 42L, 41L, 42L, 42L,
41L, 42L, 41L, 42L, 41L, 41L, 42L, 41L, 41L, 42L, 41L, 42L, 41L,
42L, 42L, 41L, 41L, 42L, 41L, 42L, 41L, 42L, 41L, 42L, 41L, 42L,
42L, 41L, 41L, 42L, 41L, 42L, 42L, 41L, 42L, 41L, 41L, 42L, 42L,
41L, 41L, 42L, 41L, 42L, 42L, 41L, 42L, 41L, 42L, 42L, 42L, 42L,
41L, 42L, 41L, 42L, 41L, 42L, 42L, 41L, 41L, 42L, 41L, 42L, 41L,
42L, 41L, 42L, 41L, 42L), Quarters.Since.Latest.Issue = c(17L,
18L, 122L, 121L, 121L, 122L, 11L, 12L, 12L, 13L, 11L, 12L, 122L,
122L, 121L, 121L, 122L, 68L, 69L, 121L, 122L, 12L, 11L, 11L,
10L, 122L, 8L, 7L, 10L, 9L, 8L, 9L, 17L, 16L, 8L, 7L, 4L, 5L,
121L, 122L, 30L, 31L, 16L, 17L, 10L, 11L, 122L, 121L, 6L, 5L,
2L, 1L, 15L, 14L, 7L, 8L, 122L, 121L, 5L, 6L, 55L, 54L, 3L, 4L,
36L, 37L, 53L, 52L, 26L, 27L, 28L, 29L, 37L, 36L, 21L, 22L, 46L,
47L, 22L, 1L, 39L, 40L, 27L, 28L, 28L, 29L, 35L, 34L, 121L, 5L,
6L, 21L, 20L, 6L, 7L, 24L, 25L, 66L, 65L, 15L, 16L, 121L, 122L,
18L, 17L, 3L, 2L, 13L, 12L, 121L, 122L, 44L, 45L, 20L, 21L, 29L,
30L, 54L, 55L, 22L, 23L, 40L, 39L, 8L, 9L, 16L, 17L, 3L, 2L,
121L, 122L, 122L, 10L, 11L, 57L, 56L, 19L, 18L, 5L, 4L, 15L,
16L, 121L, 3L, 4L, 16L, 17L, 121L, 122L, 6L, 5L, 1L, 2L, 3L,
4L, 8L, 9L, 3L, 4L, 13L, 14L, 59L, 58L, 12L, 13L, 36L, 37L, 122L,
121L, 17L, 16L, 2L, 3L, 6L, 5L, 16L, 17L, 121L, 122L, 1L, 121L,
122L, 121L, 122L, 1L, 1L, 1L, 4L, 5L, 121L, 122L, 1L, 2L, 3L,
2L, 121L, 122L, 121L, 122L, 18L, 19L, 18L, 19L, 121L, 122L),
     ALTPRC = c(9.9375, 9.875, 0.45313, 0.67188, 7.875, 10, 18,
     22, 14.75, 9.75, 0.375, 0.15625, 3.9375, 16, 14.25, 7, 7.125,
     27.25, 23.375, 10.75, 13, 3.125, 3.125, 2.6875, 3.4375, 0.5,
     8.75, 7, 16.875, 12.375, 2.40625, 3.96875, 4, 4.625, 4.5,
     5.125, 26.25, 28.75, 4.5, 5.5, 21.75, 23.25, 15, 14.375,
     16.625, 14, 50.5, 48.75, 31.875, 33.125, 41.5, 46, 21, 22.125,
     30.75, 30.125, 10.375, 5.5, 11.5, 11, 29, 28.875, 27.25,
     26.75, 22.375, 22.25, 33.375, 35, 21, 19.75, 29.875, 28.875,
     22.125, 20.125, 21, 18.875, 24.625, 26.75, 21.75, 22, 22.125,
     21.125, 24.75, 26.75, 42.75, 43.5, 13.375, 29.625, 0.07813,
     25.125, 23.75, 18, 20, 17.5, 18.125, 18.875, 19, 28.875,
     30, 23.875, 23.625, 15.5, 15.625, 17.5, 19.5, 34.75, 30.75,
     2, 2.25, 18.625, 17.5, 21.375, 19.875, 45.25, 20.125, 37.25,
     41.75, 32.25, 32.5, 23.125, 21.875, 35.25, 38.75, 27.875,
     27.375, 35.875, 42.125, 24.25, 24.5, 25.125, 23.875, 2.0625,
     16.75, 16.25, 34.625, 37.75, 40, 31.625, 19.375, 20, 30.875,
     29.375, 0.125, 17.625, 17, 16.625, 17.75, 12.625, 13.25,
     26, 19.75, 15.25, 18.625, 18.125, 18, 16.375, 15.625, 18.5,
     19, 12.875, 14.375, 32.375, 33.375, 16.375, 16.375, 1.625,
     2.8125, 13.875, 14.625, 4.625, 4.5, 18.5, 24.125, 6.375,
     5.875, 10.625, 11.625, 6.625, 7.375, 14.75, 0.8125, 0.6875,
     2.125, 2.375, 20.25, 7.625, 34, 15.25, 15, 2.09375, 2.375,
     19.5, 18.125, 38.5, 30.75, 36, 35.75, 9.375, 11.25, 21.25,
     18.625, 6, 5.25, 1.15625, 1.25), HSICIG = c(492, 492, 494,
     494, 495, 495, 495, 495, 495, 495, 493, 493, 495, 495, 495,
     495, 495, 493, 493, 492, 492, 495, 495, 495, 495, 495, 495,
     495, 495, 495, 495, 495, 495, 495, 495, 495, 495, 495, 495,
     495, 493, 493, 494, 494, 492, 492, 492, 492, 493, 493, 492,
     492, 493, 493, 493, 493, 495, 495, 492, 492, 493, 493, 493,
     493, 493, 493, 493, 493, 493, 493, 493, 493, 493, 493, 493,
     493, 493, 493, 492, 492, 493, 493, 492, 492, 493, 493, 492,
     492, 495, 492, 492, 494, 494, 492, 492, 492, 492, 493, 493,
     492, 492, 494, 494, 492, 492, 492, 492, 492, 492, 492, 492,
     493, 493, 493, 493, 492, 492, 493, 493, 493, 493, 492, 492,
     492, 492, 495, 495, 494, 494, 494, 494, 495, 492, 492, 493,
     493, 495, 495, 492, 492, 492, 492, 495, 492, 492, 492, 492,
     492, 492, 492, 492, 494, 494, 492, 492, 492, 492, 492, 492,
     495, 495, 493, 493, 492, 492, 495, 495, 492, 492, 492, 492,
     495, 495, 495, 495, 495, 495, 492, 492, 495, 495, 495, 492,
     492, 495, 495, 495, 492, 492, 494, 494, 492, 492, 495, 495,
     492, 492, 493, 493, 494, 494, 495, 495, 495, 495), Market.Cap.13f = 
c(9937500,
     9875000, 3171910, 4703160, 6.3e+07, 8e+07, 1.44e+08, 1.76e+08,
     88500000, 58500000, 3e+06, 1250000, 15750000, 6.4e+07, 5.7e+07,
     1.26e+08, 135375000, 6.213e+09, 5329500000, 21500000, 2.6e+07,
     9375000, 9375000, 13437500, 17187500, 3500000, 78750000,
     6.3e+07, 101250000, 74250000, 4812500, 7937500, 1.2e+07,
     13875000, 31500000, 35875000, 367500000, 431250000, 13500000,
     16500000, 9330750000, 9974250000, 2.55e+08, 2.3e+08, 33250000,
     2.8e+07, 2171500000, 1998750000, 4048125000, 4173750000,
     3.569e+09, 3.956e+09, 3.15e+08, 331875000, 1568250000, 1536375000,
     72625000, 38500000, 34500000, 3.3e+07, 1.943e+09, 1934625000,
     5749750000, 5644250000, 783125000, 778750000, 467250000,
     4.9e+08, 1.029e+09, 967750000, 358500000, 346500000, 486750000,
     442750000, 6.51e+08, 585125000, 98500000, 1.07e+08, 8.7e+07,
     1.1e+08, 752250000, 718250000, 1.584e+09, 1.712e+09, 2.394e+09,
     2.436e+09, 361125000, 799875000, 3672110, 703500000, 6.65e+08,
     3.6e+07, 4e+07, 1.75e+08, 181250000, 1.51e+08, 1.52e+08,
     375375000, 3.9e+08, 2077125000, 2055375000, 15500000, 15625000,
     52500000, 58500000, 3509750000, 3105750000, 7.6e+07, 8.1e+07,
     912625000, 9.8e+08, 470250000, 437250000, 11086250000, 4970875000,
     1.341e+09, 1461250000, 193500000, 1.95e+08, 508750000, 481250000,
     1057500000, 1162500000, 306625000, 301125000, 5417125000,
     6360875000, 48500000, 4.9e+07, 75375000, 71625000, 8250000,
     6.7e+07, 6.5e+07, 346250000, 377500000, 1.872e+10, 14515875000,
     193750000, 2e+08, 4.94e+08, 4.7e+08, 3375000, 1.41e+08, 1.36e+08,
     315875000, 337250000, 37875000, 39750000, 1.82e+08, 138250000,
     228750000, 279375000, 108750000, 1.08e+08, 98250000, 93750000,
     240500000, 2.47e+08, 772500000, 862500000, 356125000, 367125000,
     163750000, 163750000, 1.3e+07, 22500000, 2122875000, 2.223e+09,
     32375000, 31500000, 3.811e+09, 4969750000, 31875000, 29375000,
     42500000, 46500000, 1629750000, 2205125000, 5.9e+07, 0, 0,
     27625000, 30875000, 141750000, 38125000, 3.4e+08, 106750000,
     1.05e+08, 23031250, 26125000, 3.12e+08, 2.9e+08, 2.31e+08,
     184500000, 3.6e+07, 35750000, 65625000, 78750000, 212500000,
     186250000, 3e+07, 26250000, 11562500, 12500000), IPO.Flag = c(0,
     0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0,
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,
     0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
     1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
     0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0), IPO.Issue.Date = c(NA,
     NA, NA, NA, NA, NA, 19860724, 19860724, NA, NA, 19870127,
     19870127, NA, NA, NA, NA, NA, NA, NA, NA, NA, 19870811, 19870811,
     19870930, 19870930, NA, 19871124, 19871124, 19880225, 19880225,
     19880602, 19880602, NA, NA, 19880802, 19880802, NA, NA, NA,
     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 19710324, 19710324,
     NA, NA, NA, NA, NA, NA, NA, NA, NA, 19710617, 19710617, NA,
     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
     19831014, 19831014, 19861016, 19861016, NA, NA, 19860502,
     19860502, NA, NA, 19890419, NA, NA, NA, NA, 19900412, 19900514,
     19900518, NA, NA, NA, NA, NA, NA, 19830603, 19830603, NA,
     NA, NA, NA, 19851206, 19851206, 19851211, 19851211, NA, NA
     ), Quarters.Since.IPO.Issue = c(NA, NA, NA, NA, NA, NA, 15L,
     16L, NA, NA, 13L, 14L, NA, NA, NA, NA, NA, NA, NA, NA, NA,
     12L, 11L, 11L, 10L, NA, 11L, 10L, 10L, 9L, 8L, 9L, NA, NA,
     8L, 7L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
     NA, 77L, 78L, NA, NA, NA, NA, NA, NA, NA, NA, NA, 77L, 76L,
     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
     NA, 27L, 26L, 14L, 15L, NA, NA, 16L, 17L, NA, NA, 5L, NA,
     NA, NA, NA, 1L, 1L, 1L, NA, NA, NA, NA, NA, NA, 29L, 28L,
     NA, NA, NA, NA, 18L, 19L, 18L, 19L, NA, NA)), .Names = c("PERMNO",
"DATE", "Shares.Owned", "Shares.Outstanding.13f", "Percent.Inst.Owned",
"Latest.Issue.Date.ByPERMNO", "Quarters.Since.19800101", 
"Quarters.Since.Latest.Issue",
"ALTPRC", "HSICIG", "Market.Cap.13f", "IPO.Flag", "IPO.Issue.Date",
"Quarters.Since.IPO.Issue"), row.names = c(79L, 85L, 9902L, 9908L,
15739L, 15758L, 16673L, 16675L, 20159L, 20160L, 32879L, 32889L,
38023L, 39404L, 39409L, 40405L, 40420L, 43114L, 43116L, 47939L,
47953L, 48091L, 48120L, 52828L, 52837L, 54612L, 55002L, 55048L,
56506L, 56508L, 59230L, 59247L, 60454L, 60461L, 60845L, 60852L,
66143L, 66147L, 69439L, 69454L, 72218L, 72232L, 81826L, 81840L,
87882L, 87883L, 105814L, 105832L, 106687L, 106709L, 106867L,
106875L, 110008L, 110081L, 113124L, 113125L, 113448L, 113460L,
114419L, 114431L, 116222L, 116234L, 117215L, 117310L, 119463L,
119477L, 119913L, 119927L, 120787L, 120799L, 121214L, 121215L,
121541L, 121548L, 121670L, 121680L, 122420L, 122421L, 123629L,
123679L, 124479L, 124485L, 125607L, 125608L, 126683L, 126716L,
126911L, 126954L, 126986L, 128941L, 128979L, 132991L, 133048L,
133090L, 133091L, 134227L, 134228L, 137449L, 137465L, 146656L,
146710L, 151717L, 151728L, 162724L, 162738L, 186344L, 186346L,
194239L, 194251L, 195124L, 195125L, 203411L, 203426L, 203486L,
203487L, 206821L, 206863L, 218733L, 218734L, 219083L, 219084L,
232389L, 232401L, 241221L, 241222L, 262518L, 262556L, 263151L,
263154L, 264783L, 264811L, 275743L, 278957L, 278958L, 281230L,
281242L, 281957L, 281962L, 286492L, 286504L, 294444L, 294445L,
297641L, 298974L, 298988L, 304628L, 304669L, 306326L, 306339L,
315987L, 316013L, 316939L, 316940L, 327003L, 327032L, 327976L,
327977L, 328372L, 328386L, 328621L, 328622L, 329277L, 329289L,
329983L, 329984L, 331735L, 331746L, 350849L, 350887L, 357747L,
357750L, 366913L, 366917L, 380680L, 380749L, 385635L, 385642L,
394280L, 394281L, 410203L, 417419L, 417420L, 418842L, 418851L,
423401L, 423687L, 423795L, 494497L, 494498L, 496519L, 496520L,
576735L, 576737L, 590042L, 590057L, 606077L, 606087L, 620736L,
620737L, 704834L, 704837L, 749540L, 749573L, 754161L, 754162L
), class = "data.frame")

======== end dataset dput representation =========

=========== begin function code ==========
fcn_create_nonissuing_match_by_quarterssinceissue = function(tfdata, 
quarters_since_issue=20) {
     # tfdata must not have NAs for market.cap
     result = matrix(nrow=0, ncol=(ncol(tfdata) + 1)) # rbind for matrix 
is cheaper, so typecast the result to matrix

     colnames = names(tfdata) #grab the colnames, which we will shove 
back to the result at the end when we reconvert to data.frame

     quarterends = sort(unique(tfdata$DATE)) # the data are quarterly, 
all dates are quarter ends

     # basic code logic:
     # grab each quarter's data, in each quarter get the ipo subset, and 
the eligible matching firm subset
     # for each ipo from the ipo subset, select a matching firm from the 
eligible matching firm subset
     # the said selection is done based on industry group (HSICIG), and 
market cap (Market.Cap.13f)
     # Industry group has to be the same, market cap has to be 'closest 
one from above', or if that is not available, then 'closest one from below'.

     for (aquarter in quarterends) {
         tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
         tfdata_quarter_fitting_nonissuers = tfdata_quarter[ 
(tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue) & 
(tfdata_quarter$IPO.Flag == 0), ]
         tfdata_quarter_ipoissuers = tfdata_quarter[ 
tfdata_quarter$IPO.Flag == 1, ]

         for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
             arow = tfdata_quarter_ipoissuers[i,]
             industrypeers = tfdata_quarter_fitting_nonissuers[ 
tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
             industrypeers = industrypeers[ 
order(industrypeers$Market.Cap.13f), ]
             if ( nrow(industrypeers) > 0 ) {
                 if ( nrow(industrypeers[industrypeers$Market.Cap.13f >= 
arow$Market.Cap.13f, ]) > 0 ) {
                     bestpeer = 
industrypeers[industrypeers$Market.Cap.13f >= arow$Market.Cap.13f, ][1,]
                 }
                 else {
                     bestpeer = industrypeers[nrow(industrypeers),]
                 }
                 bestpeer$Quarters.Since.IPO.Issue = 
arow$Quarters.Since.IPO.Issue
                 bestpeer$Peer.To.PERMNO = arow$PERMNO
                 result = rbind(result, as.matrix(bestpeer))
             }
         }
         print (aquarter)
     }

     result = as.data.frame(result)
     names(result) = c(colnames, 'Peer.To.PERMNO')
     return(result)

}
============== end function code===========


on 06/06/2008 01:35 PM Gabor Grothendieck said the following:

Thread (32 messages)

Daniel Folkinshteyn Improving data processing efficiency Jun 5 Bart Joosen Improving data processing efficiency Jun 5 Daniel Folkinshteyn Improving data processing efficiency Jun 5 Daniel Folkinshteyn Improving data processing efficiency Jun 6 Patrick Burns Improving data processing efficiency Jun 6 Gabor Grothendieck Improving data processing efficiency Jun 6 Daniel Folkinshteyn Improving data processing efficiency Jun 6 Gabor Grothendieck Improving data processing efficiency Jun 6 Gabor Grothendieck Improving data processing efficiency Jun 6 Daniel Folkinshteyn Improving data processing efficiency Jun 6 Daniel Folkinshteyn Improving data processing efficiency Jun 6 Gabor Grothendieck Improving data processing efficiency Jun 6 Daniel Folkinshteyn Improving data processing efficiency Jun 6 Daniel Folkinshteyn Improving data processing efficiency Jun 6 Patrick Burns Improving data processing efficiency Jun 6 Daniel Folkinshteyn Improving data processing efficiency Jun 6 Greg Snow Improving data processing efficiency Jun 6 Gabor Grothendieck Improving data processing efficiency Jun 6 Greg Snow Improving data processing efficiency Jun 6 Patrick Burns Improving data processing efficiency Jun 6 Greg Snow New vocabulary on a Friday afternoon. Was: Improving data processing efficiency Jun 6 Daniel Folkinshteyn Improving data processing efficiency Jun 6 Don MacQueen Improving data processing efficiency Jun 6 Hadley Wickham Improving data processing efficiency Jun 6 Daniel Folkinshteyn Improving data processing efficiency Jun 6 Daniel Folkinshteyn Improving data processing efficiency Jun 6 Daniel Folkinshteyn Improving data processing efficiency Jun 6 Esmail Bonakdarian Improving data processing efficiency Jun 6 Horace Tso Improving data processing efficiency Jun 6 Esmail Bonakdarian Improving data processing efficiency Jun 6 Charles C. Berry Improving data processing efficiency Jun 6 Hadley Wickham Improving data processing efficiency Jun 6