Computing growth rate
This was ensured while using ddply()...

On Thu, Dec 15, 2016 at 6:04 PM, Brijesh Mishra <brijeshkmishra at gmail.com> wrote:
Dear Mr Hasselman,
I missed your mail while I was typing my own reply to Mr. Barradas's suggestion. In fact, I implemented your suggestion even before reading it. But I have a concern, which I have noted (though it is only hypothetical; such a scenario is very unlikely to occur). Is there a way to restrict such calculations to within each co_code1?
Many thanks,
Brijesh

On Thu, Dec 15, 2016 at 5:48 PM, Berend Hasselman <bhh at xs4all.nl> wrote:
On 15 Dec 2016, at 04:40, Brijesh Mishra <brijeshkmishra at gmail.com> wrote:
Hi,
I am trying to calculate growth rate (say, sales, though it is to be
computed for many variables) in a panel data set. Problem is that I
have missing data for many firms for many years. To put it simply, I
have created this short data frame (the original df is much bigger)
df1<-data.frame(co_code1=rep(c(1100, 1200, 1300), each=7),
fyear1=rep(1990:1996, 3), sales1=rep(seq(1000,1600, by=100),3))
# this gives me
co_code1 fyear1 sales1
1 1100 1990 1000
2 1100 1991 1100
3 1100 1992 1200
4 1100 1993 1300
5 1100 1994 1400
6 1100 1995 1500
7 1100 1996 1600
8 1200 1990 1000
9 1200 1991 1100
10 1200 1992 1200
11 1200 1993 1300
12 1200 1994 1400
13 1200 1995 1500
14 1200 1996 1600
15 1300 1990 1000
16 1300 1991 1100
17 1300 1992 1200
18 1300 1993 1300
19 1300 1994 1400
20 1300 1995 1500
21 1300 1996 1600
# I am now removing a couple of rows
df1<-df1[-c(5, 8), ]
# the result is
co_code1 fyear1 sales1
1 1100 1990 1000
2 1100 1991 1100
3 1100 1992 1200
4 1100 1993 1300
6 1100 1995 1500
7 1100 1996 1600
9 1200 1991 1100
10 1200 1992 1200
11 1200 1993 1300
12 1200 1994 1400
13 1200 1995 1500
14 1200 1996 1600
15 1300 1990 1000
16 1300 1991 1100
17 1300 1992 1200
18 1300 1993 1300
19 1300 1994 1400
20 1300 1995 1500
21 1300 1996 1600
# so 1994 for co_code1 1100 and 1990 for co_code1 1200 have been
removed. If I try,
d<-ddply(df1,"co_code1",transform, growth=c(NA,exp(diff(log(sales1)))-1)*100)
# this apparently gives a wrong result for the year 1995 (as shown
below), because growth is computed between consecutive rows as if they
were consecutive years.
co_code1 fyear1 sales1 growth
1 1100 1990 1000 NA
2 1100 1991 1100 10.000000
3 1100 1992 1200 9.090909
4 1100 1993 1300 8.333333
5 1100 1995 1500 15.384615
6 1100 1996 1600 6.666667
7 1200 1991 1100 NA
8 1200 1992 1200 9.090909
9 1200 1993 1300 8.333333
10 1200 1994 1400 7.692308
11 1200 1995 1500 7.142857
12 1200 1996 1600 6.666667
13 1300 1990 1000 NA
14 1300 1991 1100 10.000000
15 1300 1992 1200 9.090909
16 1300 1993 1300 8.333333
17 1300 1994 1400 7.692308
18 1300 1995 1500 7.142857
19 1300 1996 1600 6.666667
# I thought of applying the formula within each co_code1 only when the
increment of fyear1 is exactly 1, using this code
d<-ddply(df1,
"co_code1",
transform,
if(diff(fyear1)==1){
growth=(exp(diff(log(df1$sales1)))-1)*100
} else{
growth=NA
})
But this doesn't work. I get the following warning:
In if (diff(fyear1) == 1) { :
the condition has length > 1 and only the first element will be used
(repeated a few times).
# I have searched for a solution, but somehow couldn't get one. Hope
that some kind soul will guide me here.
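The message above arises because if() expects a single TRUE or FALSE, whereas diff(fyear1) == 1 yields one value per pair of consecutive rows. A minimal sketch of the difference, using two made-up vectors (yrs and sales are illustrative names, not part of the original data):

```r
# Hypothetical example data: one firm with a gap between 1993 and 1995
yrs   <- c(1990, 1991, 1992, 1993, 1995, 1996)
sales <- c(1000, 1100, 1200, 1300, 1500, 1600)

# diff() returns one value per consecutive pair, so the comparison
# is a logical vector, not a single TRUE/FALSE
consecutive <- diff(yrs) == 1
print(consecutive)

# ifelse() evaluates the condition element by element, so growth is
# kept where the years are consecutive and set to NA across the gap
growth <- c(NA, ifelse(consecutive, (exp(diff(log(sales))) - 1) * 100, NA))
print(round(growth, 2))
```

Here the entry for 1995 comes out as NA instead of a two-year growth figure, which appears to be the behaviour asked for.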
In your case use ifelse() as explained by Rui.
But it can be done more easily since the fyear1 and co_code1 are synchronized.
Add a new column to df1 like this
df1$growth <- c(NA,
                ifelse(diff(df1$fyear1) == 1,
                       (exp(diff(log(df1$sales1))) - 1) * 100,
                       NA
                )
)
and display df1. From your request I cannot determine if this is what you want.
regards,
Berend Hasselman
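
On the follow-up question of restricting the calculation to each co_code1: the whole-column version relies on the year difference at a firm boundary never being exactly 1. A minimal base R sketch that avoids this by computing growth inside each firm separately is given here. It assumes the example df1 from the question; growth_by_firm is a hypothetical helper name, not something from plyr or the original posts.

```r
# Rebuild the example data from the question, with the two rows removed
df1 <- data.frame(co_code1 = rep(c(1100, 1200, 1300), each = 7),
                  fyear1   = rep(1990:1996, 3),
                  sales1   = rep(seq(1000, 1600, by = 100), 3))
df1 <- df1[-c(5, 8), ]

# Hypothetical helper: growth for one firm's rows only.
# The first year of a firm, and any year following a gap, gets NA.
growth_by_firm <- function(d) {
  d <- d[order(d$fyear1), ]
  d$growth <- c(NA, ifelse(diff(d$fyear1) == 1,
                           (exp(diff(log(d$sales1))) - 1) * 100,
                           NA))
  d
}

# Split by firm, apply the helper, and stack the pieces again
res <- do.call(rbind, lapply(split(df1, df1$co_code1), growth_by_firm))
rownames(res) <- NULL
print(res)
```

Because diff() is applied within each firm's rows, the last year of one firm is never compared with the first year of the next. With plyr loaded, the same helper could presumably be passed directly, as in ddply(df1, "co_code1", growth_by_firm).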