Conditional Weighted Average (ddply or any other function)
Thanks for the data . You have an undefined variable in the ddply statement. There is no MKTCAP in the data.frame. You also have one two many ) in the statement. I think it should read: ddply (dat1 , .(Sector, FISCALYEAR), summarise, WROE=wavg(ROE, MKTCAP)) John Kane Kingston ON Canada
-----Original Message-----
From: anandpunit at gmail.com
Sent: Fri, 1 Mar 2013 14:13:37 -0500
To: r-help at r-project.org
Subject: Re: [R] Conditional Weighted Average (ddply or any other
function)
Hi John,
The sample size is huge involving 10,000 + firms. I have put a
representative sample using dput ( Name, ticker and country have been
changed so that firms cannot be identified due to proprietary data
set, also EPS is not required and removed from the dataset)
structure(list(NAME = structure(c(8L, 8L, 8L, 8L, 8L, 8L, 7L,
7L, 7L, 7L, 7L, 7L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L,
6L, 6L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("CCC", "CTAX",
"INN", "NOB", "SH", "SZ", "WASH", "WILLSON"), class = "factor"),
Ticker = structure(c(7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L,
8L, 8L, 8L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L,
6L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("CC13",
"CT56", "INN12", "NB12", "SH12", "SZ12", "W12", "W15"), class =
"factor"),
Industry = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 5L, 5L,
5L, 5L, 5L, 5L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Commercial &
Professional Serv",
"Energy", "Media", "Retail", "Transportation"), class = "factor"),
Sector = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Consumer
Discretionary",
"Energy", "Industrials"), class = "factor"), Country =
structure(c(4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L), .Label = c("Brazil", "China", "India", "UK"), class =
"factor"),
FISCALYEAR = structure(c(3L, 2L, 1L, 4L, 5L, 6L, 3L, 2L,
1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L, 3L, 2L, 1L, 4L,
5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L,
2L, 1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L), .Label = c("FY-1",
"FY-2", "FY-3", "FY0", "FY1", "FY2"), class = "factor"),
ROE = c(0.026, 0.0656, 0.1621, 0.1885, 0.1968, 0.2126, 0.0207,
0.0319, 0.0963, 0.0431, 0.066, 0.066, 0.0707, 0.0797, 0.0781,
0.078, 0.098, 0.126, 0.0352, 0.2625, 0.3714, 0.2929, 0.3133,
0.2509, 0.2398, 0.2779, 0.1109, 0.0509, 0.069, 0.1017, 0.1298,
0.5842, 0.3953, 0.4429, 0.3616, 0.26, 0.2, 0.4472, 0.2912,
0.21, 0.2849, 0.3553, 0.4347, 0.3289, 0.3846, 0.2643, 0.0458,
0.1265, 0.28), MKT = c(2919236084, 836858582, 2015182617,
3399344971, 4324821777, 4324821777, 7619453125, 3579844727,
4132238281, 3712239990, 2879757813, 2879757813, 1525237793,
700357605, 1814942993, 1858225342, 1242890503, 1242890503,
1879700000, 557093400, 224900300, 1634700000, 1443200000,
3582664735, 3582664735, 5830366211, 10660833984, 9024061523,
7628660645, 9154108398, 9154108398, 7064532227, 1804380005,
6331067871, 10445639648, 9153587891, 9153587891, 6231200000,
4.078e+09, 10107500000, 12460300000, 17800051556, 17800051556,
513478700, 260993500, 882575400, 1.151e+09, 855938413, 855938413
)), .Names = c("NAME", "Ticker", "Industry", "Sector", "Country",
"FISCALYEAR", "ROE", "MKT"), class = "data.frame", row.names = c(NA,
-49L))
Thanks,
Punit
On Fri, Mar 1, 2013 at 12:51 PM, John Kane <jrkrideau at inbox.com> wrote: See below
-----Original Message----- From: anandpunit at gmail.com Sent: Fri, 1 Mar 2013 12:36:53 -0500 To: jrkrideau at inbox.com Subject: Re: [R] Conditional Weighted Average (ddply or any other function) Hi John, I was using symbols, Column ROE, EPS, MKTCAP are numeric, Name, Ticker, Sector, Country, FISCALYEAR or Year are character strings. and column "Year" is referring to "FISCALYEAR"
Definitely a no-no in R-help. :) We really need some representative sample data to play with. See https://github.com/hadley/devtools/wiki/Reproducibility for some general pointers on how to compose a good question. The fact that you included the code you are using was excellent but without some data it is rather useless. The easiest way to supply data is to use the dput() function. Example with your file named "testfile": dput(testfile) Then copy the output and paste into your email. This is what I did with your data that I pasted into my email . I added the dat1 <- to it. For large data sets, you can just supply a representative sample. Usually, dput(head(testfile, 100)) will be sufficient. I hope this is of some help.
On Fri, Mar 1, 2013 at 12:31 PM, John Kane <jrkrideau at inbox.com> wrote:
It is not at all clear what you are doing. You state that the data
set
you are using is what I have called dat1 : see dput form below.
As far as I can see there is no numerical value in there.
##===========data set in dput form================#
dat1 <- structure(list(Name = c("N1", "N1", "N1", "N1", "N1", "N1",
"N1",
"N2", "N2", "N2", "N2", "N2", "N2", "N2"), Ticker = c("T1",
"T1",
"T1", "T1", "T1", "T1", "T1", "T2", "T2", "T2", "T2", "T2",
"T2",
"T2"), Sector = c("S1", "S1", "S1", "S1", "S1", "S1", "S1",
"S2",
"S2", "S2", "S2", "S2", "S2", "S2"), Industry = c("I1",
"I1",
"I1", "I1", "I1", "I1", "I1", "I2", "I2", "I2", "I2", "I2", "I2",
"I2"), Country = c("C1", "C1", "C1", "C1", "C1", "C1", "C1",
"C2", "C2", "C2", "C2", "C2", "C2", "C2"), Year = c("FY-4",
"FY-3",
"FY-2", "FY-1", "FY0", "FY1", "FY2", "FY-4", "FY-3", "FY-2",
"FY-2", "FY0", "FY2", "FY2"), ROE = c("ROE11", "ROE12",
"ROE13",
"ROE14", "ROE15", "ROE16", "ROE17", "ROE21", "ROE22",
"ROE23",
"ROE24", "ROE25", "ROE26", "ROE27"), EPS = c("EPS11",
"EPS12",
"EPS13", "EPS14", "EPS15", "EPS16", "EPS17", "EPS21",
"EPS22",
"EPS23", "EPS24", "EPS25", "EPS26", "EPS27"), MKTCAP =
c("MKT11",
"MKT12", "MKT13", "MKT14", "MKT15", "MKT16", "MKT17",
"MKT21",
"MKT22", "MKT23", "MKT24", "MKT25", "MKT26", "MKT27")),
.Names
= c("Name",
"Ticker", "Sector", "Industry", "Country", "Year", "ROE",
"EPS",
"MKTCAP"), class = "data.frame", row.names = c(NA, -14L))
## =================end of dataset==================#
There is no FISCALYEAR variable that you specifed below
ddply (dataread , .(Sector, FISCALYEAR), summarise, > WROE=wavg(ROE, MKTCAP)))
I think we need a bit more information. John Kane Kingston ON Canada
-----Original Message----- From: anandpunit at gmail.com Sent: Fri, 1 Mar 2013 11:01:42 -0500 To: r-help at r-project.org Subject: [R] Conditional Weighted Average (ddply or any other function) Hello R community, I am computing weighted average statistic by using ddply function: My data set is: N1 T1 S1 I1 C1 FY-4 ROE11 EPS11 MKT11 N1 T1 S1 I1 C1 FY-3 ROE12 EPS12 MKT12 N1 T1 S1 I1 C1 FY-2 ROE13 EPS13 MKT13 N1 T1 S1 I1 C1 FY-1 ROE14 EPS14 MKT14 N1 T1 S1 I1 C1 FY0 ROE15 EPS15 MKT15 N1 T1 S1 I1 C1 FY1 ROE16 EPS16 MKT16 N1 T1 S1 I1 C1 FY2 ROE17 EPS17 MKT17 N2 T2 S2 I2 C2 FY-4 ROE21 EPS21 MKT21 N2 T2 S2 I2 C2 FY-3 ROE22 EPS22 MKT22 N2 T2 S2 I2 C2 FY-2 ROE23 EPS23 MKT23 N2 T2 S2 I2 C2 FY-2 ROE24 EPS24 MKT24 N2 T2 S2 I2 C2 FY0 ROE25 EPS25 MKT25 N2 T2 S2 I2 C2 FY2 ROE26 EPS26 MKT26 N2 T2 S2 I2 C2 FY2 ROE27 EPS27 MKT27 with colnames: (Name,Ticker,Sector,Industry,Country,Year,ROE,EPS,MKTCAP) I want to compute 1) Weighted ROE based on Sector and Fiscal Year. For firm N1 of Sector S1 and Fiscalyear FY-3 weight is MKT1 / SUM(MKT, where Sector = S1, Fiscalyear FY-3) 2) Weighted ROE based on Country and Fiscal Year. For firm N1 of Country C1 and Fiscalyear FY-3 weight is MKT1 / SUM(MKT, where Country = C1, Fiscalyear FY-3) 3) Weighted ROE based on Country, Sector and Fiscal Year. For firm N1 of Country C1, Sector S1 and Fiscalyear FY-3 weight is MKT1 / SUM(MKT, where Country = C1, Sector = S1, Fiscalyear FY-3) 4) Weighted ROE based on Country, Industry and Fiscal Year. For firm N1 of Country C1, Industry I1 and Fiscalyear FY-3 weight is MKT1 / SUM(MKT, where Country = C1, Industry = I1, Fiscalyear FY-3) I tried using ddply function: ddply (dataread , .(Sector, FISCALYEAR), summarise, WROE=wavg(ROE, MKTCAP))) where wavg <- function(x, wt) x %*% wt/sum(wt) but this doesn't give me the right answer. I could try subseting the data into different sectors and compute the weighted average which doesn't look like an elegant solution and would defeat the purpose of ddply I coudn't think of properly using melt and cast functions to solve this issue. Any help will be highly appreciated. Thanks and Regards, Punit
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
____________________________________________________________ GET FREE SMILEYS FOR YOUR IM & EMAIL - Learn more at http://www.inbox.com/smileys Works with AIM?, MSN? Messenger, Yahoo!? Messenger, ICQ?, Google Talk? and most webmails