Conditional Weighted Average (ddply or any other function)

Is there any chance you meant
 ddply (dat1 , .(Sector, FISCALYEAR), summarise, WROE=wavg(ROE, MKT))
??

It gives a result but I have no idea if it makes sense.

John Kane
Kingston ON Canada
-----Original Message-----
From: anandpunit at gmail.com
Sent: Fri, 1 Mar 2013 14:13:37 -0500
To: r-help at r-project.org
Subject: Re: [R] Conditional Weighted Average (ddply or any other
function)

Hi John,

The sample size is huge involving 10,000 + firms. I have put a
representative sample using dput ( Name, ticker and country have been
changed so that firms cannot be identified due to proprietary data
set, also EPS is not required and removed from the dataset)

structure(list(NAME = structure(c(8L, 8L, 8L, 8L, 8L, 8L, 7L,
7L, 7L, 7L, 7L, 7L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L,
6L, 6L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("CCC", "CTAX",
"INN", "NOB", "SH", "SZ", "WASH", "WILLSON"), class = "factor"),
    Ticker = structure(c(7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L,
    8L, 8L, 8L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L,
    6L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L,
    1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("CC13",
    "CT56", "INN12", "NB12", "SH12", "SZ12", "W12", "W15"), class =
"factor"),
    Industry = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L,
    4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
    4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 5L, 5L,
    5L, 5L, 5L, 5L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Commercial &
Professional Serv",
    "Energy", "Media", "Retail", "Transportation"), class = "factor"),
    Sector = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Consumer
Discretionary",
    "Energy", "Industrials"), class = "factor"), Country =
structure(c(4L,
    4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L,
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
    2L, 2L, 2L), .Label = c("Brazil", "China", "India", "UK"), class =
"factor"),
    FISCALYEAR = structure(c(3L, 2L, 1L, 4L, 5L, 6L, 3L, 2L,
    1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L, 3L, 2L, 1L, 4L,
    5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L,
    2L, 1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L), .Label = c("FY-1",
    "FY-2", "FY-3", "FY0", "FY1", "FY2"), class = "factor"),
    ROE = c(0.026, 0.0656, 0.1621, 0.1885, 0.1968, 0.2126, 0.0207,
    0.0319, 0.0963, 0.0431, 0.066, 0.066, 0.0707, 0.0797, 0.0781,
    0.078, 0.098, 0.126, 0.0352, 0.2625, 0.3714, 0.2929, 0.3133,
    0.2509, 0.2398, 0.2779, 0.1109, 0.0509, 0.069, 0.1017, 0.1298,
    0.5842, 0.3953, 0.4429, 0.3616, 0.26, 0.2, 0.4472, 0.2912,
    0.21, 0.2849, 0.3553, 0.4347, 0.3289, 0.3846, 0.2643, 0.0458,
    0.1265, 0.28), MKT = c(2919236084, 836858582, 2015182617,
    3399344971, 4324821777, 4324821777, 7619453125, 3579844727,
    4132238281, 3712239990, 2879757813, 2879757813, 1525237793,
    700357605, 1814942993, 1858225342, 1242890503, 1242890503,
    1879700000, 557093400, 224900300, 1634700000, 1443200000,
    3582664735, 3582664735, 5830366211, 10660833984, 9024061523,
    7628660645, 9154108398, 9154108398, 7064532227, 1804380005,
    6331067871, 10445639648, 9153587891, 9153587891, 6231200000,
    4.078e+09, 10107500000, 12460300000, 17800051556, 17800051556,
    513478700, 260993500, 882575400, 1.151e+09, 855938413, 855938413
    )), .Names = c("NAME", "Ticker", "Industry", "Sector", "Country",
"FISCALYEAR", "ROE", "MKT"), class = "data.frame", row.names = c(NA,
-49L))

Thanks,
Punit

On Fri, Mar 1, 2013 at 12:51 PM, John Kane <jrkrideau at inbox.com> wrote:
See below

-----Original Message-----
From: anandpunit at gmail.com
Sent: Fri, 1 Mar 2013 12:36:53 -0500
To: jrkrideau at inbox.com
Subject: Re: [R] Conditional Weighted Average (ddply or any other
function)

Hi John,

I was using symbols, Column ROE, EPS, MKTCAP are numeric, Name,
Ticker, Sector, Country, FISCALYEAR or Year are character strings.

and column "Year" is referring to "FISCALYEAR"

 Definitely a no-no in R-help.  :)  We really need  some representative
sample data to play with.  See
https://github.com/hadley/devtools/wiki/Reproducibility for some general
pointers on how to compose a good question.  The fact that you included
the code you are using was excellent but without some data it is rather
useless.

 The easiest way to supply data  is to use the dput() function.  Example
with your file named "testfile":
dput(testfile)
Then copy the output and paste into your email.  This is what I did with
your data that I pasted into my email .  I added the dat1  <-  to it.

For large data sets, you can just supply a representative sample.
Usually,  dput(head(testfile, 100)) will be sufficient.

I hope this is of some help.

On Fri, Mar 1, 2013 at 12:31 PM, John Kane <jrkrideau at inbox.com> wrote:
It is not at all clear what you are doing.  You state that the data
set
you are using is what I have called dat1 : see dput form below.

As far as I can see there is no numerical value in there.

##===========data set in dput form================#
dat1  <-  structure(list(Name = c("N1", "N1", "N1", "N1", "N1", "N1",
"N1",
         "N2", "N2", "N2", "N2", "N2", "N2", "N2"), Ticker = c("T1",
"T1",
         "T1", "T1", "T1", "T1", "T1", "T2", "T2", "T2", "T2", "T2",
"T2",
         "T2"), Sector = c("S1", "S1", "S1", "S1", "S1", "S1", "S1",
"S2",
          "S2", "S2", "S2", "S2", "S2", "S2"), Industry = c("I1",
"I1",
"I1", "I1", "I1", "I1", "I1", "I2", "I2", "I2", "I2", "I2", "I2",
          "I2"), Country = c("C1", "C1", "C1", "C1", "C1", "C1", "C1",
           "C2", "C2", "C2", "C2", "C2", "C2", "C2"), Year = c("FY-4",
"FY-3",
          "FY-2", "FY-1", "FY0", "FY1", "FY2", "FY-4", "FY-3", "FY-2",
          "FY-2", "FY0", "FY2", "FY2"), ROE = c("ROE11", "ROE12",
"ROE13",
          "ROE14", "ROE15", "ROE16", "ROE17", "ROE21", "ROE22",
"ROE23",
          "ROE24", "ROE25", "ROE26", "ROE27"), EPS = c("EPS11",
"EPS12",
          "EPS13", "EPS14", "EPS15", "EPS16", "EPS17", "EPS21",
"EPS22",
          "EPS23", "EPS24", "EPS25", "EPS26", "EPS27"), MKTCAP =
c("MKT11",
          "MKT12", "MKT13", "MKT14", "MKT15", "MKT16", "MKT17",
"MKT21",
          "MKT22", "MKT23", "MKT24", "MKT25", "MKT26", "MKT27")),
.Names
= c("Name",
         "Ticker", "Sector", "Industry", "Country", "Year", "ROE",
"EPS",
         "MKTCAP"), class = "data.frame", row.names = c(NA, -14L))
## =================end of dataset==================#

There is no FISCALYEAR variable that you specifed below

ddply (dataread , .(Sector, FISCALYEAR), summarise, > WROE=wavg(ROE,
MKTCAP)))
I think we need a bit more information.

John Kane
Kingston ON Canada

-----Original Message-----
From: anandpunit at gmail.com
Sent: Fri, 1 Mar 2013 11:01:42 -0500
To: r-help at r-project.org
Subject: [R] Conditional Weighted Average (ddply or any other
function)

Hello R community,

I am computing weighted average statistic by using ddply function:

My data set is:
N1  T1  S1  I1  C1 FY-4  ROE11  EPS11 MKT11
N1  T1  S1  I1  C1 FY-3  ROE12  EPS12 MKT12
N1  T1  S1  I1  C1 FY-2  ROE13  EPS13 MKT13
N1  T1  S1  I1  C1 FY-1  ROE14  EPS14 MKT14
N1  T1  S1  I1  C1 FY0   ROE15  EPS15 MKT15
N1  T1  S1  I1  C1 FY1   ROE16  EPS16 MKT16
N1  T1  S1  I1  C1 FY2   ROE17  EPS17 MKT17
N2  T2  S2  I2  C2 FY-4  ROE21  EPS21 MKT21
N2  T2  S2  I2  C2 FY-3  ROE22  EPS22 MKT22
N2  T2  S2  I2  C2 FY-2  ROE23  EPS23 MKT23
N2  T2  S2  I2  C2 FY-2  ROE24  EPS24 MKT24
N2  T2  S2  I2  C2 FY0   ROE25  EPS25 MKT25
N2  T2  S2  I2  C2 FY2   ROE26  EPS26 MKT26
N2  T2  S2  I2  C2 FY2   ROE27  EPS27 MKT27

with colnames:
(Name,Ticker,Sector,Industry,Country,Year,ROE,EPS,MKTCAP)

I want to compute
1) Weighted ROE based on Sector and Fiscal Year.
For firm N1 of Sector S1 and Fiscalyear FY-3 weight is
MKT1 / SUM(MKT, where Sector = S1, Fiscalyear FY-3)

2) Weighted ROE based on Country and Fiscal Year.
For firm N1 of Country C1 and Fiscalyear FY-3 weight is
MKT1 / SUM(MKT, where Country = C1, Fiscalyear FY-3)

3) Weighted ROE based on Country, Sector and  Fiscal Year.
For firm N1 of Country C1, Sector S1 and Fiscalyear FY-3
weight is MKT1 / SUM(MKT, where Country = C1, Sector = S1, Fiscalyear
FY-3)

4) Weighted ROE based on Country, Industry and  Fiscal Year.
For firm N1 of Country C1, Industry I1 and Fiscalyear FY-3
weight is MKT1 / SUM(MKT, where Country = C1, Industry = I1,
Fiscalyear
FY-3)

I tried using ddply function:
ddply (dataread , .(Sector, FISCALYEAR), summarise, WROE=wavg(ROE,
MKTCAP)))

where wavg <- function(x, wt) x %*% wt/sum(wt)
but this doesn't give me the right answer.

I could try subseting the data into different sectors and compute the
weighted average which doesn't look like an elegant solution and
would
defeat the purpose of ddply

I coudn't think of properly using melt and cast functions to solve
this issue. Any help will be highly appreciated.

Thanks and Regards,
Punit

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
____________________________________________________________
GET FREE SMILEYS FOR YOUR IM & EMAIL - Learn more at http://www.inbox.com/smileys
Works with AIM?, MSN? Messenger, Yahoo!? Messenger, ICQ?, Google Talk? and most webmails

Conditional Weighted Average (ddply or any other function)

Thread (5 messages)