Hello Folks,
I'm doing a study for a Coursera Data Science class, and I am
trying to determine whether Americans' financial satisfaction has any
correlation with the percentage return of the S&P 500 in the prior year. I
have to calculate the proportion of "Satisfied" and "More Or Less" responses
relative to the total number of observations for each year "X", starting in
1989. After computing that for each year, I need to place the results in a
column at the end of the dataset, similar to what we see with "PercentChange".
The years only run from 1989 to 2012, but calculating the proportion
separately for each observation seems tedious and inefficient. The end result
should be a chart where the X-axis is each year's percent change and the
Y-axis is the proportion that are satisfied. What's the most efficient way to
do this? Sorry for posting all of my code, but I don't know what's important
and what isn't. I realize I probably didn't code everything in the most
efficient way possible.
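One way to avoid per-observation calculations is to work per year. A logical vector marks each response as "Satisfied" or "More Or Less", tapply() averages it within each year (the mean of a logical vector is the proportion TRUE), and ave() spreads the same yearly value back onto every row if a column at the end of the dataset is still wanted. This is only a minimal sketch: the small data frame below is hypothetical toy data that mimics the year and finSat columns of relativeTable, and the names isSat, propByYear, satByYear and PropSatisfied are illustrative, not part of the original code.

```r
# Hypothetical toy data mimicking relativeTable (columns year and finSat);
# with the real GSS data, use relativeTable from the code below instead.
relativeTable <- data.frame(
  year   = c(1989, 1989, 1989, 1989, 1990, 1990, 1990),
  finSat = factor(c("Satisfied", "More Or Less", "Not At All Sat", "Satisfied",
                    "More Or Less", "Not At All Sat", "Not At All Sat"))
)

# TRUE when a respondent answered "Satisfied" or "More Or Less".
isSat <- relativeTable$finSat %in% c("Satisfied", "More Or Less")

# One value per year: the mean of a logical vector is the proportion TRUE.
propByYear <- tapply(isSat, relativeTable$year, mean)
satByYear <- data.frame(Year = as.numeric(names(propByYear)),
                        PropSatisfied = as.numeric(propByYear))
print(satByYear)

# Optional: the same yearly proportion repeated on every observation,
# appended as a new column at the end of the data frame.
relativeTable$PropSatisfied <- ave(as.numeric(isSat), relativeTable$year)
```

With the toy data, satByYear holds 0.75 for 1989 and 1/3 for 1990, so only one number per year (24 for 1989 to 2012) has to be computed.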
require(Quandl)
require(lubridate)
require(zoo)
require(xts)
load(url("http://bit.ly/dasi_gss_data")) # creates the data frame gss
year <- gss$year
finSat <- gss$satfin
relativeTable <- data.frame(year, finSat)
# keep 1989 onward and drop missing answers (compare years as numbers)
relativeTable <- subset(relativeTable, year > 1988 & !is.na(finSat))
spReturns <- Quandl("SANDP/ANNRETS", trim_start="1970-01-11",
trim_end="2012-12-31", authcode="nwy3a_Gmd7TSS9fVirxT",
collapse="annual")
percentChange <- spReturns$"Total Return Change"
# convert the "Year Ending" date to a numeric year, then shift it forward
# one year so each return lines up with the following year's survey
spReturns$"Year Ending" <- as.numeric(format(spReturns$"Year Ending", "%Y"))
spReturns$"Year Ending" <- spReturns$"Year Ending" + 1
combined <- merge(relativeTable, spReturns, by.x = "year",
                  by.y = "Year Ending")
names(combined)[6] <- "percentChange"
finalResults <- data.frame(Year = combined$year,
                           FinancialSatisfaction = combined$finSat,
                           PercentChange = combined$percentChange)
finalResults$PercentChange <- finalResults$PercentChange * 100
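For the chart itself, a sketch under the assumption that both inputs have first been reduced to one row per year: satByYear (yearly proportions, as in a tapply() summary) and spByYear (a hypothetical name for the prior-year returns in percent). merge() joins them on Year, plot() puts percent change on the X-axis and the proportion satisfied on the Y-axis, and cor() gives a Pearson correlation. The numbers are placeholders only, not real GSS proportions or S&P 500 returns.

```r
# Placeholder values only: NOT real GSS proportions or S&P 500 returns.
satByYear <- data.frame(Year = c(1989, 1990, 1991),
                        PropSatisfied = c(0.70, 0.75, 0.72))
spByYear <- data.frame(Year = c(1989, 1990, 1991),
                       PercentChange = c(10, 20, -5))

# Join the two yearly tables; only years present in both are kept.
plotData <- merge(satByYear, spByYear, by = "Year")

# X-axis: prior-year S&P 500 return; Y-axis: proportion satisfied.
plot(plotData$PercentChange, plotData$PropSatisfied,
     xlab = "S&P 500 return in the prior year (%)",
     ylab = "Proportion Satisfied or More Or Less",
     main = "Financial satisfaction vs. prior-year S&P 500 return")

# Pearson correlation between the two yearly series.
print(cor(plotData$PercentChange, plotData$PropSatisfied))
```

Because the join happens at the year level, each year's return appears once, rather than once per respondent as it does in finalResults.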
Regards,
Jason E.