I've recently become interested in implementing style analysis as described by
William Sharpe in:
http://www.stanford.edu/~wfsharpe/art/sa/sa.htm
In particular, I want to implement the 'constrained' method he discusses using
the quadratic programming approach he advocates. I thought that this might
be easy to do in R using the function 'solve.QP' in the package 'quadprog'.
Easy, of course, is a relative term. I've found the documentation somewhat
oblique, and Googling hasn't produced much in the way of help.
So I thought I would offer what I've been able to figure out and try to get
some feedback. What I have here *seems* to work, although I haven't had to
specify this kind of a problem myself in, well, decades.
According to Sharpe (1994) what we're trying to do is minimize the variance of
the difference between the observed returns and the modeled returns:
min VAR(R.f - SUM[w_i * R.s_i]) = min VAR(F - w*S)
s.t. SUM(w_i) = 1; w_i > 0
where:
R.f Fund returns
R.s Style returns
w_i Style weights
Remembering that VAR(aX + bY) = a^2 VAR(X) + b^2 VAR(Y) + 2ab COV(X,Y),
we can refactor the problem as:
= VAR(R.f) + w'*V*w - 2*w'*COV(R.f,R.s)
where:
V Variance-covariance matrix of style index matrix
C Vector of covariances between the style index and the fund
At this point, we can drop VAR[R.f] as it isn't a function of weights and
multiply both sides by 1/2 to get:
min (1/2) w'*V*w - C'w
s.t. w'*e = 1, w_i > 0
where:
e Vector of 1's
Now, map that to the inputs of solve.QP, which is specified as:
min(-d' b + 1/2 b' D b) with the constraints A' b >= b_0
so:
b is the weight vector,
D is the variance-covariance matrix of the styles
d is the covariance vector between the fund and the styles
Here is the function. The comments embedded within the function discuss how
each of the components is constructed for solve.QP.
`style.QPfit` <-
function(R.fund, R.style, model=FALSE, ...)
{
# INPUTS
# R.fund Vector of a fund return time series
# R.style Matrix of a style return time series
#
#
# OUTPUTS
# weights Vector of optimal style index weights
# @ todo: TE Tracking error between the calc'd and actual fund
# @ todo: Fp Vector of calculated fund time series
# R^2 Coefficient of determination
#
# Check to see if the required libraries are loaded
if(!require("quadprog", quietly=TRUE))
stop("package", sQuote("quadprog"), "is needed. Stopping")
R.fund = checkData(R.fund[,1,drop=FALSE], method="matrix")
R.style = checkData(R.style, method="matrix")
# @todo: Missing data is not allowed, use = "pairwise.complete.obs" ?
style.rows = dim(R.style)[1]
style.cols = dim(R.style)[2]
# Calculate D and d
Dmat = cov(R.style)
dvec = cov(R.fund, R.style)
# To specify A' b >= b_0, we create an nxn matrix Amat and the constraints
# vector b0. First we tackle Amat. The first constraint listed above is
# SUM[w_i] = 1. The left side of the constraint is expressed as a vector
# of 1's:
a1 = rep(1, style.cols)
# In b0, which represents the right side of the equation, this vector is
# paired with the value '1'.
# The second constraint sets the lower bound of the weights to zero.
# First, we set up an identity matrix.
a2 = matrix(0, style.cols, style.cols)
diag(a2) = 1
# It's paired in b0 with a vector of lower bounds set to zeros:
w.min = rep(0, style.cols)
# Construct A from the two components a1 and a2
Amat = t(rbind(a1, a2))
# Construct b_0
b0 = c(1, w.min)
# This is where 'meq' comes in. The ?solve.QP page says:
# meq: the first 'meq' constraints are treated as equality
# constraints, all further as inequality constraints (defaults
# to 0).
# I think that the way to interpret this is: if it is set to a value
#'q' <= n, the ordered constraints numbered less than or equal to 'q' are
# treated as an equality constraint. In this case, we only want to first
# constraint to be treated as an equality, so that the weights would sum
# to exactly 1. So we set meq = 1.
# With 'meq' set to 1, the second constraint (a2) is treated as an
# inequality, so each weight is constrainted to be greater than or equal
# to zero.
optimal <- solve.QP(Dmat, dvec, Amat, bvec=b0, meq=1)
weights = as.data.frame(optimal$solution)
rownames(weights) = colnames(R.style)
colnames(weights) = colnames(R.fund)[1]
# Calculate metrics for the quality of the fit
R.fitted = rowSums(R.style*matrix(rep(t(weights),style.rows),
byrow=T,ncol=style.cols))
R.nonfactor = R.fund - R.fitted
R.squared = as.data.frame(1 - (var(R.nonfactor)/var(R.fund)))
adj.R.squared = as.data.frame(1 - (1 - R.squared)*(style.rows -
1)/(style.rows - style.cols - 1))
rownames(R.squared) = "R-squared"
rownames(adj.R.squared) = "Adj R-squared"
if(model)
# returns the entire output of the model
result = optimal
else
result = list(weights = weights, R.squared = R.squared, adj.R.squared
= adj.R.squared )
# @todo: retain the labels on the weights
# @todo: add other values to output, e.g.,
# result <- list(weights = optimal$solution, R.squared = ,
tracking.error = )
return(result)
}
Here's a contrived example. Using 10 years of monthly index data, I use the
Russell 1000 Growth and Value indexes to "explain" the returns of the S&P.
My prior is that it should roughly split the S&P index in half along the
value and growth lines given how these indexes are constructed, and with a
very high R^2.
Here's the data:
head(R.fund)
SP500.TR
Jan 1996 0.0340
Feb 1996 0.0093
Mar 1996 0.0096
Apr 1996 0.0147
May 1996 0.0258
Jun 1996 0.0038
head(R.style)
Russell.1000.Growth Russell.1000.Value
Jan 1996 0.0335 0.0312
Feb 1996 0.0183 0.0076
Mar 1996 0.0013 0.0170
Apr 1996 0.0263 0.0038
May 1996 0.0349 0.0125
Jun 1996 0.0014 0.0008
And here's the execution:
style.QPfit(R.fund, R.style)
$weights
SP500.TR
Russell.1000.Growth 0.5047724
Russell.1000.Value 0.4952276
$R.squared
SP500.TR
R-squared 0.9880977
$adj.R.squared
SP500.TR
Adj R-squared 0.98793
So, this appears to work (at least for this simple example), and produces a
high R^2 like we would expect. To see the whole output of the solve.QP
model, I can specify 'model' = TRUE: