Implementing Sharpe's style analysis with solve.QP

Mon, Feb 25, 2008 4:34 PM

I've recently become interested in implementing style analysis as described by 
William Sharpe in:
  http://www.stanford.edu/~wfsharpe/art/sa/sa.htm

In particular, I want to implement the 'constrained' method he discusses using 
the quadratic programming approach he advocates.  I thought that this might 
be easy to do in R using the function 'solve.QP' in the package 'quadprog'.  
Easy, of course, is a relative term.  I've found the documentation somewhat 
oblique, and Googling hasn't produced much in the way of help.  

So I thought I would offer what I've been able to figure out and try to get 
some feedback.  What I have here *seems* to work, although I haven't had to 
specify this kind of a problem myself in, well, decades.

According to Sharpe (1994) what we're trying to do is minimize the variance of 
the difference between the observed returns and the modeled returns:
min VAR(R.f - SUM[w_i * R.s_i]) = min VAR(F - w*S)
  s.t. SUM(w_i) = 1; w_i > 0

where:
R.f   Fund returns
R.s   Style returns
w_i   Style weights

Remembering that VAR(aX + bY) = a^2 VAR(X) + b^2 VAR(Y) + 2ab COV(X,Y), 
we can refactor the problem as:

= VAR(R.f) + w'*V*w - 2*w'*COV(R.f,R.s)

where:
V     Variance-covariance matrix of style index matrix
C     Vector of covariances between the style index and the fund

At this point, we can drop VAR[R.f] as it isn't a function of weights and 
multiply both sides by 1/2 to get:

min (1/2) w'*V*w - C'w
  s.t. w'*e = 1, w_i > 0

where:
e     Vector of 1's

Now, map that to the inputs of solve.QP, which is specified as:
min(-d' b + 1/2 b' D b) with the constraints A' b >= b_0

so:
b is the weight vector,
D is the variance-covariance matrix of the styles
d is the covariance vector between the fund and the styles

Here is the function.  The comments embedded within the function discuss how 
each of the components is constructed for solve.QP.

`style.QPfit` <-
function(R.fund, R.style, model=FALSE, ...) 
{
# INPUTS
# R.fund   Vector of a fund return time series
# R.style   Matrix of a style return time series
# 
# 
# OUTPUTS
# weights   Vector of optimal style index weights
# @ todo: TE  Tracking error between the calc'd and actual fund
# @ todo: Fp  Vector of calculated fund time series
# R^2  Coefficient of determination
#
    # Check to see if the required libraries are loaded
    if(!require("quadprog", quietly=TRUE))
        stop("package", sQuote("quadprog"), "is needed.  Stopping")

    R.fund = checkData(R.fund[,1,drop=FALSE], method="matrix")
    R.style = checkData(R.style, method="matrix")

    # @todo: Missing data is not allowed, use = "pairwise.complete.obs" ?
    style.rows = dim(R.style)[1]
    style.cols = dim(R.style)[2]

    # Calculate D and d
    Dmat = cov(R.style)
    dvec = cov(R.fund, R.style)

    # To specify A' b >= b_0, we create an nxn matrix Amat and the constraints 
    # vector b0.  First we tackle Amat.  The first constraint listed above is 
    # SUM[w_i] = 1.  The left side of the constraint is expressed as a vector 
    # of 1's:
    a1 = rep(1, style.cols)

    # In b0, which represents the right side of the equation, this vector is 
    # paired with the value '1'.

    # The second constraint sets the lower bound of the weights to zero. 
    # First, we set up an identity matrix.  
    a2 = matrix(0, style.cols, style.cols)
    diag(a2) = 1

    # It's paired in b0 with a vector of lower bounds set to zeros:
    w.min = rep(0, style.cols)

    # Construct A from the two components a1 and a2
    Amat = t(rbind(a1, a2))

    # Construct b_0
    b0 = c(1, w.min)


    # This is where 'meq' comes in.  The ?solve.QP page says:
    #     meq: the first 'meq' constraints are treated as equality
    #     constraints, all further as inequality constraints (defaults
    #     to 0).
    # I think that the way to interpret this is: if it is set to a value 
    #'q' <= n, the ordered constraints numbered less than or equal to 'q' are
    # treated as an equality constraint.  In this case, we only want to first
    # constraint to be treated as an equality, so that the weights would sum
    # to exactly 1.  So we set meq = 1.

    # With 'meq' set to 1, the second constraint (a2) is treated as an
    # inequality, so each weight is constrainted to be greater than or equal
    # to zero.
    optimal <- solve.QP(Dmat, dvec, Amat, bvec=b0, meq=1)

    weights = as.data.frame(optimal$solution)
    rownames(weights) = colnames(R.style)
    colnames(weights) = colnames(R.fund)[1]

    # Calculate metrics for the quality of the fit
    R.fitted = rowSums(R.style*matrix(rep(t(weights),style.rows), 
byrow=T,ncol=style.cols))
    R.nonfactor = R.fund - R.fitted

    R.squared = as.data.frame(1 - (var(R.nonfactor)/var(R.fund)))
    adj.R.squared = as.data.frame(1 - (1 - R.squared)*(style.rows - 
1)/(style.rows - style.cols - 1))

    rownames(R.squared) = "R-squared"
    rownames(adj.R.squared) = "Adj R-squared"

    if(model) 
        # returns the entire output of the model
        result = optimal
    else 
        result = list(weights = weights, R.squared = R.squared, adj.R.squared 
= adj.R.squared )

    # @todo: retain the labels on the weights
    # @todo: add other values to output, e.g., 
    #    result <- list(weights = optimal$solution, R.squared = , 
tracking.error = )

    return(result)
}

Here's a contrived example.  Using 10 years of monthly index data, I use the 
Russell 1000 Growth and Value indexes to "explain" the returns of the S&P.  
My prior is that it should roughly split the S&P index in half along the 
value and growth lines given how these indexes are constructed, and with a 
very high R^2.

Here's the data:

SP500.TR 
Jan 1996   0.0340 
Feb 1996   0.0093 
Mar 1996   0.0096 
Apr 1996   0.0147 
May 1996   0.0258 
Jun 1996   0.0038

Russell.1000.Growth Russell.1000.Value
Jan 1996              0.0335             0.0312
Feb 1996              0.0183             0.0076
Mar 1996              0.0013             0.0170
Apr 1996              0.0263             0.0038
May 1996              0.0349             0.0125
Jun 1996              0.0014             0.0008

And here's the execution:

$weights
                     SP500.TR
Russell.1000.Growth 0.5047724
Russell.1000.Value  0.4952276

$R.squared
           SP500.TR
R-squared 0.9880977

$adj.R.squared
              SP500.TR
Adj R-squared  0.98793

So, this appears to work (at least for this simple example), and produces a 
high R^2 like we would expect.  To see the whole output of the solve.QP 
model, I can specify 'model' = TRUE:

$solution
[1] 0.5047724 0.4952276

$value
[1] -0.0008888153

$unconstrainted.solution
[1] 0.5040111 0.4815228

$iterations
[1] 2 0

$iact
[1] 1


Make sense?

pcc

Peter Carl
145 Scottswood Rd
Riverside, IL 60546
312 307 6346
http://www.braverock.com/~peter

Implementing Sharpe's style analysis with solve.QP

Thread (2 messages)