An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-sig-finance/attachments/20120711/27502f33/attachment.pl>
Hasbroucks Information Share in R
3 messages · Drew Harris, Brian G. Peterson, R. Michael Weylandt
On 07/10/2012 10:25 PM, Drew Harris wrote:
I am a recent R converter so please bare with me if I ask any stupid questions. I am open to both constructive criticism and blatant mockery. I have been working on converting the Joel Hasbrouck Information share code originally for SAS and Implementing it into R. I have a working code but for some reason I am about 10% of the actual values, if there is anyone more experienced in this area I would much appreciate any advice availlable.
Drew,
Thanks for taking the time to provide code and describe your problem,
but what you've provided isn't reproducible.
You'll need to provide a small data set and expected results (presumably
from the SAS code) before anyone will be able to examine your code and
sort out why they differ.
You say 'about 10%', so I assume that this doesn't mean 'precisely 10%'
which would likely be a simple multiplier change.
Have you looked at all the intermediary steps in the original SAS code
and compared these to the R code? obviously this won't be possible at
every step. but I'd assume it should be at most.
etc etc etc.
much easier to help if people had data to compare to.
Regards,
- Brian
Brian G. Peterson http://braverock.com/brian/ Ph: 773-459-4973 IM: bgpbraverock
Hi Drew, In addition to Brian's comments, a few stylistic / performance notes below. I can't check content, but hopefully you'll find the resulting code a little clearer and more idiomatic.
On Tue, Jul 10, 2012 at 10:25 PM, Drew Harris <drew.harris.nz at gmail.com> wrote:
Hi all,
I am a recent R converter so please bare with me if I ask any stupid
questions. I am open to both constructive criticism and blatant mockery. I
have been working on converting the Joel Hasbrouck Information share code
originally for SAS and Implementing it into R. I have a working code but
for some reason I am about 10% of the actual values, if there is anyone
more experienced in this area I would much appreciate any advice availlable.
I know there is a better method in a previous post by Niddhi Aggarwaul with
a very coherent code for calculating a 2 dimensional case but I really need
an n dimensional case which should be represented by my code being ideally
the same as Hasbrouk's. My code is below, x refers to a matrix of parallel
midpoint stock quotes of n markets.
#find the optimal lag length
lags <- VARselect(x, lag.max=100)$selection[1]
#Johansens Cointegration test is applied and then estimates the VECM with
the estimated Beta
cointest <- ca.jo(x, K=lags, type="eigen", ecdet="const",
spec="transitory")
#summary(cointest)
vecm <- cajorls(cointest)
#Breaks the VECM down into VAR by levels as in Hasbrouck
var <- vec2var(cointest)
#covariance of forecast error response
FER <- fevd(var, n.ahead=ImpulseResponse,cumulative=TRUE, boot=FALSE)
#covariance matrix of innovations
d <- dim(FER[[1]])
for (i in 1:n) {
errvar <- FER[[i]]
if (i == 1) {CovIn <- errvar[(d[1]),]}
if (i != 1) {CovIn <- rbind(CovIn, errvar[(d[1]),])}
}
It's generally going to give you worlds better performance to make a list of objects and then rbind() them all at once. (Consider the equivalent C level operations of malloc()ing and free()ing on each iteration, with all the work that entails) You can also avoid an (explicit) for loop by doing something like: do.call(rbind, lapply(FER, function(x) x[d[1],]))
#long run cumulative response functions
CRF <- irf(var, n.ahead=ImpulseResponse, ortho=TRUE, cumulative=FALSE,
boot=FALSE)
#matrix of long run cumulative response functions
d <- dim(CRF$irf[[1]])
for (i in 1:n) {
lrcrf <- CRF$irf[[i]]
if (i == 1) {LRCoeffs <- lrcrf[(d[1]),]}
if (i != 1) {LRCoeffs <- rbind(LRCoeffs, lrcrf[(d[1]),])}
}
As before, something like do.call(rbind, lapply(CRF, function(x) x[d[1],]))
Cdiag <- diag(CovIn) sqrtd <- sqrt(Cdiag) sd <- diag(sqrt(sqrtd)
Are you sure you mean to sqrt() twice in here?
cor <- sd %*% CovIn %*% sd cd <- t(chol(cor)) VarContrib <- LRCoeffs %*% cd
Probably a little easier to use the tcrossprod() function: VarContrib <- tcrossprod(LRCoeffs, chol(cor))
VarTotal <- colSums(VarContrib)
VarTotalMatrix <- VarTotal
for (i in 1:(VectorCount-1)) {
See below, but this sort of syntax is sometimes fraught with peril because it gives a loop of length 2, not 0 if VectorCount = 1; specifically, it loops for i = 1 and then i = 0.
VarTotalMatrix <- rbind(VarTotalMatrix,VarTotal)}
PC <- VarContrib / (VarTotalMatrix)
if (j == 1) {
PropCont <- colMeans(t(PC))}
if (j != 1) {
PropCont <- rbind(PropCont, colMeans(t(PC)))}
}
I'll let you modify this one like above, but note there's also a rowMeans() function which will be faster than colMeans(t(...))
rm(InformationShares) colnames(PropCont) <- InputVectors InformationShares <-colMeans(PropCont)
I'd recommend you wrap this all up in a function: then you don't have to worry about freeing memory with rm() [It just happens when the function call ends] and your code will be more reusable. If you do so, it's not totally necessary, but it'll be easier to read if you add return(InformationShares) at the end. [Some prefer just InformationShares without a return statement but I find that marginally less obvious: you actually wouldn't need either here because of some subtleties about <- being a function as well, but don't worry about that here.] Disclaimer: I did the do.call(rbind, ...) bits real quick: they might not be quite perfect. Best, Michael
[[alternative HTML version deleted]]
_______________________________________________ R-SIG-Finance at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go.