Hasbrouck's Information Share in R

3 messages · Drew Harris, Brian G. Peterson, R. Michael Weylandt

#
On 07/10/2012 10:25 PM, Drew Harris wrote:
Drew,

Thanks for taking the time to provide code and describe your problem, 
but what you've provided isn't reproducible.

You'll need to provide a small data set and expected results (presumably 
from the SAS code) before anyone will be able to examine your code and 
sort out why they differ.

You say 'about 10%', so I assume that this doesn't mean 'precisely 10%' 
which would likely be a simple multiplier change.

Have you looked at the intermediate steps in the original SAS code 
and compared them to the R code?  Obviously this won't be possible at 
every step, but I'd assume it should be at most.

And so on.

It is much easier to help when there is data to compare against.

Regards,

    - Brian
#
Hi Drew,

In addition to Brian's comments, a few stylistic / performance notes
below. I can't check content, but hopefully you'll find the resulting
code a little clearer and more idiomatic.
On Tue, Jul 10, 2012 at 10:25 PM, Drew Harris <drew.harris.nz at gmail.com> wrote:
It's generally going to give you far better performance to make a
list of objects and then rbind() them all at once. (Consider the
equivalent C-level operations of malloc()ing and free()ing on each
iteration, with all the work that entails.) You can also avoid an
(explicit) for loop by doing something like:

do.call(rbind, lapply(FER, function(x) x[d[1],]))
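
For comparison, here is a minimal sketch of both patterns on made-up data. The list `pieces` is a hypothetical stand-in for something like FER, and row 1 stands in for d[1]; neither comes from the original code.

```r
# Hypothetical data: a list of three small matrices standing in for FER.
pieces <- list(
  matrix(1:4,  nrow = 2),
  matrix(5:8,  nrow = 2),
  matrix(9:12, nrow = 2)
)

# Slow pattern: grow the result one row at a time, which reallocates
# the whole object on every iteration.
grown <- NULL
for (p in pieces) {
  grown <- rbind(grown, p[1, ])
}

# Faster pattern: collect the rows in a list, then bind them once.
bound <- do.call(rbind, lapply(pieces, function(x) x[1, ]))
```

Both give the same 3 x 2 matrix; the difference only shows up in run time as the list gets long.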
As before, something like

do.call(rbind, lapply(CRF, function(x) x[d[1],]))
Are you sure you mean to sqrt() twice in here?
Probably a little easier to use the tcrossprod() function:

VarContrib <- tcrossprod(LRCoeffs, chol(cor))
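
As a quick check of what that call computes: tcrossprod(x, y) is x %*% t(y), and chol() returns the upper triangular factor R with t(R) %*% R equal to its input. The matrices below are made up for illustration and are not your data, so whether this matches the product in your original line is something you would need to confirm.

```r
# Made-up inputs: A stands in for LRCoeffs, S for the correlation matrix.
A <- matrix(c(1, 2, 3, 4), nrow = 2)
S <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)

R <- chol(S)                  # upper triangular factor of S

via_tcrossprod <- tcrossprod(A, R)
via_explicit   <- A %*% t(R)  # the same product written out by hand

same_product   <- isTRUE(all.equal(via_tcrossprod, via_explicit))
recovers_input <- isTRUE(all.equal(crossprod(R), S))  # t(R) %*% R
```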
See below, but this sort of syntax is sometimes fraught with peril
because, if VectorCount = 1, it gives a loop of length 2, not 0;
specifically, it loops for i = 1 and then i = 0.
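
A small sketch of the pitfall, assuming the loop was written along the lines of for (i in 1:(VectorCount - 1)) (the original line is not shown here, so that form is a guess). seq_len() is the usual way to get an empty loop when the count is zero.

```r
VectorCount <- 1

risky <- 1:(VectorCount - 1)       # 1:0 counts down: 1, 0
safe  <- seq_len(VectorCount - 1)  # seq_len(0) is an empty vector

# Count how many times each loop body would run.
n_risky <- 0
for (i in risky) n_risky <- n_risky + 1

n_safe <- 0
for (i in safe) n_safe <- n_safe + 1

c(risky = n_risky, safe = n_safe)
```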
I'll let you modify this one like above, but note there's also a
rowMeans() function, which will be faster than colMeans(t(...)).
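
To illustrate on a made-up matrix (not your data), the two forms agree, and rowMeans() skips the transpose:

```r
# Hypothetical 2 x 3 matrix.
m <- matrix(1:6, nrow = 2)

via_transpose <- colMeans(t(m))  # transposes first, then averages columns
via_rowmeans  <- rowMeans(m)     # averages rows directly

agree <- isTRUE(all.equal(via_transpose, via_rowmeans))
```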
I'd recommend you wrap this all up in a function: then you don't have
to worry about freeing memory with rm() [It just happens when the
function call ends] and your code will be more reusable. If you do so,
it's not totally necessary, but it'll be easier to read if you add
return(InformationShares) at the end. [Some prefer just
InformationShares without a return statement but I find that
marginally less obvious: you actually wouldn't need either here
because of some subtleties about <- being a function as well, but
don't worry about that here.]
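
A rough sketch of the shape I mean. The function name compute_shares and the arithmetic inside it are placeholders made up for illustration; they are not the information share calculation itself.

```r
# Hypothetical wrapper: the body is placeholder arithmetic only.
compute_shares <- function(contrib) {
  squared <- contrib^2                        # temporary, local to the call
  InformationShares <- squared / sum(squared)
  return(InformationShares)                   # explicit return for readability
}

shares <- compute_shares(c(1, 2, 3))

# `squared` was never created in the calling environment, so there is
# nothing to rm() afterwards.
```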

Disclaimer: I did the do.call(rbind, ...) bits real quick: they might
not be quite perfect.

Best,
Michael