
Aggregating subsets of data in larger vector with sapply

3 messages · rivercode, jim holtman, Joshua Ulrich

#
I have 40,000 rows of buy/sell trade data and am trying to add up the buy
sizes for each second. The code works, but it is very slow. Any suggestions
on how to improve the sapply call?

library(xts)  # provides endpoints()

# Row index of the last observation in each second of the xts object,
# which has multiple entries per second.
secEP = endpoints(xSym$Direction, "secs")
d = xSym$Direction
s = xSym$Size
buySize = sapply(1:(length(secEP)-1), function(y) {
	i = (secEP[y] + 1):secEP[y+1]  # rows belonging to the y-th second
	return(sum(as.numeric(s[i][d[i] == "buy"])))
})

Object details:

secEP = numeric vector of one-second endpoints in xSym$Direction.
Direction
2011-01-05 09:30:00 "unkn"   
2011-01-05 09:30:02 "sell"   
2011-01-05 09:30:02 "buy"    
2011-01-05 09:30:04 "buy"    
2011-01-05 09:30:04 "buy"    
2011-01-05 09:30:04 "buy"
Size  
2011-01-05 09:30:00 " 865"
2011-01-05 09:30:02 " 100"
2011-01-05 09:30:02 " 100"
2011-01-05 09:30:04 " 100"
2011-01-05 09:30:04 " 100"
2011-01-05 09:30:04 "  41"

Thanks,
Chris
#
Split the data by truncating the time to the second, then process each group; this will save the repeated subsetting you are doing. Also merge direction and size into the same data frame. It looks like you can subset to the "buy" rows to begin with.
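
A rough sketch of that idea in base R, for illustration only. The data frame `trades` and its `time` column are made up here to mirror the sample rows from the original post; they are not objects from the thread, and the real data lives in an xts object, so this would need adapting.

```r
# Hypothetical sample data mirroring the rows quoted in the original post.
trades <- data.frame(
  time = as.POSIXct(c("2011-01-05 09:30:00", "2011-01-05 09:30:02",
                      "2011-01-05 09:30:02", "2011-01-05 09:30:04",
                      "2011-01-05 09:30:04", "2011-01-05 09:30:04"),
                    tz = "UTC"),
  Direction = c("unkn", "sell", "buy", "buy", "buy", "buy"),
  Size = c(" 865", " 100", " 100", " 100", " 100", "  41"),
  stringsAsFactors = FALSE
)

# Keep only the buys first, so no per-second filtering is needed later.
buys <- trades[trades$Direction == "buy", ]

# Truncate each timestamp to the whole second and use it as the group key.
sec <- format(trunc(buys$time, units = "secs"), "%Y-%m-%d %H:%M:%S")

# One grouped sum replaces the per-second sapply loop.
buySize <- tapply(as.numeric(buys$Size), sec, sum)
```

Note that seconds with no buys simply do not appear in `buySize`, whereas the sapply version in the original post returns 0 for them.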

On Jan 9, 2011, at 19:10, rivercode <aquanyc at gmail.com> wrote:

            
2 days later
#
Hi Chris,

This seems to work on the sample data you provided.

FUN <- function(x) {
  x <- xts(as.numeric(x),index(x))
  period.apply(x, endpoints(x,"secs"), sum)
}
lapply(split.default(xSym$Size,xSym$Direction), FUN)

Best,
--
Joshua Ulrich | FOSS Trading: www.fosstrading.com
On Sun, Jan 9, 2011 at 6:10 PM, rivercode <aquanyc at gmail.com> wrote: