
Loop too slow for Bid calc - BUT cannot figure out how to do with matrix

12 messages · Duncan Murdoch, Martin Morgan, rivercode +4 more

#
Hi,

I am trying to create a Bid/Ask for each second from a high-volume stock, and
the only way I have been able to do this is with loops that build the
target matrix from the source tick-data matrix. Looping is too slow and
not practical to use on multiple stocks. For example:

Bids Matrix (a real one has 400,000+ rows):

Bid       Time
10.03    11:05:03.124
10.04    11:05:03.348
10.05    11:05:04.010

One Second Bid Matrix (Bid price for every second of the day):

Bid       Second
10.02     11:05:03
??        11:05:04    <---- last bid price before 11:05:04.xxx, which is 10.04 at 11:05:03.348

The challenge is how to create the one-second bid matrix without looping
through the Bids Matrix to find the first timestamp that is greater than the
OneSecond timestamp and then taking the price from the previous row of
BidsMatrix, which would have been the bid at the beginning of that second.

I am new to R, so I need some help to do this "properly".

#  OneSecond:  the matrix above called "One Second Bid Matrix"
#  BidsMatrix: the matrix above called "Bids Matrix"

bidrow = 1

# looping through each second
for (sec in 1:length(OneSecond$Second))
{
    t = as.POSIXlt(OneSecond$Second[sec], origin = "1970-01-01")
    sec.onesec = as.numeric(format(t, "%H%M%S")) # convert date/time to HHMMSS as a number

    # Find the bid for this second, which is the last bid before a change in the second
    for (r in bidrow:length(BidsMatrix$Price))
    {
        # convert the BidsMatrix timestamp to a number of format %H%M%S
        bidTS = unlist(strsplit(as.character(BidsMatrix$Time[r]), split = "\\."))[1] # remove milliseconds
        bidTS = gsub(":", "", bidTS) # remove ":" from time
        bidTS = as.numeric(bidTS)    # convert to number

        if (bidTS > sec.onesec)
        {
            # price of previous bid (there is none when r == 1)
            if (r > 1) OneSecond$Bid[sec] = BidsMatrix$Price[r - 1]
            bidrow = r  # save bidrow as starting point to find next bid
            break
        } # if
    } # for
} # for
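For comparison, a vectorized sketch of the same lookup using base R's findInterval() (a swapped-in technique, not the approach this thread settles on): for each second it returns the index of the last bid stamped at or before that second, or 0 if none has arrived yet. The data, the to_seconds() helper and the range of seconds are hypothetical, and findInterval() requires the bid timestamps to be sorted ascending.

```r
# Hypothetical example data mirroring the "Bids Matrix" in the question.
bids <- data.frame(
  Price = c(10.03, 10.04, 10.05),
  Time  = c("11:05:03.124", "11:05:03.348", "11:05:04.010"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: "HH:MM:SS(.mmm)" strings -> seconds since midnight.
to_seconds <- function(x) {
  hms <- do.call(rbind, strsplit(x, ":", fixed = TRUE))
  as.numeric(hms[, 1]) * 3600 + as.numeric(hms[, 2]) * 60 + as.numeric(hms[, 3])
}

bid_secs <- to_seconds(bids$Time)   # findInterval() needs these sorted ascending

# One row per whole second of interest (hypothetical range).
OneSecond <- data.frame(Second = seq(to_seconds("11:05:03"), to_seconds("11:05:05")))

# For each second, the index of the last bid stamped at or before it
# (0 when no bid has arrived yet).
idx <- findInterval(OneSecond$Second, bid_secs)

# Prepending NA maps index 0 to NA and index k to the k-th price.
OneSecond$Bid <- c(NA, bids$Price)[idx + 1]
OneSecond   # Bid: NA, 10.04, 10.05
```

The 11:05:04 row gets 10.04, the last bid before 11:05:04.010, matching the expected value in the question; there is no loop, so the cost is one sort-based lookup per second.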

Hope this is clear, and thanks for your help.

Chris
#
On 04/10/2010 5:29 PM, rivercode wrote:
I would use loops, but something like this:

lastrow <- Bids[1,]
time <- starttime   # placeholder: some starting time, in the same units as Bids[,2]
for (i in 2:nrow(Bids)) {
   thisrow <- Bids[i,]
   while (thisrow[,2] > time) {
     cat(time, lastrow[,1], "\n")   # output lastrow for time
     time <- time + 1
   }
   lastrow <- thisrow
}

I don't see how that would be too slow, but if it was, just rewrite it in C.
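A runnable sketch of the same single-pass idea, assuming a hypothetical data frame Bids that is already sorted by time, with timestamps as numeric seconds since midnight. It walks the seconds and advances a pointer through the bids (rather than walking the rows as above) and preallocates the output instead of growing it; last_bid_per_second() is a made-up name for illustration.

```r
# Hypothetical tick data, sorted by time; Secs = seconds since midnight.
Bids <- data.frame(Price = c(10.03, 10.04, 10.05),
                   Secs  = c(39903.124, 39903.348, 39904.010))

# Hypothetical helper: one pass over the bids, one slot per whole second.
last_bid_per_second <- function(price, secs, start, end) {
  seconds <- seq(start, end)               # every whole second to report
  out <- rep(NA_real_, length(seconds))    # preallocated, not grown with c()
  j <- 0L                                  # index of the last bid already passed
  for (k in seq_along(seconds)) {
    # advance past every bid stamped at or before this second
    while (j < length(secs) && secs[j + 1L] <= seconds[k]) j <- j + 1L
    if (j > 0L) out[k] <- price[j]         # bid in force at this second
  }
  data.frame(Second = seconds, Bid = out)
}

res <- last_bid_per_second(Bids$Price, Bids$Secs, 39903, 39905)
res
```

Because j only moves forward, the inner while loop runs at most nrow(Bids) times in total, so the work is proportional to the number of bids plus the number of seconds.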

Duncan Murdoch
#
On 10/04/2010 02:29 PM, rivercode wrote:
Not sure that I understand, but here

 y = as.POSIXlt(runif(400000, 0, 8 * 60 * 60), origin="1970-01-01")

are 400,000 times over an 8-hour window, at sub-second intervals. Order
these (order()), find the second in which each occurs
(floor(as.numeric())), identify the last record in each second (diff()
!= 0), including the last record of the day (c()), and keep only those (o[]):

  o = order(y)
  i = o[ c(diff(floor(as.numeric(y)[o])) != 0, TRUE) ]

and view the time (two different ways)
[1] "1969-12-31 16:00:00 PST" "1969-12-31 16:00:01 PST"
[3] "1969-12-31 16:00:02 PST" "1969-12-31 16:00:03 PST"
[5] "1969-12-31 16:00:04 PST" "1969-12-31 16:00:05 PST"
[1] 0.9551883 1.8336520 2.8745100 3.9695889 4.8229001 5.8056079

If y were a column of a data frame df, use y <- df$y and then, after the above, df[i,].
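The same steps applied to a small hypothetical data frame whose times are already plain numeric seconds (so as.numeric() is not needed). A second with no ticks (second 3 here) simply has no row in the result.

```r
# Hypothetical ticks: Price and time y in seconds, deliberately unsorted.
df <- data.frame(Price = c(10.05, 10.03, 10.04, 10.06, 10.07),
                 y     = c(2.5, 1.2, 1.9, 2.1, 4.8))

y <- df$y
o <- order(y)                       # row indices in time order
sec <- floor(y[o])                  # whole second of each tick, in time order
i <- o[c(diff(sec) != 0, TRUE)]     # last tick of each second, plus the final tick
df[i, ]                             # rows 3, 1, 5: prices 10.04, 10.05, 10.07
```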

Martin

  
    
#
Duncan and Martin,

Thank you for your replies.

I went with Martin's suggestion, as it does not require loops and is probably
the fastest... though it did take me 3 hours to figure out exactly how it
works!

Here is what I am now using:

bids = cbind(bids, timeCalc)

orderBids = bids[order(bids$timeCalc),] # order bids on timeCalc col.

result = orderBids[c(diff(orderBids$timeCalc) != 0,TRUE),]  # get dataframe with last bid for each second
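A sketch of the whole pipeline on hypothetical data, assuming timeCalc is the whole second (since midnight) that each quote falls in, since the post does not show how timeCalc was computed. One caution: order() is stable, leaving tied values in their original row order, so ordering on timeCalc alone only picks the right quote when rows are already time-sorted within each second. Ordering on the full-precision timestamp avoids that, as below.

```r
# Hypothetical quotes in "HH:MM:SS.mmm" form; the last row is out of order.
bids <- data.frame(Price = c(10.03, 10.04, 10.05, 10.02),
                   Time  = c("11:05:03.124", "11:05:03.348",
                             "11:05:04.010", "11:05:03.050"),
                   stringsAsFactors = FALSE)

# Seconds since midnight, keeping the milliseconds.
hms <- do.call(rbind, strsplit(bids$Time, ":", fixed = TRUE))
secs <- as.numeric(hms[, 1]) * 3600 + as.numeric(hms[, 2]) * 60 +
  as.numeric(hms[, 3])

# Assumption: timeCalc is the whole second each quote falls in.
timeCalc <- floor(secs)
bids <- cbind(bids, secs, timeCalc)

# Order on the full timestamp; ordering on timeCalc alone would keep the
# out-of-order 11:05:03.050 row last within its second and pick it wrongly.
orderBids <- bids[order(bids$secs), ]
result <- orderBids[c(diff(orderBids$timeCalc) != 0, TRUE), ]
result   # prices 10.04 (second 39903) and 10.05 (second 39904)
```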

Thanks again for advice.

Chris
#
Newbie question...

I am looking for something equivalent to read.delim, but which accepts a line of text as a parameter instead of a file.

Below is my problem. I'm unable to get the exact output I want, which is a simple data frame of the lines where the delimiter exists; I'm coming quite close, though.

I have a character vector of 10 lines called MF_Data:
[1] "Scheme Code;Scheme Name;Net Asset Value;Repurchase Price;Sale Price;Date"                                        
 [2] ""                                                                                                                
 [3] "Open Ended Schemes ( Liquid )"                                                                                   
 [4] ""                                                                                                                
 [5] ""                                                                                                                
 [6] "AIG Global Investment Group Mutual Fund"                                                                         
 [7] "106506;AIG India Liquid Fund-Institutional Plan-Daily Dividend Option;1001.0000;1001.0000;1001.0000;02-Oct-2010" 
 [8] "106511;AIG India Liquid Fund-Institutional Plan-Growth Option;1210.4612;1210.4612;1210.4612;02-Oct-2010"         
 [9] "106507;AIG India Liquid Fund-Institutional Plan-Weekly Dividend Option;1001.8765;1001.8765;1001.8765;02-Oct-2010"
[10] "106503;AIG India Liquid Fund-Retail Plan-DailyDividend Option;1001.0000;1001.0000;1001.0000;02-Oct-2010"         


The data lines below are delimited by ';'. I am using:

 tempTxt <- MF_Data[7]
 MF_Data_F <-	unlist(strsplit(tempTxt,";", fixed = TRUE))
 tempTxt <- MF_Data[8]
 MF_Data_F1 <-	unlist(strsplit(tempTxt,";", fixed = TRUE))
 MF_Data_F <- rbind(MF_Data_F,MF_Data_F1)
 
But MF_Data_F is not the simple 2x6 data frame I want (rbind() of the two split vectors gives a character matrix).
#
Is this what you are after:
+ , ""
+  ,"Open Ended Schemes ( Liquid )"
+ , ""
+ , ""
+ , "AIG Global Investment Group Mutual Fund"
+ , "106506;AIG India Liquid Fund-Institutional Plan-Daily Dividend Option;1001.0000;1001.0000;1001.0000;02-Oct-2010"
+ , "106511;AIG India Liquid Fund-Institutional Plan-Growth Option;1210.4612;1210.4612;1210.4612;02-Oct-2010"
+ , "106507;AIG India Liquid Fund-Institutional Plan-Weekly Dividend Option;1001.8765;1001.8765;1001.8765;02-Oct-2010"
+ , "106503;AIG India Liquid Fund-Retail Plan-DailyDividend Option;1001.0000;1001.0000;1001.0000;02-Oct-2010")
'data.frame':   4 obs. of  6 variables:
 $ V1: int  106506 106511 106507 106503
 $ V2: Factor w/ 4 levels "AIG India Liquid Fund-Institutional Plan-Daily Dividend Option",..: 1 2 3 4
 $ V3: num  1001 1210 1002 1001
 $ V4: num  1001 1210 1002 1001
 $ V5: num  1001 1210 1002 1001
 $ V6: Factor w/ 1 level "02-Oct-2010": 1 1 1 1
      V1                                                              V2       V3       V4       V5          V6
1 106506  AIG India Liquid Fund-Institutional Plan-Daily Dividend Option 1001.000 1001.000 1001.000 02-Oct-2010
2 106511          AIG India Liquid Fund-Institutional Plan-Growth Option 1210.461 1210.461 1210.461 02-Oct-2010
3 106507 AIG India Liquid Fund-Institutional Plan-Weekly Dividend Option 1001.876 1001.876 1001.876 02-Oct-2010
4 106503          AIG India Liquid Fund-Retail Plan-DailyDividend Option 1001.000 1001.000 1001.000 02-Oct-2010
On Sat, Oct 9, 2010 at 12:18 PM, Santosh Srinivas
<santosh.srinivas at gmail.com> wrote:
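A self-contained sketch in the same spirit as the read.table() answer above: keep only the data lines, then let read.table() read them through a textConnection(). The regular expression used to pick data lines (a numeric scheme code followed by ';') is an assumption about this file's layout, and only part of MF_Data is reproduced.

```r
# Part of the MF_Data vector from the question.
MF_Data <- c(
  "Scheme Code;Scheme Name;Net Asset Value;Repurchase Price;Sale Price;Date",
  "",
  "Open Ended Schemes ( Liquid )",
  "",
  "",
  "AIG Global Investment Group Mutual Fund",
  "106506;AIG India Liquid Fund-Institutional Plan-Daily Dividend Option;1001.0000;1001.0000;1001.0000;02-Oct-2010",
  "106511;AIG India Liquid Fund-Institutional Plan-Growth Option;1210.4612;1210.4612;1210.4612;02-Oct-2010"
)

# Assumption: data lines start with a numeric scheme code followed by ";".
dataLines <- grep("^[0-9]+;", MF_Data, value = TRUE)

# textConnection() makes the character vector readable like a file.
con <- textConnection(dataLines)
x <- read.table(con, sep = ";", quote = "", stringsAsFactors = FALSE)
close(con)

# Reuse the header line for column names.
names(x) <- strsplit(MF_Data[1], ";", fixed = TRUE)[[1]]
str(x)
```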

  
    
#
Jim's solution is the ideal way to read in the data: using the sep=";"
argument in read.table.

However, if you do for some reason have a vector of strings like the
following (maybe someone gives you an Rdata file instead of the raw
data file):

MF_Data <- c("106506;AIG India Liquid Fund-Institutional Plan-Daily Dividend Option;1001.0000;1001.0000;1001.0000;02-Oct-2010",
             "106511;AIG India Liquid Fund-Institutional Plan-Growth Option;1210.4612;1210.4612;1210.4612;02-Oct-2010")

Then you can use this to get a data frame:

as.data.frame(do.call(rbind, lapply(MF_Data, function(x) unlist(strsplit(x, ';')))))
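rbind() on the split pieces gives a character matrix, so every column of the resulting data frame starts out as text (or as factors in older R). A sketch of converting the columns afterwards, assuming type.convert() is acceptable for guessing the types; strsplit() is already vectorized, so the lapply() wrapper can be dropped.

```r
MF_Data <- c(
  "106506;AIG India Liquid Fund-Institutional Plan-Daily Dividend Option;1001.0000;1001.0000;1001.0000;02-Oct-2010",
  "106511;AIG India Liquid Fund-Institutional Plan-Growth Option;1210.4612;1210.4612;1210.4612;02-Oct-2010"
)

# strsplit() is vectorized: one character vector of fields per input line.
pieces <- strsplit(MF_Data, ";", fixed = TRUE)

# Stack them row-wise (a character matrix), then make a data frame.
MF_df <- as.data.frame(do.call(rbind, pieces), stringsAsFactors = FALSE)

# All columns are character here; type.convert() picks integer/numeric
# where the text allows and leaves the rest as character (as.is = TRUE).
MF_df[] <- lapply(MF_df, type.convert, as.is = TRUE)
str(MF_df)
```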

Cheers,

Jeff.
On Sat, Oct 9, 2010 at 12:30 PM, jim holtman <jholtman at gmail.com> wrote:
#
On Oct 9, 2010, at 12:46 PM, Jeffrey Spies wrote:

            
If you are suggesting that Jim's solution would not work here, then I  
would disagree and suggest you try offering your vector (without the  
<cr>'s inserted by our mail clients) to his code. It should work just  
fine and be far more readable.

On the other hand if you were offering this with an explanation that  
strsplit's split argument is more flexible than the sep argument in  
the read functions because it accepts regular expressions and so can  
handle situations where multiple separators exist in the same line,  
then I would applaud you.
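A small illustration of that point, using a hypothetical line that mixes ';' and ',' separators: read.table()'s sep must be a single character, while strsplit() treats split as a regular expression, so a character class matches either separator.

```r
# Hypothetical line with two different separators.
line <- "106506;AIG India Liquid Fund,1001.0000;02-Oct-2010"

# "[;,]" is a regular expression matching either ";" or ",".
fields <- strsplit(line, "[;,]")[[1]]
fields   # "106506" "AIG India Liquid Fund" "1001.0000" "02-Oct-2010"
```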
#
Obviously Jim's solution does work, and I did not intend to imply it
didn't. In fact, his read.table solution would work both if the OP
had a semicolon-delimited file to begin with (which I was trying to
say was ideal from a workflow standpoint) and with a vector of strings
(when paired with textConnection()). Using strsplit is merely
another solution for the latter situation. I thought the OP might
appreciate seeing how to use the function that they indicated they
were having problems with. Plus, I have a penchant for R-ishly
"unreadable" code. ;)

Thanks for clarifying,

Jeff.
On Sat, Oct 9, 2010 at 1:04 PM, David Winsemius <dwinsemius at comcast.net> wrote:
#
Thanks Jim. Exactly what I needed!

-----Original Message-----
From: jim holtman [mailto:jholtman at gmail.com] 
Sent: 09 October 2010 22:01
To: Santosh Srinivas
Cc: r-help at r-project.org
Subject: Re: [R] StrSplit


  
    
#
Dear R-Group,

I am getting the error message "incomplete final line found by
readTableHeader" in the code below.

It seems to me that the error is caused by a quote in the text data.
Is there an easy way to handle this, or should I substitute it out?
Plan;18.92;18.92;19.35;02-Apr-2007
+ "
V1                             V2    V3    V4    V5          V6
1 100589 Canara Robeco Expo-Income Plan 18.92 18.92 19.35 02-Apr-2007
+ "
Error in read.table(textConnection(tempTxt), sep = ";") : 
  incomplete final line found by readTableHeader on 'tempTxt'

Thanks,
Santosh
#
The problem is that you have an unbalanced quote (') in your input.
You need to specify quote = '' in read.table:
+ "
V1                        V2    V3    V4    V5          V6
1 103272 Canara Robeco Fortune '94 30.07 30.07 30.75 02-Apr-2007

The unbalanced quote is the ' in '94.
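A self-contained version of the fix, using the '94 line from the output above: with quote = "" read.table() treats the apostrophe as an ordinary character instead of the start of a quoted field that never closes.

```r
line <- "103272;Canara Robeco Fortune '94;30.07;30.07;30.75;02-Apr-2007"

con <- textConnection(line)
# quote = "" turns off quote handling, so the apostrophe in '94 stays literal.
x <- read.table(con, sep = ";", quote = "", stringsAsFactors = FALSE)
close(con)
x$V2   # "Canara Robeco Fortune '94"
```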

On Sat, Oct 9, 2010 at 10:05 PM, Santosh Srinivas
<santosh.srinivas at gmail.com> wrote: