Update: it appears that the data conversion is not where most of the time
goes. Most of it is spent in the CAPM calculation. :( Does anyone know why
the CAPM calculation would be faster on Windows?
On Wed, May 19, 2010 at 5:51 PM, Abhijit Bera <abhibera at gmail.com> wrote:
Hi
This is my function. It serves an HTML page after the calculations. I'm
connecting to a MSSQL DB using pyodbc.
    def CAPM(self,client):
        r=self.r
        cds="1590"
        bm="20559"
        d1 = []
        v1 = []
        v2 = []
        print "Parsing GET Params"
        params=client.g[1].split("&")
        for items in params:
            item=items.split("=")
            if(item[0]=="cds"):
                cds=unquote(item[1])
            elif(item[0]=="bm"):
                bm=unquote(item[1])
        print "cds: %s bm: %s" % (cds,bm)
        print "Fetching data"
        t3=datetime.now()
        for row in self.cursor.execute(
                "select * from (select * from ( "
                "select co_code,dlyprice_date,dlyprice_close from feed_dlyprice P where "
                "co_code in (%s,%s) ) DataTable PIVOT ( max(dlyprice_close) FOR co_code IN "
                "([%s],[%s]) )PivotTable ) a order by dlyprice_date" %(cds,bm,cds,bm)):
            d1.append(str(row[0]))
            v1.append(row[1])
            v2.append(row[2])
        t4=datetime.now()
        t1=datetime.now()
        print "Calculating"
        d1.pop(0)
        d1vec = robjects.StrVector(d1)
        v1vec = robjects.FloatVector(v1)
        v2vec = robjects.FloatVector(v2)
        r1 = r('Return.calculate(%s)' %v1vec.r_repr())
        r2 = r('Return.calculate(%s)' %v2vec.r_repr())
        tl = robjects.rlc.TaggedList([r1,r2],tags=('Geo','Nifty'))
        df = robjects.DataFrame(tl)
        ts2 = r.timeSeries(df,d1vec)
        tsa = r.timeSeries(r1,d1vec)
        tsb = r.timeSeries(r2,d1vec)
        robjects.globalenv["ta"] = tsa
        robjects.globalenv["tb"] = tsb
        robjects.globalenv["t2"] = ts2
        a = r('table.CAPM(ta,tb)')
        t2=datetime.now()
        page=("<html><title>CAPM</title><body>Result:<br>%s<br>Time taken by "
              "DB:%s<br>Time taken by R:%s<br>Total time elapsed:%s<br></body></html>"
              %(str(a),str(t4-t3),str(t2-t1),str(t2-t3)))
        print "Serving page:"
        #print page
        self.serveResource(page,"text",client)
On Linux
Time taken by DB:0:00:00.024165
Time taken by R:0:00:05.572084
Total time elapsed:0:00:05.596288
On Windows
Time taken by DB:0:00:00.112000
Time taken by R:0:00:02.355000
Total time elapsed:0:00:02.467000
Why is there such a huge difference in the time taken by R on the two
platforms? Am I doing something wrong? It's my first Rpy2 code so I guess
it's badly written.
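To narrow down which part of the R section is slow, each step could be timed
separately instead of as one block. The following is only a minimal sketch:
the `timed` helper and the `timings` dict are hypothetical names (not part of
rpy2), the two workloads are placeholders standing in for the individual R
calls, and it is written for Python 3 (`time.perf_counter`).

```python
import time
from contextlib import contextmanager

timings = {}  # label -> accumulated elapsed seconds


@contextmanager
def timed(label, store=timings):
    """Add the wall-clock time spent inside the with-block to store[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        store[label] = store.get(label, 0.0) + (time.perf_counter() - start)


# Placeholder workloads (assumption): in the real function these blocks would
# wrap the vector conversion, Return.calculate, timeSeries and table.CAPM calls.
with timed("convert"):
    values = [float(i) for i in range(100000)]

with timed("calculate"):
    total = sum(v * v for v in values)

# Print the steps, slowest first.
for label, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print("%-10s %.6f s" % (label, seconds))
```

Running the same breakdown on both platforms would show whether the gap comes
from one particular call or is spread evenly across all of them.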
I'm loading the following libraries:
'PerformanceAnalytics','timeSeries','fPortfolio','fPortfolioBacktest'
I'm using Rpy2 2.1.0 and R 2.11
Regards
Abhijit Bera