Pattern Recognition / Classification in R for Financial Time Series
2 messages · I. Ozkan, stefano iacus
Yes, Ozkan is right. As I answered privately to Ian, our proposal is not really the best choice for Ian's original problem (I read his email too quickly, apologies).

stefano

P.S. In any event, I've updated the sde package to include MOdist (sde version 2.0.3).
On 02/nov/08, at 23:08, I. Ozkan wrote:
This is a typical search problem that needs a well-defined similarity measure (as Stefano quietly pointed out). For 5-day Open-High-Low-Close series, similarity measures based on statistics or probability may not work most of the time. There are several distance measures (L1, L2, Minkowski, cosine, edit distance, statistical distances, etc.) one can use to find similar patterns. Similarity is context dependent, so you should first select the proper measure.

As an example, assume that three derived values are obtained by simply dividing High, Low and Close by Open: HtoO, LtoO and CtoO (5 observations each). Then, for each of these three series, a simple Euclidean distance to the corresponding series of the other stocks can be calculated. If the three characteristics are assumed to be equally important, just average the distances obtained. If not, try to find some weights. Finally, the nearest neighbour(s) is (are) selected.

In this example, we simply treat the time series as multivariate observations. This means we assume that the ordering of the observations does not carry important information, although identical sequences still give perfect similarity. However, an increase-then-decrease pattern has the same distance to a flat pattern as a decrease-then-increase pattern does, although the two are highly dissimilar to each other. If these patterns are clustered, they are certainly assigned to different clusters.

For longer sequences I might consider using a Longest Common Subsequence type of metric. For quite long series, other similarity measures such as mutual information, ARIMA, Markov operators (as Stefano proposed), coefficient, the best 5-10 FFT coefficients, or others such as Kullback-Leibler, Kolmogorov-Smirnov, histogram intersection, etc. have been found useful for identifying similar processes.

As far as I know, most of these similarity measures are implemented in R (try machine learning, clustering, bioconductor-biodist) and are ready to use.
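The ratio-based nearest-neighbour idea described above might be sketched in base R roughly as follows. This is only an illustration under assumptions that are not in the original message: each stock is a numeric matrix with columns named Open, High, Low and Close, every candidate has at least as many rows as the target window, the data are synthetic, and the helper names (`ohlc_ratios`, `window_distance`, `nearest_windows`, `make_ohlc`) are hypothetical.

```r
# Hypothetical helper: turn an OHLC matrix into the three ratio series.
ohlc_ratios <- function(ohlc) {
  cbind(HtoO = ohlc[, "High"]  / ohlc[, "Open"],
        LtoO = ohlc[, "Low"]   / ohlc[, "Open"],
        CtoO = ohlc[, "Close"] / ohlc[, "Open"])
}

# Weighted average of the per-series Euclidean distances between two windows.
# Equal weights reproduce the "just average the distances" case.
window_distance <- function(a, b, weights = c(1, 1, 1) / 3) {
  d <- sqrt(colSums((ohlc_ratios(a) - ohlc_ratios(b))^2))
  sum(weights * d)
}

# Scan every window of every candidate stock and return the n nearest ones.
# Assumes each candidate has at least `width` rows.
nearest_windows <- function(target, candidates, n = 3, width = nrow(target)) {
  res <- do.call(rbind, lapply(names(candidates), function(sym) {
    x <- candidates[[sym]]
    starts <- seq_len(nrow(x) - width + 1)
    dists <- vapply(starts, function(s) {
      window_distance(target, x[s:(s + width - 1), , drop = FALSE])
    }, numeric(1))
    data.frame(symbol = sym, start = starts, dist = dists)
  }))
  head(res[order(res$dist), ], n)
}

# Synthetic data, purely for illustration.
set.seed(1)
make_ohlc <- function(n) {
  open  <- 100 + cumsum(rnorm(n))
  close <- open + rnorm(n)
  high  <- pmax(open, close) + abs(rnorm(n))
  low   <- pmin(open, close) - abs(rnorm(n))
  cbind(Open = open, High = high, Low = low, Close = close)
}
A <- make_ohlc(5)
universe <- list(B = make_ohlc(30), C = make_ohlc(30))
universe$C[11:15, ] <- A * 2   # plant a rescaled copy of A; the ratios ignore scale

top <- nearest_windows(A, universe, n = 3)
print(top)

# Order-blindness: a pattern and its mirror image are equally far from flat.
flat    <- c(1, 1, 1, 1, 1)
up_down <- c(1, 1.01, 1.02, 1.01, 1)
down_up <- c(1, 0.99, 0.98, 0.99, 1)
d_up   <- sqrt(sum((up_down - flat)^2))
d_down <- sqrt(sum((down_up - flat)^2))
print(isTRUE(all.equal(d_up, d_down)))
```

With this setup the planted window (stock C, starting at row 11) should come back first with a distance of zero, because dividing by Open removes the price level. The last few lines illustrate the caveat about sequence order: the two mirrored patterns are equally distant from the flat one even though they differ from each other.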
As for the last suggestion: try something simple first, then identify the problems (if any) with that approach, then try another, and so on (Occam's razor is your guide when selecting the approach).

Good luck!
Ozkan

-----------

Hi,

I was wondering if there are any good packages in R that would be useful for time series pattern recognition (3rd-party software suggestions are also welcome!).

My search problem is this: given a specific 5-day OHLC sequence in a particular stock A, I want to scan through a list of stocks B, C, etc. and return another 5-day OHLC sequence which closely 'matches' my given sequence.

The basic brute-force algorithm I'm currently working on is to normalize all 5-day sequences in my search universe, calculate the differential in HL, and return the top N patterns with the lowest differential value.

If there are any elegant / intelligent ways to solve my problem, I would love to hear them!

Thanks...

Rgds
Ian
_______________________________________________ R-SIG-Finance at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. -- If you want to post, subscribe first.