Pattern Recognition / Classification in R for Financial Time Series

Yes, Ozkan is right.

As I answered privately to Ian, our proposal is not really the best
choice for Ian's original problem (I read his email too quickly,
apologies).

stefano
p.s. In any event, I've updated the sde package to include MOdist (sde
version 2.0.3).

This is a typical search problem that needs a well-defined similarity
measure (as Stefano quietly pointed out). For 5-day Open-High-Low-Close
series, similarity measures based on statistics or probability may not
work most of the time. There are several distance measures (L1, L2,
Minkowski, cosine, edit distance, statistical distances, etc.) one can
use to find similar patterns. Similarity is context dependent, so you
should first select the proper one.
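
To make the choice of distance concrete, here is a minimal sketch in
base R comparing a few of the measures named above on two short series.
The numbers are made up for illustration, and cosine_dist is a
hypothetical helper written here because dist() has no cosine method.

```r
# Two hypothetical 5-day series of close-to-open ratios (made-up numbers).
a <- c(1.01, 0.99, 1.02, 1.00, 0.98)
b <- c(1.00, 1.01, 1.01, 0.99, 0.99)

m <- rbind(a, b)

# dist() from the stats package covers L1, L2 and Minkowski.
d_l1  <- as.numeric(dist(m, method = "manhattan"))
d_l2  <- as.numeric(dist(m, method = "euclidean"))
d_mk3 <- as.numeric(dist(m, method = "minkowski", p = 3))

# Cosine distance is not a dist() method, so compute it by hand.
cosine_dist <- function(x, y) 1 - sum(x * y) / sqrt(sum(x^2) * sum(y^2))
d_cos <- cosine_dist(a, b)

print(c(L1 = d_l1, L2 = d_l2, Minkowski3 = d_mk3, cosine = d_cos))
```

Note that the cosine distance ignores overall scale, while L1 and L2 do
not, which is one reason the "proper" measure depends on the context.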

As an example, assume that three derived values are obtained by simply
dividing High, Low and Close by Open: HtoO, LtoO, CtoO (5 observations
for each). Then, for each of these three series, a simple Euclidean
distance to the corresponding series of other stocks can be calculated.
If the three characteristics are assumed to be equally important, just
average the distances obtained. If not, try to find some weights.
Finally, the nearest neighbour(s) is (are) selected.
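
The steps above could be sketched roughly as follows in base R. This is
only an illustration under stated assumptions: the column names Open,
High, Low, Close, the helper names ratios, pattern_dist and make_window,
and the simulated data are all hypothetical, not part of any package.

```r
# Derive HtoO, LtoO, CtoO from a 5-row OHLC matrix with columns
# Open, High, Low, Close (column names are an assumption of this sketch).
ratios <- function(ohlc) {
  cbind(HtoO = ohlc[, "High"]  / ohlc[, "Open"],
        LtoO = ohlc[, "Low"]   / ohlc[, "Open"],
        CtoO = ohlc[, "Close"] / ohlc[, "Open"])
}

# Euclidean distance per derived series, then a weighted average.
# Equal weights correspond to "just average the distances".
pattern_dist <- function(x, y, w = c(1, 1, 1) / 3) {
  d <- sqrt(colSums((ratios(x) - ratios(y))^2))
  sum(w * d)
}

# Simulate one 5-day OHLC window (purely synthetic data).
make_window <- function(seed) {
  set.seed(seed)
  open  <- 100 + cumsum(rnorm(5))
  close <- open + rnorm(5)
  high  <- pmax(open, close) + abs(rnorm(5))
  low   <- pmin(open, close) - abs(rnorm(5))
  cbind(Open = open, High = high, Low = low, Close = close)
}

target     <- make_window(1)
candidates <- list(B = make_window(2), C = make_window(3), D = make_window(4))

# Distance from the target to every candidate, then the nearest neighbours.
d       <- sapply(candidates, pattern_dist, x = target)
nearest <- names(sort(d))[1:2]
print(d)
print(nearest)
```

In a real search each stock would contribute many overlapping 5-day
windows rather than one, and unequal weights can be passed through the
w argument.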

In this example, we simply treat the time series as multivariate
observations. This means that we assume the sequence itself does not
carry important information, though exactly matching sequences give
perfect similarity. However, an increase-then-decrease pattern has the
same distance to a flat pattern as a decrease-then-increase pattern
does, although the two are highly dissimilar. If these patterns are
clustered, they are certainly assigned to different clusters.
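
A tiny numeric check of this point, using three made-up ratio series
(the values and the helper euclid are hypothetical, chosen only to
mirror the shapes described above):

```r
flat    <- c(1.00, 1.00, 1.00, 1.00, 1.00)
up_down <- c(1.00, 1.02, 1.04, 1.02, 1.00)   # increase, then decrease
down_up <- c(1.00, 0.98, 0.96, 0.98, 1.00)   # decrease, then increase

euclid <- function(x, y) sqrt(sum((x - y)^2))

d_up_flat   <- euclid(up_down, flat)
d_down_flat <- euclid(down_up, flat)
d_up_down   <- euclid(up_down, down_up)

# The two mirrored patterns are equally far from flat, yet they are
# twice as far from each other as either is from flat.
print(c(up_vs_flat = d_up_flat, down_vs_flat = d_down_flat,
        up_vs_down = d_up_down))
```

So a query on distance to the flat pattern alone cannot tell the two
shapes apart, while a pairwise comparison (or a clustering) can.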

For longer sequences I might consider using a Longest Common
Subsequence type of metric. For quite long series, other similarity
measures, such as Mutual Information, ARIMA, Markov Operators as Stefano
proposed, coefficient, the best 5-10 FFT coefficients, or others like
Kullback-Leibler, Kolmogorov-Smirnov, Histogram Intersection, etc., are
found to be useful for identifying similar processes.
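
As a rough illustration of two of these ideas for longer series, the
sketch below compares series by a handful of FFT coefficients and by the
two-sample Kolmogorov-Smirnov statistic, using only base R (fft and
ks.test). It assumes that "best" can be approximated by the first k
non-constant (lowest-frequency) coefficients, which is a simplification;
the helper names and simulated series are hypothetical.

```r
# Moduli of the first k non-constant FFT coefficients, used as features.
# Taking the lowest frequencies is an assumption of this sketch.
fft_features <- function(x, k = 10) {
  Mod(fft(x - mean(x)))[2:(k + 1)]
}

# Euclidean distance between the FFT feature vectors of two series.
d_fft <- function(x, y, k = 10) {
  sqrt(sum((fft_features(x, k) - fft_features(y, k))^2))
}

# Two-sample Kolmogorov-Smirnov statistic as a distance between the
# value distributions of two series (ignores ordering in time).
d_ks <- function(x, y) unname(ks.test(x, y)$statistic)

set.seed(42)
n  <- 256
s1 <- sin(2 * pi * (1:n) / 32) + rnorm(n, sd = 0.2)  # noisy sine, period 32
s2 <- sin(2 * pi * (1:n) / 32) + rnorm(n, sd = 0.2)  # same process, new noise
s3 <- cumsum(rnorm(n))                               # random walk

print(c(fft_same = d_fft(s1, s2), fft_diff = d_fft(s1, s3)))
print(c(ks_same  = d_ks(s1, s2),  ks_diff  = d_ks(s1, s3)))
```

Two realisations of the same process should come out closer to each
other than to the random walk under the FFT-based distance, though how
many coefficients to keep is something to tune for the data at hand.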

As far as I know, most of these similarity measures are implemented in
R (try machine learning, clustering, bioconductor-biodist) and are ready
to use.

As a last suggestion, try something simple first, then identify the
problem (if any) with this approach, then try another, and so on
(Occam's razor is your guide when selecting the approach).

Good Luck!

Ozkan...

-----------
Hi, I was wondering if there are any good packages in R that would be
useful for time series pattern recognition (3rd party software
suggestions are also welcome!).

My search problem is this: given a specific 5-day OHLC sequence in a
particular stock A, I want to scan through a list of stocks B, C,
etc. and return another 5-day OHLC sequence which closely 'matches' my
given sequence.

The basic brute-force algorithm I'm currently working on is to
normalize all 5-day sequences in my search universe, calculate the
differential in HL, and return the top N patterns with the lowest
differential value. If there are any elegant / intelligent ways to
solve my problem, I would love to hear them! Thanks...

Rgds
Ian


_______________________________________________
R-SIG-Finance at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only.
-- If you want to post, subscribe first.