Financial Econometrics and Empirical Market Microstructure
Longest Common Subsequence Similarity (LCSS)
The basic idea is to match two sequences while allowing some elements to be unmatched, or left out (Sankoff and Kruskal 1983). Given a sequence C of length m and a sequence Q of length n, the task is to find the longest sequence Z that is a subsequence of both C and Q. A sequence $Z = (Z_1, \dots, Z_k)$ with $k \le m$ is a subsequence of C if there exists a strictly increasing sequence of indices $i_1 < i_2 < \dots < i_k$ of C such that $C_{i_j} = Z_j$ for all $j = 1, \dots, k$.
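To make the definition concrete, the following Python sketch checks whether a sequence Z is a subsequence of C and, if so, returns one valid choice of the strictly increasing indices $i_1 < \dots < i_k$ (1-based, as in the text). The function name `subsequence_indices` is hypothetical, and the greedy earliest-match scan is only one way to find such indices.

```python
from typing import Optional, Sequence


def subsequence_indices(z: Sequence, c: Sequence) -> Optional[list[int]]:
    """Return 1-based, strictly increasing indices i_1 < ... < i_k with
    c[i_j] == z[j] for every j, or None if z is not a subsequence of c.

    Greedy left-to-right scan: each element of z is matched to the earliest
    position of c that comes after the previous match.
    """
    indices: list[int] = []
    pos = 0  # next 0-based position of c that may still be used
    for element in z:
        # advance through c until the current element of z is found
        while pos < len(c) and c[pos] != element:
            pos += 1
        if pos == len(c):
            return None  # c is exhausted before all of z was matched
        indices.append(pos + 1)  # store 1-based, matching the text
        pos += 1
    return indices


if __name__ == "__main__":
    C = "ABCBDAB"
    print(subsequence_indices("BCDB", C))  # [2, 3, 5, 7]
    print(subsequence_indices("DCA", C))   # None
```

The order of the matched elements is preserved but gaps are allowed, which is exactly the "left out" elements the LCSS idea relies on.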
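The length of Z can be computed by dynamic programming. Let $C_{i,j}$ denote the length of the longest common subsequence of the first $i$ elements of Q and the first $j$ elements of C; then

$$
C_{i,j} =
\begin{cases}
0, & \text{if } i = 0 \text{ or } j = 0,\\
C_{i-1,j-1} + 1, & \text{if } i, j > 0 \text{ and } Q_i = C_j,\\
\max\{C_{i-1,j},\ C_{i,j-1}\}, & \text{if } i, j > 0 \text{ and } Q_i \neq C_j.
\end{cases}
\tag{3}
$$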
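A minimal Python sketch of recurrence (3), assuming exact equality as the matching rule, as in the text. The hypothetical helper `lcs_table` fills the table (`table[i][j]` plays the role of $C_{i,j}$, with `i` running over Q and `j` over C), and the hypothetical `lcs` walks the table backwards from the bottom-right corner to recover one longest common subsequence Z. When several longest subsequences exist, which one is returned depends on the tie-breaking rule in the walk.

```python
from typing import Sequence


def lcs_table(q: Sequence, c: Sequence) -> list[list[int]]:
    """Fill the (n + 1) x (m + 1) table of recurrence (3).

    table[i][j] is the LCS length of the first i elements of q and the
    first j elements of c; row 0 and column 0 are the i = 0 / j = 0 case.
    """
    n, m = len(q), len(c)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if q[i - 1] == c[j - 1]:  # Q_i = C_j (1-based indices in the text)
                table[i][j] = table[i - 1][j - 1] + 1
            else:  # Q_i != C_j
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table


def lcs(q: Sequence, c: Sequence) -> list:
    """Recover one longest common subsequence Z by walking the table backwards."""
    table = lcs_table(q, c)
    i, j = len(q), len(c)
    z = []
    while i > 0 and j > 0:
        if q[i - 1] == c[j - 1]:
            # matched pair: it belongs to Z, move diagonally
            z.append(q[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1  # drop the last element of the Q prefix
        else:
            j -= 1  # drop the last element of the C prefix
    z.reverse()  # elements were collected from the end
    return z


if __name__ == "__main__":
    Q, C = "BDCABA", "ABCBDAB"
    print(lcs_table(Q, C)[-1][-1])  # 4
    print(lcs(Q, C))
```

For Q = "BDCABA" and C = "ABCBDAB" the bottom-right entry $C_{n,m}$ is 4. Filling the full table costs O(nm) time and memory, which is what makes the backward walk for Z possible.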
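The dissimilarity between C and Q is

$$
\mathrm{LCSS}(C, Q) = \frac{m + n - 2l}{m + n}
\tag{4}
$$

where $l = C_{n,m}$ is the length of the longest common subsequence. It equals 0 when C and Q are identical and 1 when they have no element in common.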
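Equation (4) needs only the LCS length l, so the Python sketch below keeps just two rows of the table from recurrence (3) instead of the full matrix. The names `lcs_length` and `lcss_dissimilarity` are hypothetical, and raising an error for two empty sequences (where $m + n = 0$ leaves (4) undefined) is an assumed convention, not something the text specifies.

```python
from typing import Sequence


def lcs_length(c: Sequence, q: Sequence) -> int:
    """Length l of the longest common subsequence via recurrence (3),
    keeping only the previous and current rows, so memory is O(m)."""
    prev = [0] * (len(c) + 1)  # row i - 1 of the table (row 0 is all zeros)
    for qi in q:
        curr = [0] * (len(c) + 1)  # row i; column 0 stays 0
        for j, cj in enumerate(c, start=1):
            if qi == cj:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def lcss_dissimilarity(c: Sequence, q: Sequence) -> float:
    """Equation (4): (m + n - 2l) / (m + n)."""
    m, n = len(c), len(q)
    if m + n == 0:
        # assumed convention: (4) is undefined for two empty sequences
        raise ValueError("dissimilarity is undefined for two empty sequences")
    lcs_len = lcs_length(c, q)  # l in the text
    return (m + n - 2 * lcs_len) / (m + n)


if __name__ == "__main__":
    print(round(lcss_dissimilarity("ABCBDAB", "BDCABA"), 3))  # 0.385
```

For C = "ABCBDAB" (m = 7) and Q = "BDCABA" (n = 6), l = 4, so the dissimilarity is (7 + 6 - 8)/13 = 5/13 ≈ 0.385.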