EFFICIENT ALGORITHMS FOR (δ,γ,α) AND (δ, kΔ, α)-MATCHING
We propose new algorithms for (δ,γ,α)-matching. In this string matching problem we are given a pattern P = p0p1 … pm−1 and a text T = t0t1 … tn−1 over an integer alphabet Σ = {0, …, σ − 1}. A pattern symbol pi δ-matches a text symbol tj iff |pi − tj| ≤ δ. The pattern P (δ,γ)-matches a text substring tj … tj+m−1 iff |pi − tj+i| ≤ δ holds for all i and Σi |pi − tj+i| ≤ γ, where the sum runs over all i. Finally, (δ,γ,α)-matching additionally permits a gap of at most α text symbols between consecutive matching text symbols. The only previously known algorithm runs in O(nm) time. We give several algorithms that improve the average case up to O(n) for small α, and the worst case to [Formula: see text] or O(nm log (γ)/w), where [Formula: see text] and w is the number of bits in a machine word. The proposed algorithms can easily be modified to solve several related problems; we explicitly consider character classes (instead of δ-matching), (Δ-limited) k-mismatches (instead of γ-matching), and more general gaps, including negative ones. These problems have important applications in computational biology. We conclude with experimental results showing that the algorithms are very efficient in practice.
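To make the problem definition concrete, the following is a minimal sketch of a straightforward dynamic-programming checker for (δ,γ,α)-matching. It is not one of the algorithms proposed in the paper; it only illustrates the definition and runs in roughly O(nmα) time. The function name `dga_match_ends` is hypothetical, and the sketch assumes that α ≥ 0 and that "at most α-symbol gaps" means consecutive matched text positions differ by at most α + 1.

```python
import math


def dga_match_ends(pattern, text, delta, gamma, alpha):
    """Return the text positions at which some (delta, gamma, alpha)-match ends.

    Illustrative brute-force DP only (assumes alpha >= 0); not the paper's method.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return []
    inf = math.inf
    # prev[j]: smallest error sum for matching pattern[0..i-1] with the last
    # pattern symbol aligned to text[j], or inf if no such alignment exists.
    prev = [abs(pattern[0] - t) if abs(pattern[0] - t) <= delta else inf
            for t in text]
    for i in range(1, m):
        cur = [inf] * n
        for j in range(1, n):
            d = abs(pattern[i] - text[j])
            if d > delta:
                continue  # the symbols do not delta-match
            # The previous pattern symbol must sit 1..alpha+1 positions earlier.
            lo = max(0, j - alpha - 1)
            cur[j] = min(prev[lo:j]) + d  # inf + d stays inf
        prev = cur
    # A match ends at j if the best total error there respects gamma.
    return [j for j in range(n) if prev[j] <= gamma]


if __name__ == "__main__":
    print(dga_match_ends([1, 3, 5], [1, 9, 3, 4, 5], delta=1, gamma=1, alpha=1))
```

Keeping the minimum error sum per end position is enough here, because a match exists at a position exactly when the smallest achievable sum there is at most γ. Setting α = 0 reduces the check to plain (δ,γ)-matching on contiguous substrings.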