博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
Pattern Searching
阅读量:4150 次
发布时间:2019-05-25

本文共 4192 字,大约阅读时间需要 13 分钟。

Problem Definition:

Given a text txt[0..n-1] and a pattern pat[0..m-1], write a function search(char pat[], char txt[]) that prints all occurrences of pat[] in txt[]. You may assume that n > m.

Examples:

1) Input:

txt[] =  "THIS IS A TEST TEXT"  pat[] = "TEST"

Output:

Pattern found at index 10

2) Input:

txt[] =  "AABAACAADAABAAABAA"  pat[] = "AABA"

Output:

Pattern found at index 0   Pattern found at index 9   Pattern found at index 13

Solutions:

Naive Pattern Searching 

A brute-force solution
Time Complexity:
 
O(n*m)
Improvement Insights:
When we compare pat[j] with txt[i] and see a mismatch, Naive Pattern Searching just backtrack to pat[0] and txt[i-j] naively. Wcan do some improvement on this from different aspect:
1. when we are searching, can we don't backtrack the sub-index i of txt[]? The answer is yes: KMP,  A Naive Pattern Searching For Special Pattern and Finite Automata.
KMP: 
precomputing  the jump array of pattern next[], 
when searching, do not need to backtrack i, when see a mismatch just set j to next[j].
A Naive Pattern Searching For Special Pattern: use the property of the special pattern (all characters in pattern are different), when searchingdo not need to backtrack i, when see a mismatch just set j to next[j].
Finite Automata: preconstructing the FA jump table, when searching, do not need to backtrack i, when see any character in txt[], just jump to a state accordingly. 
2.when we are searching, can we predict if txt[i...i+M-1] and pat[0...M-1] are possible to be matchable? The answer is yes: Rabin-Karp.
Rabin-Karp:  precomputing the hash value of txt[i...i+M-1] and pat[0...M-1], when searching, only when the hash value of the two mathcing string are same, we start matching individual characters.

KMP Algorithm 

Unlike the Naive algo where we slide the pattern by one, we use a value from lps[] to decide the next sliding position. Let us see how we do that. When we compare pat[j] with txt[i] and see a mismatch, we know that characters pat[0..j-1] match with txt[i-j...i-1], and we also know that lps[j-1] characters of pat[0...j-1] are both proper prefix and suffix which means we do not need to match these lps[j-1] characters with txt[i-j...i-1] because we know that these characters will anyway match. 
Time Complexity: O(n)

Rabin-Karp Algorithm

Like the Naive Algorithm, Rabin-Karp algorithm also slides the pattern one by one. But unlike the Naive algorithm, Rabin Karp algorithm first matches the hash value of the pattern with the hash value of current substring(txt[i...i+m-1]) of text, and only if the hash values match then it starts matching individual characters.
Time Complexity: The average and best case running time of the Rabin-Karp algorithm is O(n+m), but its worst-case time is O(nm). Worst case of Rabin-Karp algorithm occurs when all characters of pattern and text are same as the hash values of all the substrings of txt[] match with hash value of pat[]. For example pat[] = “AAA” and txt[] = “AAAAAAA”.

A Naive Pattern Searching For Special Pattern

In the Naive Pattern Searching algo, we always slide the pattern by 1. When all characters of pattern are different, we can slide the pattern by more than 1. Let us see how can we do this. When we compare pat[j] with txt[i] and see a mismatch, we know that characters pat[0..j-1] match with txt[i-j...i-1], because every character in pat[] must be different, so pat[0] is different from every character in txt[i-j...i-1], then we only need to check start from txt[j] and pat[0].
Time complexity:  O(n).

Finite Automata

In FA (Finite Automata) based algorithm, we preprocess the pattern and build a 2D array that represents a Finite Automata. Construction of the FA is the main tricky part of this algorithm. Once the FA is built, the searching is simple. In search, we simply need to start from the first state of the automata and first character of the text. At every step, we consider next character of text, look for the next state in the built FA and move to new state. If we reach final state, then pattern is found in text.

Time complexity:  O(n) in searching, there exists O(m*NO_OF_CHARS)  algorithm  (Hint: we can use something like to construct the FA. 

References:

转载地址:http://maxti.baihongyu.com/

你可能感兴趣的文章
Android studio_迁移Eclipse项目到Android studio
查看>>
JavaScript setTimeout() clearTimeout() 方法
查看>>
CSS border 属性及用border画各种图形
查看>>
转载知乎-前端汇总资源
查看>>
JavaScript substr() 方法
查看>>
JavaScript slice() 方法
查看>>
JavaScript substring() 方法
查看>>
HTML 5 新的表单元素 datalist keygen output
查看>>
(转载)正确理解cookie和session机制原理
查看>>
jQuery ajax - ajax() 方法
查看>>
将有序数组转换为平衡二叉搜索树
查看>>
最长递增子序列
查看>>
从一列数中筛除尽可能少的数,使得从左往右看这些数是从小到大再从大到小...
查看>>
判断一个整数是否是回文数
查看>>
经典shell面试题整理
查看>>
腾讯的一道面试题—不用除法求数字乘积
查看>>
素数算法
查看>>
java多线程环境单例模式实现详解
查看>>
将一个数插入到有序的数列中,插入后的数列仍然有序
查看>>
在有序的数列中查找某数,若该数在此数列中,则输出它所在的位置,否则输出no found
查看>>