Introduction
In string matching algorithm, we check whether a text T
contains a pattern P(length of T>P) i.e to check whether P is
a substring of T or not. Since we are trying to check a fixed
string P, sometimes these algorithms are called exact string
matching algorithms.
In horspool's method, we shift the pattern for matching by
looking at the character(say c) of the text that is aligned
against the last character of the pattern. This is the case
even if character c itself matches its counterpart in the
pattern. The pattern match checking occurs from right to left
in this method.
Working Procedure
Do refer the code available the end of this section to
understand the following theory.
-
Let m and n be the lengths of pattern P and text T
respectively. Initially the checking starts from mth
position in T with mth position of P(last letter of
P).
-
General Possibilities (Say char ch1 from text is being
compared with char ch2 from pattern).
-
If ch1 doesn't occur in T then we can safely shift the
pattern by it's length.
-
If ch1 doesn't match with ch2 but it has occurrences in
m-1 positions of P, then shift the pattern such that the
rightmost occurrence of ch1 in the pattern aligns with the
ch1 in the text.
-
If ch1 matches with ch2 and other characters don't match
and there are no other occurrences of ch1 in T then this
case is similar to first case i.e. shift the pattern by
it's entire length.
-
If ch1 matches with ch2 and other characters don't match
and there are other occurrences of ch1 in T then shift the
pattern similar to case 2.
-
Shift Table:
-
Initialize 0 to all 128 characters in an array, table[].
- Loop from i=0 to m-1
-
For the index, j=pattern[i]; assign m-i-1 in the table.
That is table[ pattern[i] ] = m-i-1.
This determines the amount of positions to shifted for
each character in the pattern based on the right most
occurrence of that character in the pattern
-
String Matching:
-
Loop from i=m-1 to n-1
-
Loop until from k=0 to m-1.
If pattern[m-k-1]==text[i-k]
then increment k,
else break the loop.
(This itself is the comparison from right to left.)
-
If k==m then the pattern was matched and the
position(= i-m+1) can be returned.
-
Else increment i by the value in table corresponding
to the current character (from the text) being
compared with the pattern.
-
If you arrived at this step then it means that the pattern
wasn't found, so return -1.
Time Complexity: O(n*m)