substring search – Boyer-Moore

I think single-string coding questions are evergreen. The insights and techniques might be …reusable?

Brute force search — shift by one, then attempt to match the entire substring. My code below is more efficient than brute-force. Mostly it shifts by one each time, but checks just one char.

Q: write a python function match(string haystack, string nonRegexNeedle), following the simplified Boyer-Moore algo:

Try matching at the left end. If mismatch, then identify the LEFT-most offending char (J) in haystack. Now locate the RIGHT-most position of J in needle (if not found then simply shift all the way past the J.) and right-shift the needle to align the two J.

Q: So when would we right-shift  by one? Not sure. So I don’t know if my code below is 100% correct in all cases.

```def visualize(haystack, substring, offset):
print '\t--' + '0123456789 123456789 12345-----'
print '\t--' + haystack
print '\t--' + ' '*offset + substring
print '\t--' + '-------------------------------'

def match(haystack, substring):
offset = 0 #offset from start of haystack, where substring is now
while(True):
#now locate first offender in haytack
offenderOffset, offender = -9999, ''
for i, char in enumerate(substring):
offenderOffset = offset+i
if substring[i] != haystack[offenderOffset]:
offender = haystack[offenderOffset]
print offender + ' <-- spotted at offenderOffset = ', offenderOffset
visualize(haystack, substring, offset)
break;

if offender == '': return True

# now back-scan substring to locate offender, and then shift to align
# not really forward-scan haystack
while True:
offset += 1
if offset + len(substring) > len(haystack):
print 'Gave up complately:'
visualize(haystack, substring, offset)
return False
if offenderOffset - offset == -1:
print 'gave up on aligning: '
visualize(haystack, substring, offset)
break
if substring[offenderOffset - offset] == offender:
print 'aligned: '
visualize(haystack, substring, offset)
break # go back to start of outer while loop

print match('aborb bdb', 'abors')
```