substring search – Boyer-Moore

I think single-string coding questions are evergreen. The insights and techniques might be …reusable?

Brute force search — shift by one, then attempt to match the entire substring. My code below is more efficient than brute-force. Mostly it shifts by one each time, but checks just one char.

Q: write a python function match(string haystack, string nonRegexNeedle), following the simplified Boyer-Moore algo:

Try matching at the left end. If mismatch, then identify the LEFT-most offending char (J) in haystack. Now locate the RIGHT-most position of J in needle (if not found then simply shift all the way past the J.) and right-shift the needle to align the two J.

Q: So when would we right-shift  by one? Not sure. So I don’t know if my code below is 100% correct in all cases.

def visualize(haystack, substring, offset):
  print '\t--' + '0123456789 123456789 12345-----'
  print '\t--' + haystack
  print '\t--' + ' '*offset + substring
  print '\t--' + '-------------------------------'

def match(haystack, substring):
  offset = 0 #offset from start of haystack, where substring is now
  while(True):
    #now locate first offender in haytack
    offenderOffset, offender = -9999, ''
    for i, char in enumerate(substring):
       offenderOffset = offset+i
       if substring[i] != haystack[offenderOffset]:
         offender = haystack[offenderOffset]
         print offender + ' <-- spotted at offenderOffset = ', offenderOffset
         visualize(haystack, substring, offset)       
         break;
         
    if offender == '': return True
    
    # now back-scan substring to locate offender, and then shift to align
    # not really forward-scan haystack
    while True:
      offset += 1
      if offset + len(substring) > len(haystack):
           print 'Gave up complately:'
           visualize(haystack, substring, offset)       
           return False
      if offenderOffset - offset == -1: 
        print 'gave up on aligning: '
        visualize(haystack, substring, offset)
        break
      if substring[offenderOffset - offset] == offender:
        print 'aligned: '
        visualize(haystack, substring, offset)
        break # go back to start of outer while loop

print match('aborb bdb', 'abors') 

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s