Boyer-Moore string search algorithm in Ruby

Just a quick post. I’ve converted the C code from the wikipedia entry (this version) on the Boyer-Moore string search algorithm to Ruby. I’ve extended it to support searches on token arrays and regular expressions.

You can find the code on github.

Usage:

    BoyerMoore.search(haystack, needle)   # returns index of needle or nil

Examples:

Basic search in string:

    BoyerMoore.search("ANPANMAN", "ANP")   # => 0
    BoyerMoore.search("ANPANMAN", "ANPXX") # => nil 
    BoyerMoore.search("foobar", "bar")     # => 3

You can also search an array of tokens:

    BoyerMoore.search(["<b>", "hi", "</b>"], ["hi"])         # => 1 
    BoyerMoore.search(["bam", "foo", "bar"], ["foo", "bar"]) # => 1 
    BoyerMoore.search(["bam", "bar", "baz"], ["foo"])        # => nil

A token can be a regular expression:

    BoyerMoore.search(["Sing", "99", "Luftballon"], [/\d+/]) == 1
    BoyerMoore.search(["Nate Murray", "5 Pine Street", "Los Angeles", "CA", "90210"], [/^\w{2}$/, /^\d{5}$/]) == 3

Notes:

The regular-expression token matching is a bit of a hack and will be fairly slow because every hash miss is compared against every regular expression key. You probably shouldn’t use the regular expression token search for anything more than a toy.

Download the Boyer-Moore string search algorithm in Ruby.

Share:
  • del.icio.us
  • Reddit
  • Technorati
  • Twitter
  • Facebook
  • Google Bookmarks
  • HackerNews
  • PDF
  • RSS
This entry was posted in programming. Bookmark the permalink. Post a comment or leave a trackback: Trackback URL.