Binary Search Revisited

As ubiquitous as it is, the standard binary search algorithm can still be further optimized by using bit operations to iterate through its sorted list, in place of arithmetic. Admittedly, this is primarily an academic discussion, since the code improvement does not decrease the logarithmic complexity of the standard algorithm. Nevertheless, a well-developed programming intuition should by default implement (or at a minimum consider) a solution similar to the one presented here, prior to “the obvious” standard arithmetic solution.

Here’s an example. We want to find the largest index i such that haystack[i] <= needle. Needle = 15, and haystack is a sorted list of the first 8 prime numbers, indexed by binary numbers:

000     2
001     3
010     5
011     7
100     11
101     13
110     17
111     19

Note first that the haystack index requires no more than 3 bits. Therefore we start with the highest order bit set: b=100. (This font denotes binary numbers here.) Is haystack[100] <= 15? Yes, 11<=15. Observe this means that the first bit of the index we are looking for is 1. We look at the next bit. Is haystack[110] <= 15? No, 17 <= 15 is false. This means the 2nd bit is 0. Finally, is haystack[101] <= 15? Yes, 13 <= 15. Therefore the index we are looking for is 101.

In essence, the main loop to find the index i is simply:

    for( ; b ; b >>= 1 )
        if( haystack[i|b] <= needle ) i |= b;

where b is the value with 1 bit set, that began with b=100 and i=0.

It is the natural structure of the bits within the index variable that automatically tracks both the upper and lower bounds of the search window for each iteration. Compare this with the less efficient and more verbose arithmetic performed in the main loops of standard binary search algorithms.

The full C++ code for the improved binary search algorithm fbsearch() is:

// Binary search revisited.
 
// Define only one of these:
// #define SETBIT_FAST
// #define SETBIT_FASTER
#define SETBIT_FASTEST
 
#ifdef SETBIT_FAST
#include <math.h>
#endif
 
// Return 1 << log_2(list_size-1), or 0 if list_size == 1.
// This sets the initial value of b in fbsearch().
inline unsigned init_bit( unsigned list_size )
{
#ifdef SETBIT_FAST
    return list_size == 1 ? 0 :
        1 << int( log(list_size-1) / M_LN2 );
#endif
#ifdef SETBIT_FASTER
    return list_size == 1 ? 0 :
        1 << ( sizeof(unsigned) << 3 ) - 1
             - __builtin_clz(list_size-1);
#endif
#ifdef SETBIT_FASTEST
    unsigned b;
    __asm__ ( "decl %%eax;"
              "je DONE;"
              "bsrl %%eax, %%ecx;" // BSR - Bit Scan Reverse (386+)
              "movl $1, %%eax;"
              "shll %%cl, %%eax;"
              "DONE:" : "=a" (b) : "a" (list_size)
    );
    return b;
#endif
}
 
// Return the greatest unsigned i where haystack[i] <= needle.
// If i does not exist (haystack is empty, or needle < haystack[0])
// then return unsigned(-1). T can be any type for which the binary
// operator <= is defined.
template <typename T>
unsigned fbsearch( const T haystack[], unsigned haystack_size,
                   const T& needle )
{
    if( haystack_size == 0 ) return unsigned(-1);
    unsigned i = 0;
    for( unsigned b = init_bit(haystack_size) ; b ; b >>= 1 )
    {
        unsigned j = i | b;
        if( haystack_size <= j ) continue;
        if( haystack[j] <= needle ) i = j;
        else
        {
            for( b >>= 1 ; b ; b >>= 1 )
                if( haystack[i|b] <= needle ) i |= b;
            break;
        }
    }
    return i || *haystack <= needle ? i : unsigned(-1);
}
 
// Example Usage
#include <iostream>
using namespace std;
 
int main()
{
    const int sorted_list[] = { 2, 3, 5, 7, 11, 13, 17, 19, 23 };
    const unsigned list_size = sizeof(sorted_list)/sizeof(int);
    int needle = 15;
    cout << "fbsearch(sorted_list,"<<list_size<<','<<needle<<") = "
         << fbsearch(sorted_list,list_size,needle) << endl;
    // fbsearch(sorted_list,9,15) = 5
    return 0;
}

Ancillary notes:

  • The function init_bit() sets the initial value of b to having a single bit set in the highest position for indexing the array list. The three SETBIT_FAST, SETBIT_FASTER, and SETBIT_FASTEST code blocks within it are simply 3 different ways of calculating the initial value of b, whose speed ranks are the reverse of their cross-platform compatibility. The first method should compile everywhere, and simply calculates it from a base-2 logarithm. The second method uses the faster GNU __builtin_clz method (thanks Chao Xu for the suggestion) that counts the highest order bit from the left. The third and fastest method uses the Intel assembly instruction BSR that counts it from the right.
  • The null return value of unsigned(-1) (=4294967295 for 32-bit unsigned) is the unique value that will never otherwise be returned. Therefore it can safely be checked for by the calling function without risk of misinterpretation.
  • The one else block can safely be removed without changing the overall functionality. It exists to simply save having to check that the array index is in bounds on every iteration, since it is at a point in logic where that condition is guaranteed.
Share:
  • del.icio.us
  • Reddit
  • Technorati
  • Twitter
  • Facebook
  • Google Bookmarks
  • HackerNews
  • PDF
  • RSS
This entry was posted in code, programming and tagged , . Bookmark the permalink. Post a comment or leave a trackback: Trackback URL.
  • http://www.xcombinator.com/2011/01/21/worlds-fastest-binary-search/ World’s Fastest Binary Search?

    [...] (Edited 9 Sept 2011: A few minor improvements have been made; see the updated post instead: Binary Search Revisited.) [...]

  • Rafa

    What do you think about about the following algo?
    Searching for element k in sorted array A of size n.
    Lookup A[2^i] for i=0, 1, 2,… until you go beyond k’s position in A.
    then do a binary search on the part of the array left (smaller) than i.

    i=0;
    while (2^i<n && A[2^i]<k)
    {
    i++;
    }
    //then do binary search from (0..2^i)

    This algo will run on O(log idx) where idx is the index of k in A.
    (both steps are in log idx). In the worst case, the algo is in O(log
    n), if k is amongst the largest elements of A or bigger than any element
    of A. I suspect the multiplicative constant is larger than for binary search but
    that the algo would run faster for very large arrays and when looking for
    data that's towards the beginning of the array.

  • R

    What do you think about about the following algo?
    Searching for element k in sorted array A of size n.
    Lookup A[2^i] for i=0, 1, 2,… until you go beyond k’s position in A.
    then do a binary search on the part of the array left (smaller) than i.

    i=0;
    while (2^i<n && A[2^i]<k)
    {
    i++;
    }
    //then do binary search from (0..2^i)
    Sorry for the "trash" on the other post :(

  • Gustavo Trigueiros

    The algo does not work as expected. It skip data if the elements of the array is big. For example, try this sequence:

    const int sorted_list[] = {2, 3, 5, 7, 11, 13, 17 ,19, 23, 14, 15, 98, 99, 102, 857, 74, 658, 52, 14, 69, 74, 444, 4587, 125689, 6, 4447, 1235698}

    int needle = 4587;
    When he search for ’4587′, he will always return 7, due to the logarithm operation involved.
    So, it will never find it.

    Also, if you try with smaller values, such as if you look for “17″ it will return pos 5, instead of 6.

  • http://quantumlab.net Matt Pulver

    The list must be sorted.

  • Stefan Vorkoetter

    For some less-than-clever compilers, replacing the inner loop’s “for( b >>= 1; b; b >>= 1)” with “while( b >>= 1)” can result in a significant speed improvement.

  • Stefan Vorkoetter

    I know this is an old comment, but in case you’re still looking for an answer, the problem is that your sorted_list isn’t sorted. Binary search only works on ordered data.