Searching JavaScript arrays with a binary search

I migrated from Octopress back to WordPress and all of my code snippets exploded, I’m in the process of fixing those, please bear with me. This post also has huge improvements detailed in the comments section that makes me think this post warrants a revisit, I’ll link to it from here if so. I’ve written a new post! This one includes benchmarks and generative testing (that all passes!). So you can be sure that it works and that any changes made do actually make it faster. This post is therefore deprecated. The repository for the new implementation can be found at Wolfy87/binary-search.


We are asking browsers to do more and more as they become more capable. I’m not sure if that’s a good thing or not, but that’s a heated flame war that should be saved for another day.

There may come a time in your browser (ab)use in which you need to search a ridiculous amount or values within a cat gif processing platform. Or maybe you need to search through all of the elements in a bloated DOM.

You will probably find that indexOf doesn’t quite cut it in those situations. In fact, if you do happen survive the inevitable violent explosion caused by your CPU trying to take the easy way out, you might want to try a JavaScript search method that is better suited for that much data.

This is where you have an excuse to suggest a binary search and blow everyone else’s minds.

What is it

A binary search searches by splitting your array into smaller and smaller chunks until it finds your desired value. Unlike the normal indexOf which searches from left to right in a simple iteration. The binary search Wikipedia article explains it best (as always). There are a couple of downsides; It will be slower with smaller data sets (this needs proving) and the array you are searching needs to be sorted.

Because a binary search is O(log n), and not O(n) like indexOf, it’s great for large sets of data. I set up a little jsPerf for my version of the JavaScript implementation; it is searching 100,000 numbers.

The source for my JavaScript binary search implementation is currently held in a gist. The really cool thing about this is that it’s only 138 bytes when minified, that’s tiny enough to fit inside a tweet.

Edit: I’ve swapped Math.floor for number | 0 at Yehonatan’s reccomendation. It’s faster sometimes.

Real world use case

Okay, you might not want to search an array of numbers. Say you were working on a JavaScript map implementation, as I will be talking about in my next post, you might need to search an array of objects. As long as this array contains some kind of number you can use as an index whilst sorting, then it can be done.

Say we have a Model class that each have a numerical ID or publish time. The first thing you would need to do is sort them. You can either do this with a sorting function like this…

Or you can do it the smart way which kills two birds with one stone. You define a valueOf method which returns the ID. This value can then be used by the default sort method and, more importantly, within the binaryIndexOf method which needs the comparison operators to work correctly in order to find things.

So now that your models are sorted and their greater than and less than comparison operators will work, you can use the binaryIndexOf method on it to search through your 100,000 models. Why are you even doing this to a browser, you monster.

Let me know what you think and if you have any improvements or suggestions. I hope this can help to destroy a bottleneck or two (or 100,000).

Edit: Where should you stick it?

doomslice over on reddit suggested that I return the twos compliment in place of -1. As far as I can tell, this means returning the negative version of the last place that was checked. So if you wanted to insert a value and wanted to know where you should put it, you could run the function and use the returned number to splice the value into the array.

This way you can add items without ruining the order. You don’t exactly want to go re-sorting potentially thousands of values every time you add something. Here’s the modified function I came up with, as well as a working example. It demonstrates finding where to insert an element and then inserting it.

As you can see, because I use the bitwise NOT operator (~, also suggested by doomsice!) on the index during the splice, it will always insert the element in the right place, even if there is an identical value already there! Pretty cool. You do have to be careful though, now you are going to need to check for index < 0, and not index === -1 when you are looking for existence of a value.

  • NoName

    var a = [1, 2, 3]

    console.log(a.indexOf(3)); // 2

    console.log(binaryIndexOf.call(a, 3)); // -1

    • http://oli.me.uk/ Oliver Caldwell

      How odd. I just ran your exact same snippet and it returned 2 for both. What browser are you using? What does this page display? http://jsfiddle.net/Wolfy87/sqP96/

      • pixelBender67

        I get 2 on Chrome

        • http://oli.me.uk/ Oliver Caldwell

          That’s exactly what it should be. No idea what “NoName” was doing to
          break such a simple loop. I’m fairly confident it would be fine in the
          older versions of IE too. There’s nothing magical going on.

  • http://arnorhs.com/ Arnor Heidar Sigurdsson

    There’s a node.js module for doing a binary search on a sorted array: https://npmjs.org/package/binary-search

    I built a sorted set that uses that module: https://npmjs.org/package/sset

    I also built a sorted array that uses the sorted set internally in dumb way (doesn’t try to be an array, though): https://npmjs.org/package/sarray

    • http://oli.me.uk/ Oliver Caldwell

      Interesting. I was planning on writing something about maps and potentially sets too. I’m trying to do things that I need to learn to plug the gaps in my self taught knowledge whilst helping people. My aim with all of these implementations is to write them with as little overhead as possible.

      So they wouldn’t be used as a library, which will probably be more powerful and fully tested, but could be dropped into a class as a helper method. Basically I want to focus on the core parts of algorithms and patterns. If I do anything on sets I’ll probably drop in a reference to your modules, they look great.

  • Yehonatan

    Instead of Math.floor(num) you can use num | 0

  • Emilio

    Very usefull and well written! Thanks!

  • Xiaodong Tan

    Actually ~maxIndex is incorrect if currentElement > searchElement on the last check. A simple test with an array like [‘CA’, ‘WA’] against ‘TX’ will show that maxIndex is 0. The correct answer is 1, not 0.
    The return value should be ~max(minIndex, maxIndex).

    • Guest

      or just ~minIndex

  • TRON

    It’s usually better and easier to work with half open intervals, the same is true for binary halving. Here is a binary search function that does that, additionally this returns the right insert index for the item even if it wasnt found in the array:
    function binary_search(list, item, cmp_func) {
    var lo = 0;
    var hi = list.length;
    if (!cmp_func) {
    cmp_func = function(a,b) { return (ab) ? 1 : 0) }
    }
    while (lo < hi) {
    var mid = ((lo + hi) / 2) | 0;
    var cmp_res = cmp_func(item, list[mid]);
    if (cmp_res == 0)
    {
    return {
    found: true,
    index: mid
    };
    } else if (cmp_res < 0) {
    hi = mid;
    } else {
    lo = mid + 1;
    }
    }
    return {
    found: false,
    index: hi
    };
    }

  • J

    Yes, as Xiaodong Tan said, with Oliver’s code, return “~Math.max(minIndex, maxIndex)” (or use the ternary operator [?:] instead of Math.max) instead of “~maxIndex” at the end. Also, changing the division by 2 to “>> 1″ makes the calculation much faster. Finally, the “resultIndex” variable isn’t used, so it can be deleted.
    Thank you.

  • Tantaman

    Should be ~currentIndex not ~maxIndex.

    Example:
    var a = [1, 2, 3];
    a.binaryIndexOf(0);

    With your current code, the above will give you 0. It should be -1 since 0 wasn’t in the array and needs to be inserted into the 0th index.

    a.binaryIndexOf(1.5); will give you -1. It should be -2 since 1.5 should be inserted into the array at index 1.

    The magnitude of the returned missing index should always be 1 greater than the actual insertion index otherwise you can’t decide between if something is missing in the array or needs to be inserted as the first element in the array.

    In other words, if the missing element should be inserted at index 0 you should return -1. If it should be inserted at index 1, you need to return -2 and so on. If you don’t do this you confuse the case of “the array is missing the element which belongs at index 0″ and “the array found the element at index 0″

    • Guest

      Wrong again! It should be ~minIndex

  • uberbrady

    First off – thank you so much for this nifty piece of code!

    I had to use it for something so I ran it and put together some tests (using Mocha). I took a lot of the advice from the comments –

    – I took Xiadong Tan’s recommendation for returning ~max – that seemed to work correctly for me.

    – I took “J’s” note about using rightshift (>>) instead of dividing by zero and flooring the result. Makes sense – and, well, my tests still pass, so it must be OK :)

    Here’s the gist with the reworked function, and the tests:

    https://gist.github.com/uberbrady/10605041

    Thanks again for this useful code!

  • Howard

    From Howard
    I edit-tidied your code for a job interview.
    Two things
    1) there is a boundary error on data items where the last two integers > 2^30; (it’s not low +high /2 see the code below.
    2) The code doesn’t handle repeating values. It likely should give the first index (and a length) of the binary search e.g. if the data is [1,2,2,2,2,2,2,2,2,2,3] … your algorithm will not return 1 (the 2nd item). I assumed a relatively unique set of data so I implemented a linear range scan. A second binary search could also work faster if only the first item is sought, though it’s also more useful to know the count of the items.
    Thanks
    [code]

    function binaryIndexOf(searchElement) {
    ‘use strict';

    var minIndex = 0;
    var maxIndex = this.length – 1;
    var currentIndex;
    var currentElement;
    var resultIndex;

    while (minIndex <= maxIndex) {
    // resultIndex = currentIndex = (minIndex + maxIndex) / 2 | 0;
    resultIndex = currentIndex = minIndex+ ( maxIndex-minIndex) / 2 | 0;
    currentElement = this[currentIndex];

    if (currentElement searchElement) {
    maxIndex = currentIndex – 1;
    }
    else {
    // I am assuming relative uniqueness in the array
    // so a linear scan back from a found item is likely best
    // (we could also do another back log n to find first (low uniqueness or general purpose)
    while (this[currentIndex-1]==searchElement) {
    currentIndex–;
    }

    return currentIndex;
    }
    }

    return -1;
    }

    Array.prototype.binaryIndexOf = binaryIndexOf;
    var arr = [0, 1, 2, 6, 6, 6 , 6, 6, 6.5, 7, 8, 9];
    //arr.splice(Math.abs(arr.binaryIndexOf(3)), 0, 3);
    //document.body.textContent = JSON.stringify(arr);
    document.writeln(arr.binaryIndexOf(6));
    document.writeln(arr[0]);

    [/code]

  • Howard

    corrected….

    function binaryIndexOf(searchElement) {

    // JLH >> Should be Find the
    starting matching elements index.

    ‘use
    strict';

    var
    minIndex = 0;

    var
    maxIndex = this.length – 1;

    var
    currentIndex;

    var
    currentElement;

    while
    (minIndex 2^30

    //Corrected below

    currentIndex = minIndex+ ( maxIndex-minIndex)
    / 2 | 0;

    currentElement
    = this[currentIndex];

    if
    (currentElement searchElement) {

    maxIndex
    = currentIndex – 1;

    }

    else
    {

    //
    JLH I am assuming relative uniqueness in the array

    //
    so a linear scan back from a found item is likely best

    //
    (we could also do another back log n to find first (low uniqueness or general
    purpose)

    while
    (this[currentIndex-1]==searchElement) {

    currentIndex–;

    }

    return
    currentIndex;

    }

    }

    return
    -1;

    }

    Array.prototype.binaryIndexOf =
    binaryIndexOf;

    var arr = [0, 1, 2, 6, 6, 6 , 6, 6, 6.5,
    7, 8, 9];

    document.writeln(arr.binaryIndexOf(6));

    //document.writeln(arr[0]);

    The
    above will more correctly return 3 as the index of the first matching item and
    will not have a boundary (carry) error on very large data elements (e.g.
    {…,2^31-2,2^31-1} or (2^53) in 64 bit. 64 bit numbers aren’t recommended as
    most people use a 32 bit browser.

    The
    problem has an improperly stated requirement, it requires the returns a single
    item when a generic solution should return the index of first and a count (or
    as implemented the index of the first item). Both problem in the internet
    version are from badly stated requirements.

  • August

    Wow I love both the information and the diction of this article.

  • Pingback: PowerArray, Atomus - InfoLogs()

  • Pingback: DailyJS | ネイティブのArrayより5倍高速な PowerArray | JSお散歩()

  • Pingback: PowerArray, Atomus | 8tut.com()