Revisiting: Searching JavaScript arrays with a binary search

Last year I wrote a post called Searching JavaScript arrays with a binary search. It’s actually become quite popular, has attracted a large number of suggestions in the comments and has even helped out PowerArray. The problem is that it’s an untested, performance-chasing mess of a function. I’ve always wanted to redo it in a more formal manner because I don’t want people using code I’ve written that will probably break on multiple edge cases.

So, I’m going to build an actual repository containing a JavaScript binary search function as well as some robust tests. Step one will be to produce a simple reference implementation with benchmarks and generative tests using testcheck-js. After that I will keep speeding up the algorithm without breaking that initial suite. I think this’ll produce something far better than my original post, with the potential for more speed too. And it’s going to be bulletproof. I hope. Why would you be shooting at a searching algorithm anyway?

You can find the repository at Wolfy87/binary-search.

The baseline

So I built up a repository with tests and a function called binarySearch that actually uses indexOf instead. It’s lying to you; it’s not a binary search just yet. This gives us a working example to base the tests on and a good baseline for performance. Here’s the function, which basically does nothing of any significance.
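A minimal sketch of what such a placeholder could look like. It assumes the (list, item) argument order used in the calls quoted in this post, and it assumes a missing item is reported as -1, which is simply what indexOf does:

```javascript
// Baseline sketch: same call shape as a binary search, but it just
// delegates to the built-in linear Array.prototype.indexOf.
// Returns the index of `item` in `list`, or -1 when it is absent (assumed).
function binarySearch(list, item) {
  return list.indexOf(item);
}

console.log(binarySearch([1, 3, 5, 7], 5)); // 2
console.log(binarySearch([1, 3, 5, 7], 4)); // -1
```

Because indexOf does not care whether the array is sorted, this version gives a correct answer for any input, which is what makes it useful as a reference.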

And the test file that runs in 80ms on my machine using generative testing.
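As a rough idea of the kind of property a generative test checks, here is a hand-rolled sketch. It deliberately does not use testcheck-js, and randomInt, randomList and checkSearch are hypothetical helper names invented for the illustration:

```javascript
// Hand-rolled stand-in for a generative test; this is NOT the testcheck-js API.
// randomInt, randomList and checkSearch are hypothetical helper names.

function randomInt(min, max) {
  // Uniform integer in [min, max].
  return min + Math.floor(Math.random() * (max - min + 1));
}

function randomList(maxLength) {
  const length = randomInt(0, maxLength);
  return Array.from({ length }, () => randomInt(-50, 50));
}

// Runs `search` against `runs` random inputs and returns the first
// counterexample found, or null if every property held.
function checkSearch(search, runs) {
  for (let i = 0; i < runs; i++) {
    const list = randomList(20);
    const item = randomInt(-50, 50);
    const index = search(list, item);

    if (list.includes(item)) {
      // Property 1: a present item is found at an index that holds it.
      if (list[index] !== item) return { list, item, index };
    } else if (index !== -1) {
      // Property 2: an absent item reports -1.
      return { list, item, index };
    }
  }
  return null;
}

// The indexOf baseline should pass every run.
const baseline = (list, item) => list.indexOf(item);
console.log(checkSearch(baseline, 1000)); // null
```

Returning the counterexample rather than just a boolean is a design choice: seeing the exact list and item that failed is what makes this style of testing useful for debugging.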

Now to turn it into an actual binary search. During this process I tried multiple binary search implementations, including one from Khan Academy, a plethora from other blogs and mine from my previous post. The generative testing found holes in every single one. It was amazing yet terrifying. Even one apparently ported from the JDK fell apart, although that’s probably the fault of the port and not of the actual JDK. I hope.

I eventually gave up on them all and went back to the implementation from Khan Academy. It falls over on calls such as “( [0,0,0,13,2,2], 13 )” and “( [10,6], 6 )”, which is sort of bad.

This is the point where I realised I’m an idiot and slammed my face into my desk

There wasn’t anything wrong with the search functions. The sorting of the sample arrays was wrong. I noticed it after putting in some logging for failing cases that showed numbers being inserted out of order.

So using list.sort() in the tests wasn’t safe, amazingly. By default it converts the elements to strings and compares those, so 10 sorts before 6 and 13 sorts before 2. Thanks JavaScript! So I ended up with this binary search from my Khan Academy attempt.
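A minimal sketch of an iterative binary search of this kind might look like the following. It is not necessarily the exact code in the repository; it assumes the array is sorted in ascending numeric order and that a missing item is reported as -1:

```javascript
// Iterative binary search sketch. Assumes `list` is sorted in ascending
// numeric order; returns an index of `item`, or -1 if it is absent.
function binarySearch(list, item) {
  let min = 0;
  let max = list.length - 1;

  while (min <= max) {
    // Midpoint of the remaining range, rounded down.
    const guess = Math.floor((min + max) / 2);

    if (list[guess] === item) {
      return guess;
    } else if (list[guess] < item) {
      min = guess + 1; // item can only be in the upper half
    } else {
      max = guess - 1; // item can only be in the lower half
    }
  }

  return -1;
}

console.log(binarySearch([0, 0, 0, 2, 2, 13], 13)); // 5
console.log(binarySearch([6, 10], 6)); // 0

// The string-sorted inputs from the failing cases are not in numeric
// order, so the same function misses items that are present.
console.log(binarySearch([0, 0, 0, 13, 2, 2], 13)); // -1
console.log(binarySearch([10, 6], 6)); // -1
```

When the list contains duplicates, this sketch returns the index of one matching element, not necessarily the first.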

And these tests.
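The important detail for tests like these is how the sample arrays get sorted. The sketch below contrasts the default sort with a numeric comparator, assuming that a comparator is how the inputs are put in order; sortedCopy is a hypothetical helper name:

```javascript
// Default sort compares elements as strings, so "13" sorts before "2".
const byDefault = [13, 0, 2, 0, 2, 0].sort();
console.log(byDefault); // [0, 0, 0, 13, 2, 2]

// A numeric comparator gives the ascending order a binary search needs.
const numeric = [13, 0, 2, 0, 2, 0].sort((a, b) => a - b);
console.log(numeric); // [0, 0, 0, 2, 2, 13]

// Hypothetical helper a generative test could use to prepare its inputs.
// It copies first so the generated array is not mutated.
function sortedCopy(list) {
  return list.slice().sort((a, b) => a - b);
}

console.log([10, 6].sort());      // [10, 6]
console.log(sortedCopy([10, 6])); // [6, 10]
```

The two default-sorted results are exactly the arrays from the failing calls quoted earlier in the post.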

Now it’s safe

I am free to change the implementation now since I’m happy with the test suite (despite it subtly stabbing me in the back). So I can add in every crazy optimisation under the sun, but to tell whether each one actually improves anything I’ll need some benchmarks. I’m going to use Benchmark.js.
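To show the basic idea of measuring operations per second, here is a hand-rolled timing loop. It is only a rough stand-in and is not the Benchmark.js API; opsPerSecond and baselineSearch are hypothetical names, and the list size and target are arbitrary choices for the illustration:

```javascript
// Hand-rolled timing loop; a rough stand-in, NOT the Benchmark.js API.
// opsPerSecond and baselineSearch are hypothetical names.

function baselineSearch(list, item) {
  return list.indexOf(item);
}

// Calls `fn` repeatedly for roughly `durationMs` (must be positive)
// and returns the approximate number of calls per second.
function opsPerSecond(fn, durationMs) {
  const start = Date.now();
  let calls = 0;
  let elapsed = 0;

  while (elapsed < durationMs) {
    // Batch the calls so the clock is not read on every iteration.
    for (let i = 0; i < 1000; i++) fn();
    calls += 1000;
    elapsed = Date.now() - start;
  }

  return Math.round(calls / (elapsed / 1000));
}

// Sorted input 0, 2, 4, ... with a target that sits at the very end,
// which is the slowest position for a linear scan.
const list = Array.from({ length: 10000 }, (_, i) => i * 2);
const target = 19998;

const rate = opsPerSecond(() => baselineSearch(list, target), 200);
console.log('indexOf baseline:', rate, 'ops/sec');
```

Any candidate implementation can be timed the same way against the same list and target. Numbers from a loop like this are only indicative, since it does no warm-up or statistical analysis, which is a good reason to reach for a proper benchmarking library.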

Which produced this nice little output for me to compare against in the future.

I could go through inserting random optimisations now, safe in the knowledge that I’ll be able to see improvements and won’t break anything, but it’s almost midnight and I want to publish this tomorrow morning. Feel free to hack around in the repository and make it blisteringly fast without breaking anything.