Implementing Bloom Filter in Javascript

Having finished Coursera’s Algorithms course recently, I looked for more interesting algorithms to implement. I found one data structure that can store large amount of data, use less memory than the competition, it’s implementation is quite short and concise, and gives an opportunity to practice some probability theory. That is the Bloom Filter.

You can use a Bloom Filter to answer a question similar to “did I see this item before?”, and you want the answer quickly—usually to avoid an otherwise costly operation to locate the item.

For example, imagine a server whose data is stored across the network. When getting a request, it’s worth checking whether the item even exists before performing a network call to retrieve the item.

Bloom Filter uses a bit array to know which items it has seen before. Each item is saved by having a few bits turned from 0 to 1. The exact bits are determined by running the item through a few hash function and converting the resulting hash number to a number from zero to n where n is the last bit.

The other data structure you might use for this purpose is a hash table. However, because a hash table needs to store the complete key, it uses a lot more memory. If we have 1 billion keys and each key is 255 bytes we will need at least 255GB. That’s more than most EC2 machines will have. However a comparable bloom filter will only need 12GB to store 1 billion keys.

Implementing Bloom Filter in Javascript came with additional surprises. Numbers in Javascript are stored as doubles. All bitwise operations involve converting the 32-bit integer, applying the operation and converting back. As a result, bitwise operations are not very fast. Also, unlike C++, left shift and right shift operations only use the last 5 bits of the operand, which results in: 1 << 128 = 1 In c++ the same operation would have resulted in 0.

My implementation can be found here. It uses murmurhash npm package for hashing the values.

When researching bloom filters, these articles were useful:


comments powered byDisqus