invalidator

invalidator is a bloom filter sitting behind a rest api. It stores information about the presence (and absence) of strings in its backing store.

Using the rest API one can:

feed lists of strings to the filter, it will store information about their "presence" in its backing store.
query the filter using lists of strings, it will respond with the strings missing from its backing store.

The major advantage here is that the backing store is very small, since we do not actually store the strings in memory, we just store information related to their presence in a bit vector.

A bloom filter is guaranteed to return a correct negative (element not in set) response. If it returns a positive response, then there is a small chance that the response may be incorrect and the element may actually be in the set.

This property makes a bloom filter a good tool to pare down large suspect data sets from invalid data. For example, it can be placed in front of a database to remove invalid entries before hitting the database with a potentially expensive query.

Prerequisites

To build this project you need the rust toolchain installed, follow the instructions at https://rustup.rs/ for your operating system.

Building and running

Clone this repository and run cargo build (cargo is part of the rust toolchain). Then run the binary generated:

./target/debug/invalidator

Or simply

cargo run

Rest API

There are two endpoints:

/push
/check

Both accept a json payload as POST which should contain a single field: keys

This field should be a list of strings.

/push takes the supplied keys and adds them to the internal bloom filter. Later queries can be run against these keys.

/check takes the supplied keys and checks them against the filter. The response payload contains two fields:

missing_entries: the keys which were not in the filter

matched_entries: the keys which were in the filter

Bloom filter design:

The bloom filter uses a bit vector (bit-vec) as its backing store.

The filter is sized as a power of two because this way we can simplify the mod operation (which is very slow) to a bitwise AND.

metro hash is used for computing the first two hashes of the supplied key. These two hashes are used to compute subsequent positions for a key.

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
benches		benches
src		src
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

invalidator

Prerequisites

Building and running

Rest API

Bloom filter design:

About

Releases

Packages

Languages

License

abhijat/invalidator

Folders and files

Latest commit

History

Repository files navigation

invalidator

Prerequisites

Building and running

Rest API

Bloom filter design:

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages