Redis Proxy Challenge (Segment): James Groeneveld

A super simple in-memory caching Redis proxy, built around a TTL + LRU cache in Python.

High-level architecture overview

Components

  1. client: the client talks to the Proxy web service over HTTP -- this part is left to the user.

  2. proxy: the Proxy web service sits in front of the Redis service and serves clients over HTTP. It is a containerized Flask application using Gunicorn as the WSGI server to handle concurrency. For the purposes of this assignment we set only a single worker and cap the number of threads at 2 * num_cores on the proxy container. We spawn only a single worker process to keep one consistent RedisCache; otherwise each worker process would hold its own RedisCache instance.

    RedisCache: the Proxy web service offloads most of the heavy lifting onto this RedisCache implementation. It is an LRU cache with a per-item time-to-live (TTL), built on the cachetools package. We override the __missing__ method to perform a round trip to our backing Redis instance when a key is not currently in the cache (a sketch appears after this list).

  3. e2e: a standalone end-to-end testing container, as called out in the requirements. It returns a non-zero exit code if tests fail. Given more time I would have liked to write some unit tests for my RedisCache subclass of TTLCache; the original cachetools author has decent test coverage, so we should be OK for now.

  4. redis: vanilla containerized Redis -- redis:latest.
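A minimal sketch of what the RedisCache subclass might look like, assuming cachetools and redis-py; the constructor signature and names here are illustrative, not the exact source:

```python
from cachetools import TTLCache


class RedisCache(TTLCache):
    """LRU cache with per-item TTL that falls back to Redis on a miss."""

    def __init__(self, maxsize, ttl, redis_client):
        super().__init__(maxsize=maxsize, ttl=ttl)
        self._redis = redis_client

    def __missing__(self, key):
        # Cache miss: round trip to the backing Redis instance.
        value = self._redis.get(key)
        if value is None:
            raise KeyError(key)
        # Store the fetched value so subsequent reads are served from cache.
        self[key] = value
        return value
```

cachetools calls __missing__ whenever a key lookup fails, so a plain cache[key] transparently pulls from Redis and populates the cache.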

Configuration

Our configuration is set in .env; an example file follows the list below.

  • REDIS_PORT (integer)

    [required]
    default: 6379
    A valid port number (1 <= REDIS_PORT <= 65535) for the host port mapped to the Redis container.

  • PROXY_PORT (integer)

    [required]
    default: 5000
    A valid port number (1 <= PROXY_PORT <= 65535) for the host port mapped to the Proxy web service container.

  • CACHE_TTL (integer)

    [optional]
    default: 3600
    Per-item TTL (seconds) for entries in our RedisCache.

  • CACHE_MAXSIZE (integer)

    [optional]
    default: 1024
    Maximum number of keys our RedisCache will hold.

  • MAX_PARALLEL (integer)

    [optional]
    default: 1
    Set to 1 for sequential processing of concurrent requests, or greater than 1 for parallel processing. For safety we cap this setting at max_threads.

  • MAX_CONNECTIONS (integer)

    [optional]
    default: 2048
    Maximum number of concurrent connections to our web service at any given time.
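For illustration, a fully spelled-out .env using the defaults above would look like this:

```
REDIS_PORT=6379
PROXY_PORT=5000
CACHE_TTL=3600
CACHE_MAXSIZE=1024
MAX_PARALLEL=1
MAX_CONNECTIONS=2048
```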

What the code does

./proxy/app.py: defines the Flask application, instantiates a RedisCache and sets the main route -- /get/<key> (a sketch follows this list).
./proxy/Dockerfile: basic Dockerfile for a Gunicorn-backed Flask application.
./proxy/gunicorn.py: defines the Gunicorn WSGI configuration.
./proxy/redis_cache.py: extends the cachetools implementation of TTLCache, specifying custom behaviour when a key is not in the cache.
./.env: a single file for all configuration.
./docker-compose.yml: defines our Proxy and Redis components. No need to set up any special networking, as the default network and inter-container communication are sufficient.
./Makefile: defines the test and run targets.
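A rough sketch of the shape of ./proxy/app.py; the Redis host name, fallback defaults and wiring here are assumptions, not the exact source:

```python
import os

import redis
from flask import Flask, abort

from redis_cache import RedisCache

app = Flask(__name__)

# A single RedisCache instance, shared by all request threads in the
# single Gunicorn worker, wired up from the .env settings.
cache = RedisCache(
    maxsize=int(os.environ.get("CACHE_MAXSIZE", 1024)),
    ttl=int(os.environ.get("CACHE_TTL", 3600)),
    redis_client=redis.Redis(host="redis", port=6379),
)


@app.route("/get/<key>")
def get(key):
    try:
        return cache[key]  # served from cache, or from Redis via __missing__
    except KeyError:
        abort(404)  # key is in neither the cache nor Redis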

Algorithmic complexity of the cache operations

LRU

Cachetools uses a dictionary together with a doubly linked list under the hood. Because insertion and deletion are O(1) in both a doubly linked list and a dictionary, the complexity of the LRU implementation is as follows (a short demonstration follows the list):

  • set: time -- O(1), space -- O(1)
  • get: time -- O(1), space -- O(1)
  • evict: time -- O(1), space -- O(1)
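A quick standalone illustration of the LRU behaviour, using cachetools directly rather than code from this repo:

```python
from cachetools import LRUCache

cache = LRUCache(maxsize=2)
cache["a"] = 1
cache["b"] = 2
_ = cache["a"]        # touch "a", making "b" the least recently used
cache["c"] = 3        # at capacity: evicts "b"
print(sorted(cache))  # ['a', 'c']
```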

TTL

This implementation of TTLCache attempts to remove all expired keys on every state mutation. Because in the worst case we may need to expire every key on any given mutation, the complexity of the TTL implementation is as follows:

  • set: time -- O(n), space -- O(1)
  • get: time -- O(n), space -- O(1)
  • expire: time -- O(1), space -- O(1)

In practice, when we amortize TTL's set and get over time we get an effective O(1) per operation, since each key is expired at most once.
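And the TTL behaviour, again with cachetools directly:

```python
import time

from cachetools import TTLCache

cache = TTLCache(maxsize=1024, ttl=1)  # one-second per-item TTL
cache["a"] = 1
print("a" in cache)  # True
time.sleep(1.1)
print("a" in cache)  # False -- the item has expired
```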

Instructions for how to run the proxy and tests

Run tests

```
make test
```

Run proxy (and redis)

```
make
```
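Once both containers are up, a cached read looks something like this (assuming the default PROXY_PORT and a key foo already present in Redis):

```
curl http://localhost:5000/get/foo
```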

How long you spent on each part of the project

  1. documentation: 1hr

  2. proxy: 1.5hrs

  3. e2e: 1hr

  4. redis: n/a

  5. misc: 3hrs

    Spent reading about best practices (e.g. Flask, Docker, max_threads) and weighing certain trade-offs, e.g. treating e2e testing as its own service for extensibility rather than setting an environment variable and running tests inside the application container.

Unimplemented

  • Some unit tests may have been helpful (inside the proxy service).
  • RESP protocol support for the proxy.
