Last updated on: January 2nd, 2020
This is the main API for interacting with Soapbox's machine learning models. This documentation covers everything from setting up your local environment to deploying the entire application to production.
- Setup
- Managing dependencies using Poetry
- Run project checklist
- Key Components
- Adding a model and creating an endpoint
- Logging
- Creating custom exceptions
- Testing endpoints
- Testing on an instance, inside Docker
- Docker and deploying the app
- References/Guides
- FAQs
## Setup

1. Install Python 3 using the Homebrew package manager.
2. Download and install `docker` and `docker-compose`.
3. Download the pretrained model weights into the `/api/models` folder:
   - Sentiment model: `sentiment.pt` (link)
4. If the `config.py` file is not already located in `/api`, download the file here and add it to the `/api` folder. For the production server, you need to update the database configuration to talk to Forge.
5. Go into the `ml-api` project directory.
6. Run `docker-compose build` to build the project. Note: To build the project for production use, run the following command instead:

   ```
   docker-compose build --build-arg USE_PRODUCTION_ENV=True api db
   ```

7. Run the following commands before starting the server to ensure that the proper tables are set up. Note that this step is optional if the associated volume has already been set up with the proper tables.
   - Run this in one terminal session: `docker-compose up db`
   - Run this in another terminal session:

     ```
     docker-compose exec -T db mysql -u ml -psecret < api/database_models/reclassification_setup.sql
     ```

   - Finally, run `docker-compose down`.
8. Run `docker-compose up` to start the API and web server.
9. Make sure to run the following command to ensure that git hooks run on the pre-commit event:

   ```
   git config core.hooksPath git_hooks
   ```

   If there are any code formatting issues, run the following command:

   ```
   docker-compose run --rm api ./utils/check_code_format.sh
   ```
Note: This project requires PyTorch (~650 MB installer), and installing it requires at least 3 GB of RAM, preferably more. PyTorch is installed automatically inside the Docker container, alongside the other requirements.

If `docker-compose build` errors out, you can add a swapfile, increase the RAM available to Docker, or disable pip caching with `--no-cache-dir`.
Optional: Project setup alias. This alias takes care of pulling the latest code, building the Docker image, and bringing it up.

- Update the path to the project folder.
- Remove `git pull` if you do not want to pull.
- Add this to your `~/.bash_profile` or `~/.zshrc`:

  ```
  alias buildapi='cd <path-to-ml-api-project> && git pull && docker-compose down && docker-compose build && docker-compose up'
  ```

- Source your bash profile or restart your session (to source your bash profile, run `source ~/.bash_profile` or `source ~/.zshrc`).
- To use your new alias, run the following:

  ```
  buildapi
  ```
## Managing dependencies using Poetry

Poetry is the main dependency manager used in this project. There is a Wiki available on how to use Poetry.

To perform any operation with Poetry, such as installing or removing a dependency, run the following:

```
docker exec -it ml-api /bin/bash -c "cd /api && poetry add|update|remove <dependency>"
```

After running the operation, both `pyproject.toml` and `poetry.lock` should be changed and committed.
## Run project checklist

Every pull request in this project needs to pass the following checks:

- Project linting
  - Any warnings and errors will be flagged as errors and will cause the CI system to fail.
  - Address linting issues as specified by the linter.
- Unit tests
  - All tests are found inside `test`. When building new features, ensure that appropriate unit tests are added and all tests pass.
  - Note that as soon as a test case failure is detected, the CI system will fail right away.

The shell script that runs these checks is `utils/check_project.sh`. It is advised to run this script every time you work on any feature set, as it is the source of truth for any linting errors or unit test failures in the CI system.

Lastly, a convenience script, `utils/run_test.sh`, takes a single argument, the path of a Python unit test file, and runs the test cases in it.
Tip: To run any command inside your Docker container quickly, run the following:

```
docker exec ml-api <command>
```

Docker will run `<command>` and exit immediately. For example, replacing `<command>` with `./utils/check_project.sh` will run the linter and unit tests, while `./utils/check_project.sh --no-linter` will run only the unit tests (which is used for testing the `ml-cron` container).

In addition, if you run the following command:

```
docker exec -it ml-api <command>
```

you get an interactive session inside the container itself. This is useful when debugging any issues.
## Key Components

- Flask
  - Flask is a Python library used to build REST APIs. The API is defined in `/api/app.py`.
  - For more information about Flask, please visit this link.
- Gunicorn
  - Gunicorn is a popular WSGI server that works seamlessly with Flask.
  - Flask needs a Web Server Gateway Interface (WSGI) server to talk to a web server.
  - Flask's built-in development server is not capable of handling production traffic, because it lacks security features and can only run one worker.
  - In this project, Gunicorn starts automatically in the `api` Docker container with the following config (see `Dockerfile`):

    ```
    [ "/bin/sh", "-c", "/usr/local/bin/wait.sh && gunicorn -w 1 -b :8000 -t 360 --reload api.wsgi:app" ]
    ```

  - Note: The bash script `wait.sh` ensures that the `ml-db` container comes up before the `ml-api` container.
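The `api.wsgi:app` entry point tells Gunicorn to import an object named `app` from the `api/wsgi.py` module. As a self-contained illustration (not the project's actual file, which presumably imports the configured app from `app.py`), a minimal WSGI module might look like this:

```python
from flask import Flask

# In the real project, wsgi.py would expose the app defined in app.py
# (e.g. `from .app import app`); this standalone sketch builds its own.
app = Flask(__name__)

@app.route('/ping')
def ping():
    # A hypothetical route, handy for checking that a worker is alive.
    return 'pong'

# Exercise the route without a real server, using Flask's test client.
client = app.test_client()
print(client.get('/ping').data.decode())  # pong
```

With a module shaped like this, `gunicorn -w 1 -b :8000 api.wsgi:app` serves the same `app` object that Flask's test client exercises above.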
## Adding a model and creating an endpoint

1. Add your model files to the `/models` folder.
2. To create a new endpoint:
   1. Create a new Python file in the `/endpoints` folder, with the following template:

      ```python
      from flask import Blueprint, jsonify, request
      from api.exceptions.unprocessable_entity import UnprocessableEntity
      import json

      X_api = Blueprint('X_api', __name__)

      @X_api.route('/X', methods=['GET'])
      def get():
          try:
              received_obj = json.loads(request.data)
          except json.decoder.JSONDecodeError:
              raise UnprocessableEntity('Unable to read json data. Please ensure that your data is correctly formatted.', status_code=422)

          return jsonify(
              {<object to return>}
          )
      ```

      Here, we are creating a Blueprint. A Blueprint helps us connect the endpoint with Flask's main app (`app.py`).
   2. Replace `X` with the name of your endpoint.
   3. The `@X_api.route(...)` annotation marks the function that follows it as the function to call when the endpoint is hit. The route specifies the path used to query the endpoint. Note that the `@` notation is a Python feature called decorators.
   4. Inside the `get()` function, `request.data` holds the data object received in the request; it is globally accessible in a Flask application.
3. We can add this new endpoint to the `app.py` file using:

   ```python
   from .endpoints.X import X_api

   app.register_blueprint(X_api)
   ```

4. At this point, you can add your functionality in the `get()` method. The `get()` function can be called anything, so you can change the name of the function as required.
5. To add additional endpoints in the same file, create functions and add the `@X_api.route(...)` annotation to them. For example, see `classify.py`, where we created separate single and bulk endpoints.
6. `/endpoints/util.py` contains some standalone methods that are shared across multiple endpoints.
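To see the endpoint template in action end to end, here is a self-contained sketch using Flask's test client. The `echo` endpoint name and the payload are made up for illustration, and the custom exception is replaced by a plain 422 response, since `api.exceptions` is not importable outside the project:

```python
import json
from flask import Blueprint, Flask, jsonify, request

# A toy Blueprint following the template; `echo_api` is hypothetical.
echo_api = Blueprint('echo_api', __name__)

@echo_api.route('/echo', methods=['GET'])
def get():
    try:
        received_obj = json.loads(request.data)
    except json.decoder.JSONDecodeError:
        # Stand-in for raising UnprocessableEntity in the real project.
        return jsonify({'message': 'Unable to read json data.'}), 422
    return jsonify({'you_sent': received_obj})

app = Flask(__name__)
app.register_blueprint(echo_api)

# Exercise the endpoint without a running server.
client = app.test_client()
resp = client.get('/echo', data=json.dumps({'text': 'hi'}),
                  content_type='application/json')
print(resp.get_json())  # {'you_sent': {'text': 'hi'}}
```

Sending a body that is not valid JSON exercises the `except` branch and returns a 422 instead.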
## Logging

If you need to log specific information for debugging purposes, the logging instructions below are outlined for Gunicorn and Flask.

Add `--log-level=debug` to the Gunicorn startup command in `Dockerfile`:

```
["gunicorn", "-w", "1", "-b", ":8000", "-t", "360", "--reload", "api.wsgi:app", "--log-level=debug"]
```

You can log in `app.py` using:

```python
app.logger.info("Your log message")
```

For Blueprints, you can log information as follows:

```python
from flask import current_app

current_app.logger.info("Your log message")
```
## Creating custom exceptions

To create a custom exception similar to `422 - UnprocessableEntity`, follow the example of `UnprocessableEntity`:

1. Create a new file in `/exceptions` with a class similar to `UnprocessableEntity`.
2. Add the error handler to `app.py`, for example:

   ```python
   from .exceptions.unprocessable_entity import UnprocessableEntity

   @app.errorhandler(UnprocessableEntity)
   def handle_invalid_usage(error):
       response = jsonify(error.to_dict())
       response.status_code = error.status_code
       return response
   ```

3. Use it in your endpoints inside a `try-except-raise` block. For example:

   ```python
   from api.exceptions.unprocessable_entity import UnprocessableEntity

   try:
       sample = json.loads(request.data)["sample"]
   except json.decoder.JSONDecodeError:
       raise UnprocessableEntity('<message>', status_code=422)
   ```
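The error handler relies on the exception exposing a `status_code` attribute and a `to_dict()` method, but the class itself is not shown here. A plausible shape for such a class, following Flask's standard custom-error pattern (a sketch only, since the project's actual implementation may differ), is:

```python
# Sketch of an exception class with the interface the error handler
# expects: a `status_code` attribute and a `to_dict()` method.
class UnprocessableEntity(Exception):
    status_code = 422

    def __init__(self, message, status_code=None, payload=None):
        super().__init__(message)
        self.message = message
        if status_code is not None:
            self.status_code = status_code
        self.payload = payload

    def to_dict(self):
        # Merge any extra payload with the human-readable message.
        body = dict(self.payload or ())
        body['message'] = self.message
        return body

err = UnprocessableEntity('Unable to read json data.', status_code=422)
print(err.to_dict())  # {'message': 'Unable to read json data.'}
```

With this shape, `jsonify(error.to_dict())` in the handler serializes the message (plus any payload) into the error response body.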
## Testing endpoints

We use Python's `unittest` package for testing our code.

1. Create a new test file under the `/test` folder, following the naming convention `test_X.py`, where `X` is the endpoint you want to test.
2. Use this basic test template to get started:

   ```python
   import unittest
   from flask import Flask
   import json

   from api.endpoints.X import X_api

   app = Flask(__name__)
   app.register_blueprint(X_api)

   class XTests(unittest.TestCase):
       tester = None

       def __init__(self, *args, **kwargs):
           super(XTests, self).__init__(*args, **kwargs)
           global tester
           tester = app.test_client()

       def test_entity(self):
           # Use tester.get to simulate a GET request
           response = tester.get(
               '/X',
               data=json.dumps({<OBJECT TO SEND>}),
               content_type='application/json'
           )
           data = json.loads(response.get_data(as_text=True))
           self.assertEqual(response.status_code, 200)

   if __name__ == '__main__':
       unittest.main()
   ```

3. To run the tests, `cd` to the home directory of the project (i.e. `ml-api/`).
   - To run a single unit test, run the following command:

     ```
     ./utils/run_test.sh <name-of-unit-test-file>
     ```

   - To run all unit tests, run the following command:

     ```
     ./utils/check_project.sh
     ```

     Please note that this command will run the linter and the unit tests found inside the `api/test` folder.
## Testing on an instance, inside Docker

Since our blog posts database is built into Docker, we cannot test the `blogs` endpoint without going into the Docker instance.

To test the API:

1. SSH into the instance.
2. Start a new shell session inside the `ml-api` Docker container using `docker exec -it ml-api /bin/bash`.
3. From the root directory of the Docker container, run `python run_tests.py`.

If all the tests pass, you should see the following output: `OK`
## Docker and deploying the app

We use Docker to deploy the API.

- `docker-compose.yml` contains instructions to create two services:
  - The `ml-api` service Dockerizes the `api` folder using the Dockerfile inside it.
  - The `ml-db` service provides the local database where blog posts can be stored. Note that this is only useful in the local environment; in production, blog posts are stored inside the Forge database.
- `docker-compose.yml` also creates a network connection between the `ml-api` and `ml-db` services.
- The `/Dockerfile` in the `api` folder installs all the required packages from `requirements.txt` and additional resources, and starts the Gunicorn WSGI server on port `8000`.
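For orientation, a `docker-compose.yml` with the shape described above might look roughly like this. This is a sketch only: the service names, MySQL credentials, and port 8000 come from this document, but the MySQL image version is assumed and the project's real file will differ in detail:

```yaml
version: "3"
services:
  api:
    build: ./api            # Dockerized api folder (ml-api container)
    container_name: ml-api
    ports:
      - "8000:8000"         # Gunicorn listens on 8000
    depends_on:
      - db                  # wait.sh additionally waits for the DB to be ready
  db:
    image: mysql:5.7        # assumed image; local blog-posts database
    container_name: ml-db
    environment:
      MYSQL_USER: ml
      MYSQL_PASSWORD: secret
```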
To deploy the API with changes only to endpoint logic:

1. Use `git pull` to pull `master` and dynamically update the API.
2. The first time you hit the endpoint after updating it, the API will automatically rebuild, so the request may take some time to complete.

To deploy the API when a new model or endpoint is added, or when `docker-compose.yml`/`Dockerfile` is changed, it is better to rebuild:

1. `docker-compose down` to bring down the API.
2. `git pull`
3. (Optional) `docker system prune -a` to reinstall requirements, if the requirements changed.
4. `docker-compose build` to rebuild.
5. `docker-compose up` to bring it up again.
## References/Guides

- Flask - WSGI - Gunicorn
- Deploying a scalable Flask app using Gunicorn and Nginx, in Docker
- Deploying Machine Learning Models with Docker
## FAQs

Q: Running `python` on my local machine uses `python2`. How can I fix this so it uses Python 3?

A: macOS ships with Python 2 by default, so you have to install Python 3 through the Homebrew package manager. Instead of typing `python3` directly, you can alias `python` to `python3` by adding the following to your `~/.bash_profile`:

```
alias python=python3
```

Finally, run `source ~/.bash_profile` or restart your terminal application. To ensure the alias is linked properly, type `python --version` in a terminal window; it should report a Python 3 version.