Last updated on: January 2nd, 2020
This is the main API for interacting with Soapbox's machine learning models. This documentation covers everything from setting up your local environment to deploying the entire application to production.
- Setup
- Managing dependencies using Poetry
- Run project checklist
- Key Components
- Adding a model and creating an endpoint
- Logging
- Creating custom exceptions
- Testing endpoints
- Testing on an instance, inside Docker
- Docker and deploying the app
- References/Guides
- FAQs
## Setup

1. Install Python 3 using the Homebrew package manager.
2. Download and install `docker` and `docker-compose`.
3. Download the pretrained model weights into the `/api/models` folder:
   - Sentiment model: `sentiment.pt` (link)
4. If the `config.py` file is not already located in `/api`, download the file here and add it to the `/api` folder. For the production server, you need to update the database configuration to talk to Forge.
5. Go into the `ml-api` project directory.
6. Run `docker-compose build` to build the project. Note: To build the project for production use, run the following command instead:

   ```
   docker-compose build --build-arg USE_PRODUCTION_ENV=True api db
   ```

7. Run the following commands before starting the server to ensure that the proper tables are set up. Note that this step is optional if the associated volume has already been set up with the proper tables.
   - Run this in one terminal session: `docker-compose up db`
   - Run this in another terminal session:

     ```
     docker-compose exec -T db mysql -u ml -psecret < api/database_models/reclassification_setup.sql
     ```

   - Finally, run `docker-compose down`.
8. Run `docker-compose up` to start the API and web server.
9. Make sure to run the following command to ensure that git hooks run on the pre-commit event:

   ```
   git config core.hooksPath git_hooks
   ```

   If there are any code formatting issues, run the following command:

   ```
   docker-compose run --rm api ./utils/check_code_format.sh
   ```
Note: This project requires PyTorch (~650 MB installer), and installing it requires at least 3 GB of RAM, preferably more. PyTorch is installed automatically inside the Docker container, alongside the other requirements.

If `docker-compose build` errors out, you can add a swapfile, increase the RAM available to Docker, or disable pip caching with `--no-cache-dir`.
Optional: Project setup alias. This alias takes care of pulling the latest code, building the Docker image, and bringing it up.

- Update the path to the project folder.
- Remove `git pull` if you do not want to pull.
- Add this to your `~/.bash_profile` or `~/.zshrc`:

  ```
  alias buildapi='cd <path-to-ml-api-project> && git pull && docker-compose down && docker-compose build && docker-compose up'
  ```

- Source your bash profile or restart your session (to source your bash profile, run `source ~/.bash_profile` or `source ~/.zshrc`).
- To use your new alias, run the following:

  ```
  buildapi
  ```
## Managing dependencies using Poetry

Poetry is the main dependency manager used in this project. There is a Wiki available on how to use Poetry.

To perform any operation with Poetry, such as installing or removing a dependency, run the following:

```
docker exec -it ml-api /bin/bash -c "cd /api && poetry add|update|remove <dependency>"
```

After running the operation, both `pyproject.toml` and `poetry.lock` should be changed and committed.
## Run project checklist

Every pull request in this project needs to pass the following checks:

- Project linting
  - Any warnings and errors will be flagged as errors and will cause the CI system to fail.
  - Address linting issues as specified by the linter.
- Unit tests
  - All tests are found inside `test`. When building new features, ensure that appropriate unit tests are added and all tests pass.
  - Note that as soon as a test case failure is detected, the CI system will fail right away.

The shell script that runs these checks is `utils/check_project.sh`. It is advised to run this script every time you work on any feature set, as it is the source of truth for any linting errors or unit test failures in the CI system.

Lastly, a convenience script, `utils/run_test.sh`, takes a single argument, the path of a Python unit test file, and runs the test cases in it.
Tip: To run any command inside your Docker container quickly, run the following:

```
docker exec ml-api <command>
```

Docker will run `<command>` and exit immediately. For example, replacing `<command>` with `./utils/check_project.sh` will run the linter and unit tests, while `./utils/check_project.sh --no-linter` will run only the unit tests (which is used for testing the `ml-cron` container).

In addition, if you run the following command:

```
docker exec -it ml-api <command>
```

you get an interactive session inside the container itself. This is useful when debugging any issues.
## Key Components

- Flask
  - Flask is a Python library used to build REST APIs. The API is defined in `/api/app.py`.
  - For more information about Flask, please visit this link.
- Gunicorn
  - Gunicorn is a popular WSGI server that works seamlessly with Flask.
  - Flask needs a Web Server Gateway Interface (WSGI) server to talk to a web server.
  - Flask's built-in development server is not capable of handling production traffic, because it lacks security features and can only run one worker.
  - In this project, Gunicorn starts automatically in the `api` Docker container with the following config (see `Dockerfile`):

    ```
    [ "/bin/sh", "-c", "/usr/local/bin/wait.sh && gunicorn -w 1 -b :8000 -t 360 --reload api.wsgi:app" ]
    ```

  - Note: The bash script `wait.sh` ensures that the `ml-db` container comes up before the `ml-api` container.
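The `api.wsgi:app` entry point tells Gunicorn to import an object named `app` from the `api/wsgi.py` module. As a self-contained illustration (not the project's actual file, which presumably imports the configured app from `app.py`), a minimal WSGI module might look like this:

```python
from flask import Flask

# In the real project, wsgi.py would expose the app defined in app.py
# (e.g. `from .app import app`); this standalone sketch builds its own.
app = Flask(__name__)

@app.route('/ping')
def ping():
    # A hypothetical route, handy for checking that a worker is alive.
    return 'pong'

# Exercise the route without a real server, using Flask's test client.
client = app.test_client()
print(client.get('/ping').data.decode())  # pong
```

With a module shaped like this, `gunicorn -w 1 -b :8000 api.wsgi:app` serves the same `app` object that Flask's test client exercises above.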
## Adding a model and creating an endpoint

1. Add your model files to the `/models` folder.
2. To create a new endpoint:
   1. Create a new Python file in the `/endpoints` folder, with the following template:

      ```python
      from flask import Blueprint, jsonify, request
      from api.exceptions.unprocessable_entity import UnprocessableEntity
      import json

      X_api = Blueprint('X_api', __name__)

      @X_api.route('/X', methods=['GET'])
      def get():
          try:
              received_obj = json.loads(request.data)
          except json.decoder.JSONDecodeError:
              raise UnprocessableEntity('Unable to read json data. Please ensure that your data is correctly formatted.', status_code=422)

          return jsonify(
              {<object to return>}
          )
      ```

      Here, we are creating a Blueprint. A Blueprint helps us connect the endpoint with Flask's main app (`app.py`).
   2. Replace `X` with the name of your endpoint.
   3. The `@X_api.route(...)` annotation marks the function that follows it as the function to call when the endpoint is hit. The route specifies the path used to query the endpoint. Note that the `@` notation is a Python feature called decorators.
   4. Inside the `get()` function, `request.data` holds the data object received in the request; it is globally accessible in a Flask application.
3. We can add this new endpoint to the `app.py` file using:

   ```python
   from .endpoints.X import X_api

   app.register_blueprint(X_api)
   ```

4. At this point, you can add your functionality in the `get()` method. The `get()` function can be called anything, so you can change the name of the function as required.
5. To add additional endpoints in the same file, create functions and add the `@X_api.route(...)` annotation to them. For example, see `classify.py`, where we created separate single and bulk endpoints.
6. `/endpoints/util.py` contains some standalone methods that are shared across multiple endpoints.
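To see the endpoint template in action end to end, here is a self-contained sketch using Flask's test client. The `echo` endpoint name and the payload are made up for illustration, and the custom exception is replaced by a plain 422 response, since `api.exceptions` is not importable outside the project:

```python
import json
from flask import Blueprint, Flask, jsonify, request

# A toy Blueprint following the template; `echo_api` is hypothetical.
echo_api = Blueprint('echo_api', __name__)

@echo_api.route('/echo', methods=['GET'])
def get():
    try:
        received_obj = json.loads(request.data)
    except json.decoder.JSONDecodeError:
        # Stand-in for raising UnprocessableEntity in the real project.
        return jsonify({'message': 'Unable to read json data.'}), 422
    return jsonify({'you_sent': received_obj})

app = Flask(__name__)
app.register_blueprint(echo_api)

# Exercise the endpoint without a running server.
client = app.test_client()
resp = client.get('/echo', data=json.dumps({'text': 'hi'}),
                  content_type='application/json')
print(resp.get_json())  # {'you_sent': {'text': 'hi'}}
```

Sending a body that is not valid JSON exercises the `except` branch and returns a 422 instead.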
## Logging

If you need to log specific information for debugging purposes, the logging instructions below are outlined for Gunicorn and Flask.

Add `--log-level=debug` to the Gunicorn startup command in `Dockerfile`:

```
["gunicorn", "-w", "1", "-b", ":8000", "-t", "360", "--reload", "api.wsgi:app", "--log-level=debug"]
```

You can log in `app.py` using:

```python
app.logger.info("Your log message")
```

For Blueprints, you can log information as follows:

```python
from flask import current_app

current_app.logger.info("Your log message")
```
## Creating custom exceptions

To create a custom exception similar to `422 - UnprocessableEntity`, follow the example of `UnprocessableEntity`:

1. Create a new file in `/exceptions` with a class similar to `UnprocessableEntity`.
2. Add the error handler to `app.py`, for example:

   ```python
   from .exceptions.unprocessable_entity import UnprocessableEntity

   @app.errorhandler(UnprocessableEntity)
   def handle_invalid_usage(error):
       response = jsonify(error.to_dict())
       response.status_code = error.status_code
       return response
   ```

3. Use it in your endpoints inside a `try-except-raise` block. For example:

   ```python
   from api.exceptions.unprocessable_entity import UnprocessableEntity

   try:
       sample = json.loads(request.data)["sample"]
   except json.decoder.JSONDecodeError:
       raise UnprocessableEntity('<message>', status_code=422)
   ```
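The error handler relies on the exception exposing a `status_code` attribute and a `to_dict()` method, but the class itself is not shown here. A plausible shape for such a class, following Flask's standard custom-error pattern (a sketch only, since the project's actual implementation may differ), is:

```python
# Sketch of an exception class with the interface the error handler
# expects: a `status_code` attribute and a `to_dict()` method.
class UnprocessableEntity(Exception):
    status_code = 422

    def __init__(self, message, status_code=None, payload=None):
        super().__init__(message)
        self.message = message
        if status_code is not None:
            self.status_code = status_code
        self.payload = payload

    def to_dict(self):
        # Merge any extra payload with the human-readable message.
        body = dict(self.payload or ())
        body['message'] = self.message
        return body

err = UnprocessableEntity('Unable to read json data.', status_code=422)
print(err.to_dict())  # {'message': 'Unable to read json data.'}
```

With this shape, `jsonify(error.to_dict())` in the handler serializes the message (plus any payload) into the error response body.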
## Testing endpoints

We use Python's `unittest` package for testing our code.

1. Create a new test file under the `/test` folder, following the naming convention `test_X.py`, where `X` is the endpoint you want to test.
2. Use this basic test template to get started:

   ```python
   import unittest
   from flask import Flask
   import json

   from api.endpoints.X import X_api

   app = Flask(__name__)
   app.register_blueprint(X_api)

   class XTests(unittest.TestCase):
       tester = None

       def __init__(self, *args, **kwargs):
           super(XTests, self).__init__(*args, **kwargs)
           global tester
           tester = app.test_client()

       def test_entity(self):
           # Use tester.get to simulate a GET request
           response = tester.get(
               '/X',
               data=json.dumps({<OBJECT TO SEND>}),
               content_type='application/json'
           )
           data = json.loads(response.get_data(as_text=True))
           self.assertEqual(response.status_code, 200)

   if __name__ == '__main__':
       unittest.main()
   ```

3. To run the tests, `cd` to the home directory of the project (i.e. `ml-api/`).
   - To run a single unit test, run the following command:

     ```
     ./utils/run_test.sh <name-of-unit-test-file>
     ```

   - To run all unit tests, run the following command:

     ```
     ./utils/check_project.sh
     ```

     Please note that this command will run the linter and the unit tests found inside the `api/test` folder.
## Testing on an instance, inside Docker

Since our blog posts database is built into Docker, we cannot test the `blogs` endpoint without going into the Docker instance.

To test the API:

1. SSH into the instance.
2. Start a new shell session inside the `ml-api` Docker container using `docker exec -it ml-api /bin/bash`.
3. From the root directory of the Docker container, run `python run_tests.py`.

If all the tests pass, you should see the following output: `OK`
## Docker and deploying the app

We use Docker to deploy the API.

- `docker-compose.yml` contains instructions to create two services:
  - The `ml-api` service Dockerizes the `api` folder using the Dockerfile inside it.
  - The `ml-db` service provides the local database where blog posts can be stored. Note that this is only useful in the local environment; in production, blog posts are stored inside the Forge database.
- `docker-compose.yml` also creates a network connection between the `ml-api` and `ml-db` services.
- The `/Dockerfile` in the `api` folder installs all the required packages from `requirements.txt` and additional resources, and starts the Gunicorn WSGI server on port `8000`.
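For orientation, a `docker-compose.yml` with the shape described above might look roughly like this. This is a sketch only: the service names, MySQL credentials, and port 8000 come from this document, but the MySQL image version is assumed and the project's real file will differ in detail:

```yaml
version: "3"
services:
  api:
    build: ./api            # Dockerized api folder (ml-api container)
    container_name: ml-api
    ports:
      - "8000:8000"         # Gunicorn listens on 8000
    depends_on:
      - db                  # wait.sh additionally waits for the DB to be ready
  db:
    image: mysql:5.7        # assumed image; local blog-posts database
    container_name: ml-db
    environment:
      MYSQL_USER: ml
      MYSQL_PASSWORD: secret
```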
To deploy the API with changes only to endpoint logic:

1. Use `git pull` to pull `master` and dynamically update the API.
2. The first time you hit the endpoint after updating it, the API will automatically rebuild, so the request may take some time to complete.

To deploy the API when a new model or endpoint is added, or when `docker-compose.yml`/`Dockerfile` is changed, it is better to rebuild:

1. `docker-compose down` to bring down the API.
2. `git pull`
3. (Optional) `docker system prune -a` to reinstall requirements, if the requirements changed.
4. `docker-compose build` to rebuild.
5. `docker-compose up` to bring it up again.
## References/Guides

- Flask - WSGI - Gunicorn
- Deploying a scalable Flask app using Gunicorn and Nginx, in Docker
- Deploying Machine Learning Models with Docker
## FAQs

Q: Running `python` on my local machine uses `python2`. How can I fix this so it uses Python 3?

A: macOS ships with Python 2 by default, so you have to install Python 3 through the Homebrew package manager. Instead of typing `python3` directly, you can alias `python` to `python3` by adding the following to your `~/.bash_profile`:

```
alias python=python3
```

Finally, run `source ~/.bash_profile` or restart your terminal application. To ensure the alias is linked properly, type `python --version` in a terminal window; it should report a Python 3 version.