First, install requirements and create a model_store directory:
pip install -r requirements.txt
mkdir model_store
Now build and save the model and tokenizer:
python build.py
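For reference, build.py might look roughly like the sketch below. It assumes a Hugging Face BERT-style sequence-classification checkpoint (chosen because the --extra-files list in the next step matches what save_pretrained emits for one); swap in whatever model your project actually uses.

# build.py -- a minimal sketch, assuming a Hugging Face BERT-style model;
# the checkpoint name is illustrative, not the original.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # assumption: replace with your own checkpoint

model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# Saving into the current directory produces pytorch_model.bin, config.json,
# tokenizer.json, tokenizer_config.json, special_tokens_map.json, and
# vocab.txt -- exactly the files the archiver step bundles via --extra-files.
# safe_serialization=False keeps the legacy .bin format rather than .safetensors.
model.save_pretrained(".", safe_serialization=False)
tokenizer.save_pretrained(".")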
Archive the model:
torch-model-archiver --model-name custom_model --version 1.0 \
    --serialized-file ./pytorch_model.bin \
    --handler ./handler.py \
    --extra-files "./config.json,./special_tokens_map.json,./tokenizer.json,./tokenizer_config.json,./vocab.txt" \
    --requirements-file ./requirements.txt \
  && mv custom_model.mar model_store
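The handler.py passed via --handler could look something like this sketch; the class name and the pre/post-processing are assumptions keyed to the {"instances": [...]} payload used in the curl calls below, not the original implementation.

# handler.py -- a minimal sketch of the custom handler referenced above.
import json

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from ts.torch_handler.base_handler import BaseHandler


class CustomHandler(BaseHandler):
    def initialize(self, context):
        # model_dir is where TorchServe unpacked the .mar contents
        model_dir = context.system_properties.get("model_dir")
        self.device = torch.device("cpu")
        self.tokenizer = AutoTokenizer.from_pretrained(model_dir)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_dir)
        self.model.to(self.device).eval()
        self.initialized = True

    def preprocess(self, requests):
        # Each request body matches the curl payload: {"instances": ["..."]}
        texts = []
        for req in requests:
            body = req.get("data") or req.get("body")
            if isinstance(body, (bytes, bytearray)):
                body = json.loads(body)
            texts.extend(body["instances"])
        return self.tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

    def inference(self, inputs):
        with torch.no_grad():
            return self.model(**inputs).logits

    def postprocess(self, logits):
        # One response item per request; this sketch assumes a batch size of 1.
        return [logits.argmax(dim=-1).tolist()]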
Serve the model in Docker:
docker run --rm -it \
    -p 8080:8080 -p 8082:8082 \
    -v $(pwd)/model_store:/home/model-server/model-store \
    pytorch/torchserve:latest-cpu \
    torchserve --start --model-store model-store --models custom_model=custom_model.mar --ncs
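Before sending requests, you can check that the server is up; TorchServe exposes a ping endpoint on the inference port. A quick probe from Python (assuming the requests package is installed):

# readiness probe against TorchServe's standard /ping endpoint
import requests

print(requests.get("http://localhost:8080/ping").json())  # {"status": "Healthy"} once ready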
Call the model to verify that it works. Each response should take a few seconds, which is slow; try it a couple of times:
curl -v -H "Content-Type: application/json" http://localhost:8080/predictions/custom_model --data '{"instances": ["This is an example sentence"]}'
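The same call from Python, if you prefer scripting the check:

import requests

resp = requests.post(
    "http://localhost:8080/predictions/custom_model",
    json={"instances": ["This is an example sentence"]},
)
print(resp.status_code, resp.text)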
Now build the custom container image:
docker build -t torchserve-custom:latest .
Run the custom container to serve the model:
docker run --rm -it \
    -p 8090:8080 -p 8092:8082 \
    -v $(pwd)/model_store:/model-store \
    torchserve-custom:latest \
    torchserve --start --model-store model-store --models custom_model=custom_model.mar --ncs --foreground
Now send a request. This should respond much more quickly than the previous deployment; try it a couple of times:
curl -v -H "Content-Type: application/json" http://localhost:8090/predictions/custom_model --data '{"instances": ["This is an example sentence"]}'
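To make the latency difference concrete, here is a small timing loop against both deployments (ports as in the docker run commands above):

import time

import requests

payload = {"instances": ["This is an example sentence"]}
for port in (8080, 8090):  # 8080 = stock image, 8090 = custom container
    start = time.perf_counter()
    requests.post(f"http://localhost:{port}/predictions/custom_model", json=payload)
    print(f"port {port}: {time.perf_counter() - start:.3f}s")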