This example shows how to implement the `auto_complete_config` function in the Python backend to provide the `max_batch_size`, `input`, and `output` properties. These properties allow Triton to load the Python model with Minimal Model Configuration in the absence of a configuration file.
The model repository should contain the nobatch_auto_complete and batch_auto_complete models. The `max_batch_size` of the nobatch_auto_complete model is set to zero, whereas the `max_batch_size` of the batch_auto_complete model is set to 4. For models with a non-zero `max_batch_size`, the configuration can specify a different `max_batch_size` value as long as it does not exceed the value set in the model file.
The nobatch_auto_complete and batch_auto_complete models calculate the sum and difference of `INPUT0` and `INPUT1` and put the results in `OUTPUT0` and `OUTPUT1` respectively.
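For orientation, the core of the `execute` logic in these models looks roughly like the sketch below; the shipped model files add error handling and extensive comments, so treat this as a simplified outline rather than the actual implementation:

```python
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        # For each request, read INPUT0 and INPUT1, then return their
        # element-wise sum as OUTPUT0 and difference as OUTPUT1.
        responses = []
        for request in requests:
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0").as_numpy()
            in1 = pb_utils.get_input_tensor_by_name(request, "INPUT1").as_numpy()
            out0 = pb_utils.Tensor("OUTPUT0", in0 + in1)
            out1 = pb_utils.Tensor("OUTPUT1", in0 - in1)
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out0, out1])
            )
        return responses
```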
- Create the model repository:

  ```
  mkdir -p models/nobatch_auto_complete/1/
  mkdir -p models/batch_auto_complete/1/

  # Copy the Python models
  cp examples/auto_complete/nobatch_model.py models/nobatch_auto_complete/1/model.py
  cp examples/auto_complete/batch_model.py models/batch_auto_complete/1/model.py
  ```

  Note that we don't need a model configuration file since Triton will use the auto-complete model configuration provided in the Python model.
- Start the tritonserver:

  ```
  tritonserver --model-repository `pwd`/models
  ```
- Send inference requests using `client.py`:

  ```
  python3 examples/auto_complete/client.py
  ```

  You should see output similar to the one below:

  ```
  'nobatch_auto_complete' configuration matches the expected auto complete configuration
  'batch_auto_complete' configuration matches the expected auto complete configuration
  PASS: auto_complete
  ```
The nobatch_model.py and batch_model.py model files are heavily commented with explanations of how to use the `set_max_batch_size`, `add_input`, and `add_output` functions to set the `max_batch_size`, `input`, and `output` properties of the model.
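As a rough sketch of how those functions fit together, a non-batching `auto_complete_config` might look like the following. The input/output names match the example, but the data types and dimensions shown here are assumptions for illustration; check the shipped model files for the exact values.

```python
class TritonPythonModel:
    @staticmethod
    def auto_complete_config(auto_complete_model_config):
        # max_batch_size = 0 disables batching; batch_model.py would use 4 here.
        auto_complete_model_config.set_max_batch_size(0)

        # Data types and dims below are illustrative assumptions.
        auto_complete_model_config.add_input(
            {"name": "INPUT0", "data_type": "TYPE_FP32", "dims": [4]})
        auto_complete_model_config.add_input(
            {"name": "INPUT1", "data_type": "TYPE_FP32", "dims": [4]})
        auto_complete_model_config.add_output(
            {"name": "OUTPUT0", "data_type": "TYPE_FP32", "dims": [4]})
        auto_complete_model_config.add_output(
            {"name": "OUTPUT1", "data_type": "TYPE_FP32", "dims": [4]})

        return auto_complete_model_config
```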
For each model, `client.py` first requests the model configuration from Triton to validate that the model configuration has been registered as expected. The client then sends an inference request to verify that the inference ran properly and that the result is correct.
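A condensed sketch of that client flow using the `tritonclient` HTTP API is shown below; the tensor shape and data type are assumptions for illustration, and the real `client.py` additionally compares the returned configuration against the expected auto-completed one:

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Step 1: fetch the configuration Triton registered for the auto-completed model.
config = client.get_model_config("nobatch_auto_complete")
print(config)

# Step 2: run one inference and check the sum/difference results.
in0 = np.random.rand(4).astype(np.float32)  # shape assumed to be [4]
in1 = np.random.rand(4).astype(np.float32)
inputs = [
    httpclient.InferInput("INPUT0", list(in0.shape), "FP32"),
    httpclient.InferInput("INPUT1", list(in1.shape), "FP32"),
]
inputs[0].set_data_from_numpy(in0)
inputs[1].set_data_from_numpy(in1)

result = client.infer("nobatch_auto_complete", inputs)
assert np.allclose(result.as_numpy("OUTPUT0"), in0 + in1)
assert np.allclose(result.as_numpy("OUTPUT1"), in0 - in1)
print("PASS")
```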