Models Analysis

The Triton Model Navigator uses the Triton Model Analyzer to analyze profiled models against the provided constraints and objectives. The analysis step selects the top N model configurations across all prepared model versions and applied optimizations.

A detailed report on the top N model configurations is stored in {workspace_path}/analyze_report.pdf.

The analyze Command

The Triton Model Navigator analyze command runs the Triton Model Analyzer to evaluate the results stored by the profile stage.

Using CLI arguments:

$ model-navigator analyze --workspace-path navigator_workspace \
  --model-repository model-store \
  --max-latency-ms 100 \
  --min-throughput 750

Using a YAML file:

workspace_path: navigator_workspace
model_repository: model-store
max_latency_ms: 100
min_throughput: 750

Running the command with a YAML configuration:

$ model-navigator analyze --config-path model_navigator.yaml
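
Once either invocation completes, the detailed report noted above lands in the workspace directory; with the default workspace_path it can be checked directly:

$ ls navigator_workspace/analyze_report.pdf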

Constraints

Constraints are the limits that the analyzed models must satisfy. The default configuration sets no constraints, so the Triton Model Navigator returns all models sorted by inference throughput.

If a model must meet a maximum latency budget or a minimum throughput, pass the corresponding flags and values to the Triton Model Navigator.

The Triton Model Navigator then returns the top N models matching the given constraints, sorted by throughput.
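
As a sketch, the three documented constraints can be combined in one configuration; max_latency_ms and min_throughput appear in the examples above, max_gpu_usage_mb in the options reference below, and the values here are illustrative only:

# Illustrative values; each key is documented in the options reference below.
max_latency_ms: 100      # upper latency bound in milliseconds
min_throughput: 750      # lower throughput bound in inferences per second
max_gpu_usage_mb: 8192   # upper GPU memory bound in MB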

Objectives

By default, the top N models are sorted by throughput; however, users can define their own objectives that determine how the top N models are selected and ordered in the final results:

objectives:
    - perf_latency
    - perf_throughput

The values can be weighted:

objectives:
    perf_latency: 10
    perf_throughput: 5
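
The weighted form can presumably also be passed on the command line, assuming the same key=value syntax as the list[str] default shown in the options reference below (perf_throughput=10); treat the exact flag spelling as an assumption rather than confirmed CLI syntax:

$ model-navigator analyze --config-path model_navigator.yaml \
  --objectives perf_latency=10 perf_throughput=5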

Learn more about objectives in the Model Analyzer documentation.

CLI and YAML Config Options

# Path to the configuration file containing default parameter values to use. For more information about configuration
# files, refer to: https://github.com/triton-inference-server/model_navigator/blob/main/docs/run.md
[ config_path: path ]

# Path to the output workspace directory.
[ workspace_path: path | default: navigator_workspace ]

# Clean workspace directory before command execution.
[ override_workspace: boolean ]

# NVIDIA framework and Triton container version to use (refer to https://docs.nvidia.com/deeplearning/frameworks/support-
# matrix/index.html and https://docs.nvidia.com/deeplearning/triton-inference-server/release-notes/index.html for
# details).
[ container_version: str | default: 21.07 ]

# Custom framework docker image to use. If not provided,
# nvcr.io/nvidia/<framework>:<container_version>-<framework_and_python_version> will be used.
[ framework_docker_image: str ]

# Custom Triton Inference Server docker image to use. If not provided,
# nvcr.io/nvidia/tritonserver:<container_version>-py3 will be used.
[ triton_docker_image: str ]

# List of GPU UUIDs to be used for the conversion and/or profiling. Use 'all' to profile all GPUs visible to CUDA.
[ gpus: str | default: ['all'] ]

# Provide verbose logs.
[ verbose: boolean ]

# Path to the Triton Model Repository.
[ model_repository: path | default: model-store ]

# Number of top final configurations selected from the analysis.
[ top_n_configs: integer | default: 3 ]

# The Model Navigator uses the objectives described here to find the best configuration for the model.
[ objectives: list[str] | default: ['perf_throughput=10'] ]

# Maximum latency in ms that the analyzed models must not exceed.
[ max_latency_ms: integer ]

# Minimum throughput that the analyzed models must sustain.
[ min_throughput: integer | default: 1 ]

# Maximum GPU memory usage in MB that the analyzed models must not exceed.
[ max_gpu_usage_mb: integer ]
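
Putting these options together, a complete configuration for the analyze command might look like the following sketch; every key comes from the reference above, and the values are illustrative only:

# Sketch of a full analyze configuration; values are examples, not recommendations.
workspace_path: navigator_workspace
model_repository: model-store
top_n_configs: 3
objectives:
    perf_throughput: 10
    perf_latency: 5
max_latency_ms: 100
min_throughput: 750
max_gpu_usage_mb: 8192
verbose: true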