Commit 88d4948

Perf Analyzer C API documentation (triton-inference-server#2935)
Documentation for triton-inference-server#2861 and triton-inference-server/client#8
1 parent ce2e6a5 commit 88d4948

File tree

1 file changed: +29 −1 lines changed


docs/perf_analyzer.md

@@ -1,5 +1,5 @@
<!--
-# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2020-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -400,3 +400,31 @@ can achieve by using shared memory in your application. Use
By default perf_analyzer uses HTTP to communicate with Triton. The GRPC
protocol can be specified with the -i option. If GRPC is selected the
--streaming option can also be specified for GRPC streaming.
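
For example, with `my_model` standing in for your model name, the following run selects gRPC and enables streaming:

```
perf_analyzer -m my_model -i grpc --streaming
```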

## Benchmarking Triton directly via C API

Besides using HTTP or gRPC server endpoints to communicate with Triton, perf_analyzer also allows users to benchmark Triton directly through its C API. The HTTP/gRPC endpoints introduce additional latency in the pipeline, which may not be of interest to a user who runs Triton via the C API within their own application. Specifically, this feature is useful for benchmarking a bare-minimum Triton setup without the additional overhead of HTTP/gRPC communication.

### Prerequisite
Pull the Triton SDK and the Inference Server container images on the target machine.
Since you will need access to the Tritonserver install, it might be easier if
you copy the perf_analyzer binary to the Inference Server container, for example as sketched below.
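
A minimal sketch of that setup, assuming the public `nvcr.io/nvidia/tritonserver` images and the usual SDK binary location `/workspace/install/bin/perf_analyzer`; the `<xx.yy>` release tag and the container names `sdk` and `server` are placeholders:

```
# Pull the SDK (client) image and the Inference Server image
docker pull nvcr.io/nvidia/tritonserver:<xx.yy>-py3-sdk
docker pull nvcr.io/nvidia/tritonserver:<xx.yy>-py3

# Copy perf_analyzer out of the SDK container to the host,
# then into the Inference Server container (binary path is an assumption)
docker cp sdk:/workspace/install/bin/perf_analyzer .
docker cp perf_analyzer server:/opt/tritonserver/bin/
```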

### Required Parameters
Use the --help option to see the complete list of supported command line arguments.
By default perf_analyzer expects the Triton instance to already be running. You can
configure the C API mode using the `--service-kind` option. In addition, you will need
to point perf_analyzer to the Triton server library path using the
`--triton-server-directory` option and to the model repository path using the
`--model-repository` option.
If the server starts successfully, perf_analyzer prints "server is alive!" and then
reports the stats as normal.
An example run would look like:
```
perf_analyzer -m graphdef_int32_int32_int32 --service-kind=triton_c_api --triton-server-directory=/opt/tritonserver --model-repository=/workspace/qa/L0_perf_analyzer_capi/models
```
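
Protocol-independent measurement options should compose with the C API mode in the same way. As a hedged illustration, `--concurrency-range` and `--percentile` are standard perf_analyzer flags, but this particular combination under `triton_c_api` is an assumption, not something verified here:

```
# Sweep synchronous request concurrency from 1 to 4 and report p95 latency
# while benchmarking through the in-process C API (combination assumed to work)
perf_analyzer -m graphdef_int32_int32_int32 --service-kind=triton_c_api \
    --triton-server-directory=/opt/tritonserver \
    --model-repository=/workspace/qa/L0_perf_analyzer_capi/models \
    --concurrency-range 1:4 --percentile=95
```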

### Non-supported functionalities
A few functionalities are missing from the C API mode:
1. Async mode (`-a`)
2. Using shared memory mode (`--shared-memory=cuda` or `--shared-memory=system`)
3. Request rate range mode
4. For additional known non-working cases, please refer to
[qa/L0_perf_analyzer_capi/test.sh](https://github.com/triton-inference-server/server/blob/main/qa/L0_perf_analyzer_capi/test.sh#L239-L277)
