Description
When using perf_analyzer in a high-throughput case (30,000+ QPS), it usually gets killed because of OOM issues. Another behavior that arises is that the measurement consolidation seems to take an exponentially increasing amount of time as the experiment progresses.
During the load testing, the server-side queues stay empty, resource usage stays well below what is allocated, and nothing shows any sign of contention that could explain instability on the client side.
Using tcmalloc has been tested without any change in behavior.
CPU usage is around 5 CPU cores; memory consistently increases throughout the measurement session.
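For reference, the memory growth is also visible from outside the process. A minimal monitoring sketch (assuming psutil is installed and that the process is named perf_analyzer; both details are assumptions, not taken from this report):

```python
# Hypothetical helper, not part of perf_analyzer: samples the resident set size
# of a running perf_analyzer process once per second to confirm the steady growth.
import time
import psutil

def watch_rss(name="perf_analyzer", interval_s=1.0):
    candidates = [p for p in psutil.process_iter(["name"]) if p.info["name"] == name]
    if not candidates:
        raise RuntimeError(f"no running process named {name!r}")
    proc = candidates[0]
    while proc.is_running():
        rss_mib = proc.memory_info().rss / (1024 * 1024)
        print(f"{time.strftime('%H:%M:%S')}  rss={rss_mib:.1f} MiB")
        time.sleep(interval_s)

if __name__ == "__main__":
    watch_rss()
```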
Example of such a run (time measurements were added for each pass in the log to show the exponential growth):
Another run starting directly at 40,000 QPS (where the process failed in the previous experiment) shows the following behavior (log abbreviated):
Runtime information
Running in the official py3-sdk container, version 24.06, on a Kubernetes container with 16 CPUs and 32 GB RAM.
The container running perf_analyzer is on a separate node from the Triton server.
To Reproduce
Run a very simple model (we use a simple scalar cast as an ONNX model) and load it with the following command:
perf_analyzer -v -m <model_name> --percentile=75 --request-rate-range=35000:60000:2500 -a -u <server_address> -i grpc --input-data random --measurement-interval 1000 --max-threads=32 --string-length=16 --stability-percentage=70
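The report does not include the model itself; a minimal sketch of a comparable single-Cast ONNX model (tensor names, shapes, and the string-to-float cast are assumptions, suggested only by the --string-length=16 flag above) could be generated like this:

```python
# Hypothetical generator for a "scalar cast" ONNX model; names and dtypes are
# illustrative, not taken from the report.
import onnx
from onnx import TensorProto, helper

inp = helper.make_tensor_value_info("INPUT0", TensorProto.STRING, [1])
out = helper.make_tensor_value_info("OUTPUT0", TensorProto.FLOAT, [1])
cast = helper.make_node("Cast", inputs=["INPUT0"], outputs=["OUTPUT0"], to=TensorProto.FLOAT)
graph = helper.make_graph([cast], "scalar_cast", [inp], [out])
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])
onnx.checker.check_model(model)
onnx.save(model, "model.onnx")  # place under <model_repository>/<model_name>/1/model.onnx for Triton
```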
Expected behavior
perf_analyzer does not slow down exponentially when taking measurements and does not eventually crash.