Summary

This PR adds configurable latency optimization to achieve sub-second end-to-end latency on the full obj_detection_age_prediction pipeline.

Key Changes

  1. Queue Optimization (src/pipelines/obj_detection_age_prediction.sh)

    • Added LOW_LATENCY mode: max-size-buffers=3 max-size-time=100000000 (0.1s)
    • Added MEDIUM_LATENCY mode: max-size-buffers=10 max-size-time=500000000 (0.5s)
    • Default queues can buffer up to 200 frames / 1 second per queue, adding significant latency (see the queue-selection sketch after this list)
  2. Configurable Inference Interval

    • Added INFERENCE_INTERVAL environment variable (default=3)
    • Set to 1 to process every frame for minimum latency
  3. Docker-Compose Environment Passthrough (Critical fix)

    • Shell environment variables were silently ignored because docker-compose reads values from .env; each variable must be passed through explicitly for a shell override to reach the container
    • Added explicit passthrough for: LOW_LATENCY, MEDIUM_LATENCY, INFERENCE_INTERVAL, BATCH_SIZE_DETECT, BATCH_SIZE_CLASSIFY (see the compose sketch after this list)
  4. NPU+GPU Hybrid Configuration (src/res/npu-gpu-flip.env)

    • YOLO11n detection on NPU
    • Classification on GPU with VA surface sharing
    • Separate inference options for each device type (a hedged .env sketch appears with the commit notes below)
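
To make the selection concrete, here is a minimal sketch of how the queue modes and the inference interval can plug into the pipeline. QUEUE_PARAMS and the pipeline fragment are illustrative assumptions; only the LOW_LATENCY, MEDIUM_LATENCY, and INFERENCE_INTERVAL variables and their values come from this PR.

# Sketch only: QUEUE_PARAMS is a hypothetical name; the real script
# may structure this differently
if [ "${LOW_LATENCY:-0}" = "1" ]; then
  QUEUE_PARAMS="max-size-buffers=3 max-size-time=100000000"    # 0.1 s
elif [ "${MEDIUM_LATENCY:-0}" = "1" ]; then
  QUEUE_PARAMS="max-size-buffers=10 max-size-time=500000000"   # 0.5 s
else
  QUEUE_PARAMS=""  # GStreamer defaults: up to 200 buffers / 1 s per queue
fi

# The interval is then handed to the detection element, e.g.:
#   ... ! queue ${QUEUE_PARAMS} ! gvadetect inference-interval=${INFERENCE_INTERVAL:-3} ...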
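
The compose passthrough follows the standard pattern sketched below; the service name and the fallback defaults are assumptions, since the PR only states that these variables are now forwarded.

# docker-compose.yml (sketch; service name is hypothetical)
services:
  pipeline-runner:
    environment:
      - LOW_LATENCY=${LOW_LATENCY:-0}
      - MEDIUM_LATENCY=${MEDIUM_LATENCY:-0}
      - INFERENCE_INTERVAL=${INFERENCE_INTERVAL:-3}
      - BATCH_SIZE_DETECT=${BATCH_SIZE_DETECT:-1}
      - BATCH_SIZE_CLASSIFY=${BATCH_SIZE_CLASSIFY:-1}

With entries like these, compose substitutes a value exported in the shell ahead of the .env default, which is the override behavior this fix restores.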

Benchmark Results (Lunar Lake)

Metric                        Default Config    Latency-Optimized
Single-stream latency         ~1,509 ms         ~957 ms
Max streams @ 14.95 FPS       22                25
Avg latency at max streams    1,264 ms          840 ms

Usage

# Run with latency optimization
LOW_LATENCY=1 INFERENCE_INTERVAL=1 \
  PIPELINE_SCRIPT=obj_detection_age_prediction.sh \
  DEVICE_ENV=res/npu-gpu-flip.env \
  make run-demo
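
A balanced variant using the same knobs (illustrative; it simply recombines the variables documented above):

# Moderate queueing, default inference interval of 3
MEDIUM_LATENCY=1 \
  PIPELINE_SCRIPT=obj_detection_age_prediction.sh \
  DEVICE_ENV=res/npu-gpu-flip.env \
  make run-demo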

Files Changed

  • src/pipelines/obj_detection_age_prediction.sh - Queue optimization logic
  • src/docker-compose.yml - Environment variable passthrough
  • src/docker-compose-reg.yml - Environment variable passthrough (registry version)
  • src/res/npu-gpu-flip.env - New NPU+GPU hybrid device configuration

Test Plan

  • Tested on Lunar Lake with NPU+GPU configuration
  • Verified LOW_LATENCY=1 achieves ~957 ms single-stream latency
  • Verified a stream density of 25 streams at the 14.95 FPS target
  • Pending: test on other platforms (Meteor Lake, Raptor Lake)

Commit Messages

…tency

- Add LOW_LATENCY and MEDIUM_LATENCY queue optimization modes
- Add configurable INFERENCE_INTERVAL (default=3; use 1 to run inference on every frame)
- Add separate inference options for face detection vs. object detection
- Add support for INT8 model paths for NPU compatibility

Allow shell environment variables to override .env file defaults for:

- LOW_LATENCY, MEDIUM_LATENCY
- INFERENCE_INTERVAL
- BATCH_SIZE_DETECT, BATCH_SIZE_CLASSIFY

Also add a sample-media volume mount for benchmarking.

…try version)

Allow shell environment variables to override .env file defaults for:

- LOW_LATENCY, MEDIUM_LATENCY
- INFERENCE_INTERVAL
- BATCH_SIZE_DETECT, BATCH_SIZE_CLASSIFY

Also add the sample-media volume mount and update the image name.

New device configuration for Lunar Lake that runs:

- YOLO11n object detection on NPU
- EfficientNet classification on GPU with VA surface sharing
- Face detection and age classification on GPU

This configuration achieves sub-second latency while maximizing stream density.
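
As a rough illustration of the file's shape, a hedged sketch follows; every key name here is an assumption, since the PR states only the device placement and the use of VA surface sharing.

# src/res/npu-gpu-flip.env (sketch; key names are hypothetical)
DETECTION_DEVICE=NPU       # YOLO11n object detection
CLASSIFICATION_DEVICE=GPU  # EfficientNet and age classification
# Assumed DLStreamer option keeping decoded frames in GPU memory:
CLASSIFICATION_OPTIONS="pre-process-backend=va-surface-sharing"
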
Latency benchmarks were run using a locally-built image (pipeline-runner-asc)
based on DLStreamer 2025.0.1 with Intel NPU drivers for Lunar Lake.
Author Comment (@jcork-intel)

@sachinkaushik @avinash-palleti: I created this PR so you can see, side by side, the changes I made in my fork while running my experiments.
