Summary

This PR adds configurable latency optimization to achieve sub-second end-to-end latency on the full obj_detection_age_prediction pipeline.

Key Changes

  1. Queue Optimization (src/pipelines/obj_detection_age_prediction.sh)

    • Added LOW_LATENCY mode: max-size-buffers=3 max-size-time=100000000 (0.1s)
    • Added MEDIUM_LATENCY mode: max-size-buffers=10 max-size-time=500000000 (0.5s)
    • Default queues can buffer up to 200 frames / 1 second per queue, adding significant latency (see the queue-selection sketch after this list)
  2. Configurable Inference Interval

    • Added INFERENCE_INTERVAL environment variable (default=3)
    • Set to 1 to process every frame for minimum latency
  3. Docker-Compose Environment Passthrough (Critical fix)

    • Shell environment variables were silently ignored because docker-compose reads values from .env; each variable must be passed through explicitly for a shell override to reach the container
    • Added explicit passthrough for: LOW_LATENCY, MEDIUM_LATENCY, INFERENCE_INTERVAL, BATCH_SIZE_DETECT, BATCH_SIZE_CLASSIFY (see the compose sketch after this list)
  4. NPU+GPU Hybrid Configuration (src/res/npu-gpu-flip.env)

    • YOLO11n detection on NPU
    • Classification on GPU with VA surface sharing
    • Separate inference options for each device type (a hedged .env sketch appears with the commit notes below)
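
To make the selection concrete, here is a minimal sketch of how the queue modes and the inference interval can plug into the pipeline. QUEUE_PARAMS and the pipeline fragment are illustrative assumptions; only the LOW_LATENCY, MEDIUM_LATENCY, and INFERENCE_INTERVAL variables and their values come from this PR.

# Sketch only: QUEUE_PARAMS is a hypothetical name; the real script
# may structure this differently
if [ "${LOW_LATENCY:-0}" = "1" ]; then
  QUEUE_PARAMS="max-size-buffers=3 max-size-time=100000000"    # 0.1 s
elif [ "${MEDIUM_LATENCY:-0}" = "1" ]; then
  QUEUE_PARAMS="max-size-buffers=10 max-size-time=500000000"   # 0.5 s
else
  QUEUE_PARAMS=""  # GStreamer defaults: up to 200 buffers / 1 s per queue
fi

# The interval is then handed to the detection element, e.g.:
#   ... ! queue ${QUEUE_PARAMS} ! gvadetect inference-interval=${INFERENCE_INTERVAL:-3} ...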
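
The compose passthrough follows the standard pattern sketched below; the service name and the fallback defaults are assumptions, since the PR only states that these variables are now forwarded.

# docker-compose.yml (sketch; service name is hypothetical)
services:
  pipeline-runner:
    environment:
      - LOW_LATENCY=${LOW_LATENCY:-0}
      - MEDIUM_LATENCY=${MEDIUM_LATENCY:-0}
      - INFERENCE_INTERVAL=${INFERENCE_INTERVAL:-3}
      - BATCH_SIZE_DETECT=${BATCH_SIZE_DETECT:-1}
      - BATCH_SIZE_CLASSIFY=${BATCH_SIZE_CLASSIFY:-1}

With entries like these, compose substitutes a value exported in the shell ahead of the .env default, which is the override behavior this fix restores.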

Benchmark Results (Lunar Lake)

Metric                        Default Config    Latency-Optimized
Single-stream latency         ~1,509 ms         ~957 ms
Max streams @ 14.95 FPS       22                25
Avg latency at max streams    1,264 ms          840 ms

Usage

# Run with latency optimization
LOW_LATENCY=1 INFERENCE_INTERVAL=1 \
  PIPELINE_SCRIPT=obj_detection_age_prediction.sh \
  DEVICE_ENV=res/npu-gpu-flip.env \
  make run-demo
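
A balanced variant using the same knobs (illustrative; it simply recombines the variables documented above):

# Moderate queueing, default inference interval of 3
MEDIUM_LATENCY=1 \
  PIPELINE_SCRIPT=obj_detection_age_prediction.sh \
  DEVICE_ENV=res/npu-gpu-flip.env \
  make run-demo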

Files Changed

  • src/pipelines/obj_detection_age_prediction.sh - Queue optimization logic
  • src/docker-compose.yml - Environment variable passthrough
  • src/docker-compose-reg.yml - Environment variable passthrough (registry version)
  • src/res/npu-gpu-flip.env - New NPU+GPU hybrid device configuration

Test Plan

  • Tested on Lunar Lake with NPU+GPU configuration
  • Verified LOW_LATENCY=1 achieves ~957 ms single-stream latency
  • Verified a stream density of 25 streams at the 14.95 FPS target
  • Pending: test on other platforms (Meteor Lake, Raptor Lake)

Commit Messages

…tency

- Add LOW_LATENCY and MEDIUM_LATENCY queue optimization modes
- Add configurable INFERENCE_INTERVAL (default=3; use 1 to run inference on every frame)
- Add separate inference options for face detection vs. object detection
- Add support for INT8 model paths for NPU compatibility

Allow shell environment variables to override .env file defaults for:

- LOW_LATENCY, MEDIUM_LATENCY
- INFERENCE_INTERVAL
- BATCH_SIZE_DETECT, BATCH_SIZE_CLASSIFY

Also add a sample-media volume mount for benchmarking.

…try version)

Allow shell environment variables to override .env file defaults for:

- LOW_LATENCY, MEDIUM_LATENCY
- INFERENCE_INTERVAL
- BATCH_SIZE_DETECT, BATCH_SIZE_CLASSIFY

Also add the sample-media volume mount and update the image name.

New device configuration for Lunar Lake that runs:

- YOLO11n object detection on NPU
- EfficientNet classification on GPU with VA surface sharing
- Face detection and age classification on GPU

This configuration achieves sub-second latency while maximizing stream density.
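
As a rough illustration of the file's shape, a hedged sketch follows; every key name here is an assumption, since the PR states only the device placement and the use of VA surface sharing.

# src/res/npu-gpu-flip.env (sketch; key names are hypothetical)
DETECTION_DEVICE=NPU       # YOLO11n object detection
CLASSIFICATION_DEVICE=GPU  # EfficientNet and age classification
# Assumed DLStreamer option keeping decoded frames in GPU memory:
CLASSIFICATION_OPTIONS="pre-process-backend=va-surface-sharing"
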
Latency benchmarks were run using a locally-built image (pipeline-runner-asc)
based on DLStreamer 2025.0.1 with Intel NPU drivers for Lunar Lake.
Author Comment (@jcork-intel)

@sachinkaushik @avinash-palleti: I created this PR so you can see, side by side, the changes I made in my fork while running my experiments.
