13 changes: 12 additions & 1 deletion libraries/video-chunking-utils/docs/user-guide/index.md
@@ -1,5 +1,16 @@
# A Python Module for Video Chunking Utils

<!--hide_directive
<div class="component_card_widget">
<a class="icon_github" href="https://github.com/open-edge-platform/edge-ai-libraries/tree/main/libraries/video-chunking-utils">
GitHub project
</a>
<a class="icon_document" href="https://github.com/open-edge-platform/edge-ai-libraries/blob/main/libraries/video-chunking-utils/README.md">
Readme
</a>
</div>
hide_directive-->

## Introduction

This is a Python module designed for video chunking. It allows users to split video files into smaller, manageable segments. The module is designed to be easily installable via pip and can be used in various applications such as video processing, analysis, and content delivery.
@@ -39,7 +50,7 @@ for i, micro_chunk in enumerate(micro_chunks_list):
print(f"Total {len(micro_chunks_list)} chunks are generated.")
```

### Method: Pelt Chunking

```python
from video_chunking import PeltChunking
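# NOTE: the remainder of this example is collapsed in the diff. The lines
# below are an illustrative sketch only -- the constructor arguments and
# method names are assumptions, not the documented API. The sketch simply
# mirrors the micro-chunking example above.
chunker = PeltChunking()
pelt_chunks_list = chunker.process("sample_video.mp4")

for i, pelt_chunk in enumerate(pelt_chunks_list):
    print(f"Pelt chunk {i}: {pelt_chunk}")

print(f"Total {len(pelt_chunks_list)} chunks are generated.")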
```
137 changes: 137 additions & 0 deletions microservices/audio-analyzer/docs/user-guide/index.md
@@ -0,0 +1,137 @@
# Audio Analyzer

<!--hide_directive
<div class="component_card_widget">
<a class="icon_github" href="https://github.com/open-edge-platform/edge-ai-libraries/tree/main/microservices/audio-analyzer">
GitHub project
</a>
<a class="icon_document" href="https://github.com/open-edge-platform/edge-ai-libraries/blob/main/microservices/audio-analyzer/README.md">
Readme
</a>
</div>
hide_directive-->

The Audio Analyzer microservice generates transcriptions of the audio in video files.

## Overview

The Audio Analyzer microservice provides an automated solution for extracting and transcribing
audio from video files. Designed for seamless integration into modern AI pipelines, this
microservice converts spoken content within videos into accurate, searchable text.
By leveraging state-of-the-art speech-to-text models, the service supports a wide range of
audio formats and languages, making it suitable for diverse applications such as video
summarization, media analysis, compliance monitoring, and content indexing.

The microservice operates by first isolating the audio track from the input video file.
Once extracted, the audio is processed using advanced transcription models to generate a
time-aligned text transcript. This transcript can be used for downstream tasks such as keyword
search, sentiment analysis, or integration with other AI-driven analytics.

Key features include robust handling of noisy or low-quality audio, support for batch and
real-time processing, and easy deployment as a RESTful API. The service is optimized for edge
and cloud environments, ensuring low latency and scalability. Developers can interact with the
microservice through simple API endpoints, enabling rapid integration into existing workflows.

By automating the extraction and transcription of audio from video, the Audio Analyzer
microservice streamlines content analysis, improves accessibility, and unlocks new possibilities
for leveraging audio data in various video analytics use cases.

### Key Benefits

- **Benefit 1**: Enables multimodal analysis of video data by extracting information from its
audio track.
- **Benefit 2**: Seamless integration, through RESTful APIs, with video analytics use
  cases that benefit from audio processing.
- **Benefit 3**: Flexibility to use different ASR models to match use-case requirements.

### Features

- **Feature 1**: Extract audio from video files.
- **Feature 2**: Transcribe speech using Whispercpp (CPU).
- **Feature 3**: RESTful API with FastAPI.
- **Feature 4**: Containerization with Docker.
- **Feature 5**: Automatic model download and conversion on startup.
- **Feature 6**: Persistent model storage.
- **Feature 7**: OpenVINO acceleration support for Intel hardware.
- **Feature 8**: **MinIO integration** for video source and transcript storage.

### Use Cases

The Audio Analyzer microservice can be applied to real-world video analytics scenarios
across many industry segments. The primary motivation for the microservice is to improve
the accuracy of the video summarization pipeline. Here are some examples:

- **Use case 1**: Egocentric videos, as captured in industry segments like safety and
  security (body-worn cameras, for example), benefit from the additional modality of
  information that audio transcription provides.
- **Use case 2**: Classroom videos are analyzed primarily through their audio content.
  The Audio Analyzer microservice provides transcriptions that can be used, for example,
  to chapterize a classroom session.
- **Use case 3**: Courtroom and legal proceedings, such as hearings or depositions, are
  analyzed primarily through the spoken word.
- **Use case 4**: Video podcasts or interview recordings, where the value is in the
  conversation and visuals are secondary.
- **Use case 5**: Events such as panel discussions and debates, where multiple speakers
  discuss or debate topics and the audio contains the key arguments and insights.

## How It Works

The Audio Analyzer microservice accepts a video file for transcription from either the
file system or MinIO storage. The transcription is created using the configured Whisper
model, and the output, along with the configured metadata, is then stored in the
configured destination. A RESTful API is provided to configure and use these capabilities.
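
As a minimal sketch of a client interaction (the endpoint path, port, and payload fields
here are assumptions for illustration; see the [API Reference](./api-reference) for the
actual contract):

```python
import requests

# Hypothetical endpoint and payload shape -- consult the API Reference
# for the routes and parameters the service actually exposes.
BASE_URL = "http://localhost:8000"

response = requests.post(
    f"{BASE_URL}/api/v1/transcriptions",
    json={
        "video_path": "/data/videos/meeting.mp4",  # file-system source
        "model": "small.en",  # a Whisper variant (see the table below)
    },
    timeout=60,
)
response.raise_for_status()
print(response.json())  # for example, a job id or the time-aligned transcript
```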

## Models supported

The service automatically downloads and manages the required models based on configuration.
Two types of models are supported:

1. **GGML models**: Primarily used for CPU inference with the whispercpp backend.
2. **OpenVINO models**: Optimized for inference on Intel GPUs.

Models are downloaded on application startup, converted to OpenVINO format if needed, and
stored in persistent volumes for reuse. The conversion process includes:

- Downloading the original Hugging Face Whisper model
- Converting the PyTorch model to OpenVINO format
- Storing the encoder and decoder components separately for efficient inference
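
A minimal sketch of this download-and-convert flow, using the `optimum-intel` export path
for Hugging Face Whisper models (whether the service uses this exact library internally is
an assumption; the resulting IR layout, with separate encoder and decoder files, matches
the description above):

```python
from optimum.intel import OVModelForSpeechSeq2Seq
from transformers import AutoProcessor

model_id = "openai/whisper-base"

# Download the PyTorch weights from Hugging Face and export them to
# OpenVINO IR in one step.
ov_model = OVModelForSpeechSeq2Seq.from_pretrained(model_id, export=True)
processor = AutoProcessor.from_pretrained(model_id)

# Saving writes separate encoder/decoder IR files (.xml/.bin), which lets
# each component be loaded independently for efficient inference.
ov_model.save_pretrained("./models/whisper-base-ov")
processor.save_pretrained("./models/whisper-base-ov")
```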

### Available Whisper Models

The following Whisper model variants are supported by the service (for both GGML and OpenVINO
formats):

| Model ID  | Description      | Size    | Languages    |
| --------- | ---------------- | ------- | ------------ |
| tiny      | Tiny model       | ~75 MB  | Multilingual |
| tiny.en   | Tiny model       | ~75 MB  | English-only |
| base      | Base model       | ~150 MB | Multilingual |
| base.en   | Base model       | ~150 MB | English-only |
| small     | Small model      | ~450 MB | Multilingual |
| small.en  | Small model      | ~450 MB | English-only |
| medium    | Medium model     | ~1.5 GB | Multilingual |
| medium.en | Medium model     | ~1.5 GB | English-only |
| large-v1  | Large model (v1) | ~2.9 GB | Multilingual |
| large-v2  | Large model (v2) | ~2.9 GB | Multilingual |
| large-v3  | Large model (v3) | ~2.9 GB | Multilingual |

## Supporting Resources

- [Get Started Guide](./get-started)
- [API Reference](./api-reference)
- [System Requirements](./system-requirements)

<!--hide_directive
:::{toctree}
:hidden:

overview-architecture
system-requirements
get-started
how-to-build-from-source
api-reference
release-notes

:::
hide_directive-->
98 changes: 0 additions & 98 deletions microservices/audio-analyzer/docs/user-guide/index.rst

This file was deleted.

105 changes: 105 additions & 0 deletions microservices/dlstreamer-pipeline-server/docs/user-guide/index.md
@@ -0,0 +1,105 @@
# Deep Learning Streamer Pipeline Server

<!--hide_directive
<div class="component_card_widget">
<a class="icon_github" href="https://github.com/open-edge-platform/edge-ai-libraries/tree/main/microservices/dlstreamer-pipeline-server">
GitHub project
</a>
<a class="icon_document" href="https://github.com/open-edge-platform/edge-ai-libraries/blob/main/microservices/dlstreamer-pipeline-server/README.md">
Readme
</a>
</div>
hide_directive-->

Deep Learning Streamer Pipeline Server (DL Streamer Pipeline Server) is a Python-based,
interoperable containerized microservice for easy development and deployment of video analytics
pipelines.

## Overview

DL Streamer Pipeline Server microservice is built on top of [GStreamer](https://gstreamer.freedesktop.org/documentation/)
and [Deep Learning Streamer (DL Streamer)](https://github.com/open-edge-platform/dlstreamer/tree/master),
providing video ingestion and deep learning inferencing functionalities.

Video analytics involves the conversion of video streams into valuable insights through the
application of video processing, inference, and analytics operations. It finds applications
in various business sectors including healthcare, retail, entertainment, and industrial domains.
The algorithms utilized in video analytics are responsible for performing tasks such as object
detection, classification, identification, counting, and tracking on the input video stream.

## How it Works

![DL Streamer Pipeline Server architecture](./images/dls-pipelineserver-simplified-arch.png)

Here is a high-level description of the functionality of the DL Streamer Pipeline Server modules:

- **RESTful Interface**

Exposes RESTful endpoints to discover, start, stop, and customize pipelines in JSON format (a request sketch follows this list).

- **DL Streamer Pipeline Server Core**

Manages and processes REST requests, interfacing with the core DL Streamer Pipeline Server
components and the Pipeline Server library.

- **DL Streamer Pipeline Server Configuration handler**

Reads the contents of a config file and constructs and starts pipelines accordingly. Dynamic
configuration changes are supported via the REST API.

- **GST UDF Loader**

DL Streamer Pipeline Server provides a [GStreamer plugin](https://gstreamer.freedesktop.org/documentation/plugins_doc.html?gi-language=c), `udfloader`, which can be used to configure and load arbitrary UDFs. With
`udfloader`, DL Streamer Pipeline Server offers an easy way to bring user-developed programs
into GStreamer pipelines and run them as part of the pipeline. A User Defined Function (UDF)
is a chunk of user code that can transform video frames and/or manipulate metadata; for
example, a UDF can act as a filter, preprocessor, classifier, or detector. UDFs can be
developed in Python (a minimal sketch follows this list).

- **DL Streamer Pipeline Server Publisher**

Supports publishing metadata to a file or to MQTT/Kafka message brokers, and publishing
frames along with metadata to an MQTT message broker. It also supports publishing metadata
and frames over OPC UA. Frames can also be saved to S3-compliant storage.

- **DL Streamer Pipeline Server Model Update**

Supports integration with the [Model Registry](https://docs.openedgeplatform.intel.com/dev/edge-ai-libraries/model-registry/index.html) microservice for model download, deployment, and management.

- **OpenTelemetry**

Supports gathering metrics over OpenTelemetry for seamless visualization and analysis.
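
As a minimal illustration of the RESTful interface described above (the pipeline name,
port, and payload fields are assumptions for illustration; see the API reference for the
actual contract):

```python
import requests

# Hypothetical pipeline name and payload -- illustrative only.
SERVER = "http://localhost:8080"

resp = requests.post(
    f"{SERVER}/pipelines/user_defined_pipelines/object_detection",
    json={
        "source": {"uri": "file:///home/pipeline-server/videos/sample.mp4", "type": "uri"},
        "destination": {"metadata": {"type": "file", "path": "/tmp/results.jsonl"}},
    },
    timeout=30,
)
resp.raise_for_status()
print("Pipeline instance:", resp.text)  # the server replies with an instance id
```

And a minimal sketch of a Python UDF of the kind `udfloader` can run (the class name,
method signature, and return contract shown here are assumptions for illustration; see the
UDF how-to guide for the actual interface):

```python
import cv2


class Udf:
    """Illustrative UDF: blurs each frame and tags it in the metadata."""

    def __init__(self):
        self.kernel = (15, 15)

    def process(self, frame, metadata):
        # Transform the frame in place and enrich the metadata.
        frame[:] = cv2.GaussianBlur(frame, self.kernel, 0)
        metadata["udf_applied"] = "gaussian_blur"
        # Assumed convention: (drop_frame, output_frame, metadata), where
        # None for output_frame means "use the modified input frame".
        return False, None, metadata
```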

<!--hide_directive
:::{toctree}
:hidden:

overview-architecture
system-requirements
get-started
troubleshooting-guide
how-to-change-dlstreamer-pipeline
how-to-use-gpu-for-decode-and-inference
how-to-use-cpu-for-decode-and-inference
how-to-autostart-pipelines
how-to-launch-configurable-pipelines
how-to-perform-webrtc-frame-streaming
how-to-start-dlsps-mqtt-publish
how-to-store-s3-frame
how-to-store-metadata-influxdb
how-to-publish-metadata-over-ros2
how-to-launch-and-manage-pipeline
how-to-use-rtsp-camera-as-video-source
how-to-run-udf-pipelines
how-to-deploy-with-helm
how-to-use-image-file-as-source-over-request-payload
how-to-download-and-run-yolo-models
how-to-build-from-source
how-to-add-system-timestamps-to-metadata
api-reference
environment-variables
advanced-guide/Overview
release_notes/Overview

:::
hide_directive-->