forked from microsoft/onnxruntime-extensions
Feature extraction C API for Whisper model (microsoft#755)
* Feature extraction C API for Whisper model
* Update the docs
* Update the docs2
* refine the code
* fix some issues
* fix the Linux build
* fix more data consistency issue
* More code refinements
Showing 28 changed files with 1,505 additions and 515 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
# ONNXRuntime Extensions C ABI | ||
|
||
ONNXRuntime Extensions provides a C-style ABI for pre-processing. It offers support for tokenization, image processing, speech feature extraction, and more. You can compile the ONNXRuntime Extensions as either a static library or a dynamic library to access these APIs. | ||
|
||
The C ABI header files are named `ortx_*.h` and can be found in the include folder. There are three types of data processing APIs available: | ||
|
||
- [`ortx_tokenizer.h`](../include/ortx_tokenizer.h): Provides tokenization for LLM models. | ||
- [`ortx_processor.h`](../include/ortx_processor.h): Offers image processing APIs for multimodal models. | ||
- [`ortx_extractor.h`](../include/ortx_extractor.h): Provides speech feature extraction for audio data processing to assist the Whisper model. | ||
|
||
## ABI QuickStart | ||
|
||
Most APIs accept raw data inputs, such as compressed audio or image binaries, or UTF-8 encoded text for tokenization. | ||
|
||
**Tokenization:** You can create a tokenizer object using `OrtxCreateTokenizer` and then use the object to tokenize text or decode token IDs back into text. A C-style code snippet is available [here](../test/pp_api_test/c_only_test.c). | ||
|
||
**Image processing:** `OrtxCreateProcessor` can create an image processor object from a pre-defined workflow in JSON format to process image files into a tensor-like data type. An example code snippet can be found [here](../test/pp_api_test/test_processor.cc#L75). | ||
|
||
**Audio feature extraction:** `OrtxCreateSpeechFeatureExtractor` creates a speech feature extractor that produces log mel spectrogram data as input for the Whisper model. An example code snippet can be found [here](../test/pp_api_test/test_feature_extractor.cc#L16). |
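The audio feature extraction flow described in the QuickStart can be sketched in C as below. This is a minimal sketch, not the library's own example: the three `Ortx*` functions use the signatures declared in the header added by this commit, but their bodies here are stand-in stubs so the snippet compiles without the library. The helper `extract_log_mel`, the convention that `0` means success, and the layout of `OrtxObject` are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* ---- Stand-ins (assumptions) so the sketch compiles on its own. ----
 * In a real build, remove this section, include "ortx_extractor.h",
 * and link against the onnxruntime-extensions library. */
typedef int extError_t;                       /* assumed: 0 means success */
typedef struct OrtxObject { int tag; } OrtxObject;  /* hypothetical layout */
typedef OrtxObject OrtxFeatureExtractor;
typedef OrtxObject OrtxRawAudios;
typedef OrtxObject OrtxTensorResult;

static OrtxObject g_extractor, g_audios, g_log_mel;

static extError_t OrtxCreateSpeechFeatureExtractor(OrtxFeatureExtractor** extractor,
                                                   const char* fe_def) {
  if (extractor == NULL || fe_def == NULL) return 1;
  *extractor = &g_extractor;
  return 0;
}

static extError_t OrtxLoadAudios(OrtxRawAudios** audios,
                                 const char* const* audio_paths, size_t num_audios) {
  if (audios == NULL || audio_paths == NULL || num_audios == 0) return 1;
  *audios = &g_audios;
  return 0;
}

static extError_t OrtxSpeechLogMel(OrtxFeatureExtractor* extractor,
                                   OrtxRawAudios* audio, OrtxTensorResult** log_mel) {
  if (extractor == NULL || audio == NULL || log_mel == NULL) return 1;
  *log_mel = &g_log_mel;
  return 0;
}

/* ---- The call sequence itself (hypothetical helper). ---- */
static extError_t extract_log_mel(const char* fe_def, const char* const* paths,
                                  size_t num_paths, OrtxTensorResult** out) {
  assert(out != NULL);
  OrtxFeatureExtractor* extractor = NULL;
  OrtxRawAudios* audios = NULL;

  /* 1. Build the extractor from its feature definition. */
  extError_t err = OrtxCreateSpeechFeatureExtractor(&extractor, fe_def);
  if (err != 0) return err;

  /* 2. Load the audio files into one raw-audio collection. */
  err = OrtxLoadAudios(&audios, paths, num_paths);
  if (err != 0) return err;

  /* 3. Compute the log mel spectrogram for the loaded audio.
   * Real code must also release extractor, audios, and the result with
   * the disposal function from ortx_utils.h (not shown in this diff). */
  return OrtxSpeechLogMel(extractor, audios, out);
}
```

A caller would pass a feature definition and an array of file paths, for example `extract_log_mel(fe_def, paths, 1, &result)`, and check the returned `extError_t` before using `result`. Whether `fe_def` is a file path or inline text is not stated in this diff, so the sketch treats it as an opaque string.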
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,75 @@ | ||
// Copyright (c) Microsoft Corporation. All rights reserved. | ||
// Licensed under the MIT License. | ||
|
||
// C ABI header file for the onnxruntime-extensions feature extraction module | ||
|
||
#pragma once | ||
|
||
#include "ortx_utils.h" | ||
|
||
typedef OrtxObject OrtxFeatureExtractor; | ||
typedef OrtxObject OrtxRawAudios; | ||
typedef OrtxObject OrtxTensorResult; | ||
|
||
#ifdef __cplusplus | ||
extern "C" { | ||
#endif | ||
|
||
/** | ||
* @brief Creates a feature extractor object. | ||
* | ||
* This function creates a feature extractor object based on the provided feature definition. | ||
* | ||
* @param[out] extractor Pointer to a pointer to the created feature extractor object. | ||
* @param[in] fe_def The feature definition used to create the feature extractor. | ||
* | ||
* @return An error code indicating the result of the operation. | ||
*/ | ||
extError_t ORTX_API_CALL OrtxCreateSpeechFeatureExtractor(OrtxFeatureExtractor** extractor, const char* fe_def); | ||
|
||
/** | ||
* @brief Loads a collection of audio files into memory. | ||
* | ||
* This function loads a collection of audio files specified by the `audio_paths` array | ||
* into memory and returns a pointer to the loaded audio data in the `audios` parameter. | ||
* | ||
* @param audios A pointer to a pointer that will be updated with the loaded audio data. | ||
* The caller is responsible for freeing the memory allocated for the audio data. | ||
* @param audio_paths An array of strings representing the paths to the audio files to be loaded. | ||
* @param num_audios The number of audio files to be loaded. | ||
* | ||
* @return An `extError_t` value indicating the success or failure of the operation. | ||
*/ | ||
extError_t ORTX_API_CALL OrtxLoadAudios(OrtxRawAudios** audios, const char* const* audio_paths, size_t num_audios); | ||
|
||
/** | ||
* @brief Creates an array of raw audio objects. | ||
* | ||
* This function creates an array of raw audio objects based on the provided data and sizes. | ||
* | ||
* @param audios Pointer to the variable that will hold the created raw audio objects. | ||
* @param data Array of pointers to the audio data. | ||
* @param sizes Array of pointers to the sizes of the audio data. | ||
* @param num_audios Number of audio objects to create. | ||
* | ||
* @return extError_t Error code indicating the success or failure of the operation. | ||
*/ | ||
extError_t ORTX_API_CALL OrtxCreateRawAudios(OrtxRawAudios** audios, const void* data[], const int64_t* sizes[], size_t num_audios); | ||
|
||
/** | ||
* @brief Calculates the log mel spectrogram for a given audio using the specified feature extractor. | ||
* | ||
* This function takes an instance of the OrtxFeatureExtractor struct, an instance of the OrtxRawAudios struct, | ||
* and a pointer to an OrtxTensorResult pointer. It calculates the log mel spectrogram for the given audio using | ||
* the specified feature extractor and stores the result in the provided log_mel pointer. | ||
* | ||
* @param extractor The feature extractor to use for calculating the log mel spectrogram. | ||
* @param audio The raw audio data to process. | ||
* @param log_mel A pointer to an OrtxTensorResult pointer where the result will be stored. | ||
* @return An extError_t value indicating the success or failure of the operation. | ||
*/ | ||
extError_t ORTX_API_CALL OrtxSpeechLogMel(OrtxFeatureExtractor* extractor, OrtxRawAudios* audio, OrtxTensorResult** log_mel); | ||
|
||
#ifdef __cplusplus | ||
} | ||
#endif |
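The `data` and `sizes` parameters of `OrtxCreateRawAudios` are both arrays of pointers, which is easy to get wrong when the audio already sits in memory. The sketch below shows one way to assemble those arguments for two buffers. It is an illustration under stated assumptions: the `OrtxCreateRawAudios` body is a stub that only sums the sizes, `wrap_two_buffers` is a hypothetical helper, the sizes are assumed to be byte counts, and `0` is assumed to mean success. The diff does not say whether the real function copies the buffers, so a caller should keep them alive until that is confirmed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ---- Stand-ins (assumptions) so the sketch compiles on its own. ---- */
typedef int extError_t;                                   /* assumed: 0 means success */
typedef struct OrtxObject { int64_t total_bytes; } OrtxObject;  /* stub payload only */
typedef OrtxObject OrtxRawAudios;

/* Stub with the signature from the header; it only adds up the sizes. */
static extError_t OrtxCreateRawAudios(OrtxRawAudios** audios, const void* data[],
                                      const int64_t* sizes[], size_t num_audios) {
  static OrtxObject storage;
  assert(audios != NULL && data != NULL && sizes != NULL);
  storage.total_bytes = 0;
  for (size_t i = 0; i < num_audios; ++i) {
    if (data[i] == NULL || sizes[i] == NULL) return 1;
    storage.total_bytes += *sizes[i];
  }
  *audios = &storage;
  return 0;
}

/* Hypothetical helper: wrap two in-memory audio buffers in one call. */
static extError_t wrap_two_buffers(const unsigned char* a, int64_t a_len,
                                   const unsigned char* b, int64_t b_len,
                                   OrtxRawAudios** out) {
  /* One pointer per buffer, and one pointer per size value. */
  const void* data[2] = {a, b};
  const int64_t* sizes[2] = {&a_len, &b_len};
  return OrtxCreateRawAudios(out, data, sizes, 2);
}
```

Note that `sizes[i]` points at an `int64_t` rather than holding the value directly, matching the `const int64_t* sizes[]` declaration in the header.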