MultimodalQnA Image and Audio Support Phase 1 #852
Thanks @mhbuehler and @dmsuehir for this PR!
Only a few minor comments.
ashahba left a comment:
LGTM!
I notice that you use the LVM to generate captions for each second of the video. Will that take too long? For example, I have a 1 min video and I have
The other changes look good to me now.
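The reviewer's concern can be made concrete with a rough estimate. The sketch below assumes one LVM caption request per sampled frame, processed sequentially, and uses a hypothetical per-request latency; neither the sampling interval nor the latency figure comes from this PR.

```python
import math


def estimate_caption_cost(duration_s: float, interval_s: float = 1.0,
                          latency_per_call_s: float = 2.0) -> tuple[int, float]:
    """Return (number of LVM caption calls, total sequential seconds).

    Assumes one frame is sampled every `interval_s` seconds and that each
    frame gets its own caption request taking `latency_per_call_s` seconds
    (a hypothetical figure, not a measurement).
    """
    if duration_s <= 0 or interval_s <= 0:
        raise ValueError("duration_s and interval_s must be positive")
    # A partial final interval still yields one more sampled frame.
    calls = math.ceil(duration_s / interval_s)
    return calls, calls * latency_per_call_s


calls, total = estimate_caption_cost(60)
print(calls, total)  # 60 120.0
```

At one caption per second, a 1 min video needs 60 caption calls; sampling every 5 s instead cuts that to 12, at the cost of coarser coverage.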
* Adds an endpoint for image ingestion
  Signed-off-by: Melanie Buehler <[email protected]>
* Combined image and video endpoint
  Signed-off-by: Melanie Buehler <[email protected]>
* Add test and update README
  Signed-off-by: Melanie Buehler <[email protected]>
* fixed variable name for embedding model (opea-project#1)
  Signed-off-by: okhleif-IL <[email protected]>
* Fixed test script
  Signed-off-by: Melanie Buehler <[email protected]>
* Remove redundant function
  Signed-off-by: Melanie Buehler <[email protected]>
* get_videos, delete_videos --> get_files, delete_files (opea-project#3)
  Signed-off-by: okhleif-IL <[email protected]>
* Updates test per review feedback
  Signed-off-by: Melanie Buehler <[email protected]>
* Fixed test
  Signed-off-by: Melanie Buehler <[email protected]>
* Add support for audio files multimodal data ingestion (opea-project#4)
  * Add support for audio files multimodal data ingestion
    Signed-off-by: dmsuehir <[email protected]>
  * Update function name
    Signed-off-by: dmsuehir <[email protected]>
  Signed-off-by: dmsuehir <[email protected]>
* Change videos_with_transcripts to ingest_with_text
  Signed-off-by: Melanie Buehler <[email protected]>
* Add image support to video ingestion with transcript functionality
  Signed-off-by: Melanie Buehler <[email protected]>
* Update test and README
  Signed-off-by: Melanie Buehler <[email protected]>
* Updated for review suggestions
  Signed-off-by: Melanie Buehler <[email protected]>
* Add two tests for ingest_with_text
  Signed-off-by: Melanie Buehler <[email protected]>
* LVM TGI Gaudi update for prompts without images (opea-project#7)
  * LVM Gaudi TGI update for prompts without images
    Signed-off-by: dmsuehir <[email protected]>
  * Wording
    Signed-off-by: dmsuehir <[email protected]>
  * Add a test
    Signed-off-by: dmsuehir <[email protected]>
  Signed-off-by: dmsuehir <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks. For more information, see https://pre-commit.ci
* Change dummy image to be b64 encoded instead of the url (opea-project#9)
  Signed-off-by: dmsuehir <[email protected]>
* Updates based on review feedback (opea-project#10)
  Signed-off-by: dmsuehir <[email protected]>
* Test fix (opea-project#11)
  Signed-off-by: dmsuehir <[email protected]>

Signed-off-by: Melanie Buehler <[email protected]>
Signed-off-by: okhleif-IL <[email protected]>
Signed-off-by: dmsuehir <[email protected]>
Co-authored-by: dmsuehir <[email protected]>
Co-authored-by: Omar Khleif <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abolfazl Shahbazi <[email protected]>
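The commits "LVM TGI Gaudi update for prompts without images" and "Change dummy image to be b64 encoded instead of the url" suggest that text-only prompts are sent with a base64-encoded placeholder image. The sketch below illustrates that idea only. The payload keys `prompt` and `image`, the helper names, and the placeholder bytes are all hypothetical and are not the service's actual request schema; the placeholder is just the 8-byte PNG signature, not a decodable image.

```python
import base64
from typing import Optional

# Hypothetical placeholder: the 8-byte PNG signature, NOT a valid image.
# A real client would embed the bytes of an actual tiny image instead.
DUMMY_IMAGE_BYTES = b"\x89PNG\r\n\x1a\n"


def encode_image(data: bytes) -> str:
    """Return the base64 text form of raw image bytes."""
    return base64.b64encode(data).decode("ascii")


def build_lvm_request(prompt: str, image_bytes: Optional[bytes] = None) -> dict:
    """Build an illustrative request body for an LVM call.

    When no image is supplied, the base64-encoded dummy image is sent in its
    place, so the payload has the same shape for text-only prompts.
    The keys "prompt" and "image" are hypothetical.
    """
    data = image_bytes if image_bytes is not None else DUMMY_IMAGE_BYTES
    return {"prompt": prompt, "image": encode_image(data)}


print(build_lvm_request("What is shown here?"))
# {'prompt': 'What is shown here?', 'image': 'iVBORw0KGgo='}
```

Embedding the placeholder as base64 rather than referencing a URL keeps the request self-contained, so a text-only prompt does not depend on fetching a remote file.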
Description
This PR adds the following new features, as specified in "Phase 1" of the MultimodalQnA Image & Audio Support RFC. The affected components include dataprep-multimodal-redis, embedding-multimodal-bridgetower, and lvm-llava. The companion PR in GenAIExamples is opea-project/GenAIExamples#1071; this GenAIComps PR must be merged before that one.
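The dataprep changes in the commit list add image and audio ingestion alongside video, and rename `videos_with_transcripts` to `ingest_with_text`, which now also handles images. A minimal client-side sketch of sorting uploads and pairing each video or image with its text follows. It assumes that a media file is paired with a text file of the same base name (a transcript for a video, a caption for an image); the extension sets and helper names are hypothetical and may not match the formats the service accepts.

```python
from pathlib import Path

# Hypothetical extension sets; the service's accepted formats may differ.
VIDEO_EXTS = {".mp4"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif"}
AUDIO_EXTS = {".wav", ".mp3"}
TEXT_EXTS = {".vtt", ".txt"}


def classify(filename: str) -> str:
    """Bucket a file name by extension: video, image, audio, text, or unknown."""
    ext = Path(filename).suffix.lower()
    for kind, exts in (("video", VIDEO_EXTS), ("image", IMAGE_EXTS),
                       ("audio", AUDIO_EXTS), ("text", TEXT_EXTS)):
        if ext in exts:
            return kind
    return "unknown"


def pair_media_with_text(filenames: list[str]) -> list[tuple[str, str]]:
    """Pair each video/image with the text file sharing its base name.

    Audio files are not paired in this sketch. Raises ValueError if a
    video or image has no matching text file.
    """
    texts = {Path(f).stem: f for f in filenames if classify(f) == "text"}
    pairs = []
    for f in filenames:
        if classify(f) in ("video", "image"):
            stem = Path(f).stem
            if stem not in texts:
                raise ValueError(f"no transcript or caption for {f}")
            pairs.append((f, texts[stem]))
    return pairs


print(pair_media_with_text(["talk.mp4", "talk.vtt", "cat.png", "cat.txt"]))
# [('talk.mp4', 'talk.vtt'), ('cat.png', 'cat.txt')]
```

Failing fast on an unpaired media file surfaces a missing transcript or caption on the client before any upload is attempted.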
Data prep and ingestion enhancements:
Other enhancements:
Note that the planned query enhancement "Accept speech audio only" has been moved to Phase 2 and a PR for that phase will be submitted for the next release.
Issues
MultimodalQnA Image & Audio Support RFC
Type of change
Dependencies
No new dependencies
Tests
Updated the individual microservices' test scripts and the GenAIExamples MultimodalQnA test scripts, and manually tested the UI and the documented curl commands.