MultimodalQnA Image and Audio Support Phase 1 #852

mhbuehler · 2024-11-05T00:44:09Z

Description

This PR adds the following new features as specified in "Phase 1" of the MultimodalQnA Image & Audio Support RFC. The affected components are dataprep-multimodal-redis, embedding-multimodal-bridgetower, and lvm-llava. The related PR in GenAIExamples is opea-project/GenAIExamples#1071; this GenAIComps PR must be merged before that one.

Data prep and ingestion enhancements:

Accept image only
Accept image and text
Accept speech audio only
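To make the three ingestion cases concrete, here is a minimal sketch of how uploaded files could be sorted into them. It is not the dataprep service's code: the function name `classify_uploads`, the extension sets, and the rule that a caption is a `.txt` file sharing the image's base name are all assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical extension sets; the real service may accept a different list.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif"}
AUDIO_EXTS = {".wav", ".mp3"}
TEXT_EXTS = {".txt"}


def classify_uploads(filenames):
    """Group uploaded file names into the three Phase 1 ingestion cases."""
    paths = [Path(name) for name in filenames]
    # Assumption: a caption file is paired with an image by base name.
    captions = {p.stem for p in paths if p.suffix.lower() in TEXT_EXTS}

    result = {"image_only": [], "image_and_text": [], "audio_only": []}
    for p in paths:
        ext = p.suffix.lower()
        if ext in IMAGE_EXTS:
            key = "image_and_text" if p.stem in captions else "image_only"
            result[key].append(p.name)
        elif ext in AUDIO_EXTS:
            result["audio_only"].append(p.name)
    return result


if __name__ == "__main__":
    print(classify_uploads(["cat.png", "cat.txt", "dog.jpg", "talk.wav"]))
```

In this sketch an image without a matching text file falls into the "image only" case, which is where a generated caption would be needed.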

Other enhancements:

Allow the user to choose the embedding model and LVM when starting the services
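As a rough illustration of choosing models at startup, the sketch below reads two environment variables and falls back to defaults. The variable names `EMBEDDING_MODEL_ID` and `LVM_MODEL_ID`, the helper `resolve_models`, and the default model IDs are assumptions for this example, not values confirmed by this PR.

```python
import os

# Assumed defaults, for illustration only.
DEFAULT_EMBEDDING_MODEL = "BridgeTower/bridgetower-large-itm-mlm-itc"
DEFAULT_LVM_MODEL = "llava-hf/llava-1.5-7b-hf"


def resolve_models(env=None):
    """Return the embedding model and LVM to load, honoring user overrides."""
    env = os.environ if env is None else env
    return {
        # Hypothetical variable names; check the service README for the real ones.
        "embedding_model": env.get("EMBEDDING_MODEL_ID", DEFAULT_EMBEDDING_MODEL),
        "lvm_model": env.get("LVM_MODEL_ID", DEFAULT_LVM_MODEL),
    }


if __name__ == "__main__":
    print(resolve_models())
```

Passing a plain dictionary as `env` keeps the helper easy to test without touching the real process environment.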

Note that the planned query enhancement "Accept speech audio only" has been moved to Phase 2 and a PR for that phase will be submitted for the next release.

Issues

MultimodalQnA Image & Audio Support RFC

Type of change

List the type of change like below. Please delete options that are not relevant.

New feature (non-breaking change which adds new functionality)
Breaking change (fix or feature that would break existing design and interface)
Others (enhancement, documentation, validation, etc.)

Dependencies

No new dependencies

Tests

Updated the individual microservices' test scripts and the GenAIExamples MultimodalQnA test scripts, and manually tested the UI and the documented curl commands.

Signed-off-by: Melanie Buehler <[email protected]>

ashahba

Thanks @mhbuehler and @dmsuehir for this PR!
Only a few minor comments.

comps/dataprep/multimodal/redis/langchain/README.md

comps/dataprep/multimodal/redis/langchain/config.py

comps/lvms/tgi-llava/lvm_tgi.py

tests/dataprep/test_dataprep_multimodal_redis_langchain.sh

ashahba

LGTM!

Spycsh · 2024-11-08T02:11:14Z

I notice that you use the LVM to generate a caption for each second of the video. Will it take too long? For example, if I have a 1-minute video and key_frame_per_second set to 1, I would get 60 inferences through the LVM with the prompt "Provide a short description for this scene", right? This is actually just an image captioning task, which smaller image-captioning models can solve extremely fast. Maybe implement that as an improvement if the dataprep process is slow?

Everything else looks good to me now.
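The arithmetic behind this concern can be sketched in a few lines. The helper names and the per-call latency below are hypothetical; only the 60-second video and key_frame_per_second = 1 come from the comment above.

```python
import math


def estimate_caption_calls(video_seconds, key_frame_per_second=1.0):
    """Number of LVM caption requests if every sampled key frame is captioned."""
    if video_seconds < 0 or key_frame_per_second <= 0:
        raise ValueError("duration must be >= 0 and frame rate must be > 0")
    return math.ceil(video_seconds * key_frame_per_second)


def estimate_caption_time(video_seconds, key_frame_per_second, seconds_per_call):
    """Total captioning time for sequential calls at an assumed per-call latency."""
    return estimate_caption_calls(video_seconds, key_frame_per_second) * seconds_per_call


if __name__ == "__main__":
    calls = estimate_caption_calls(60, 1)  # the 1-minute example from the comment
    # 2.0 s per call is a made-up latency, used only to show the scaling.
    print(calls, estimate_caption_time(60, 1, 2.0))
```

The number of calls grows linearly with both video length and sampling rate, so per-call latency dominates ingestion time; that is the case for a smaller captioning model.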

* Adds an endpoint for image ingestion
  Signed-off-by: Melanie Buehler <[email protected]>
* Combined image and video endpoint
  Signed-off-by: Melanie Buehler <[email protected]>
* Add test and update README
  Signed-off-by: Melanie Buehler <[email protected]>
* fixed variable name for embedding model (opea-project#1)
  Signed-off-by: okhleif-IL <[email protected]>
* Fixed test script
  Signed-off-by: Melanie Buehler <[email protected]>
* Remove redundant function
  Signed-off-by: Melanie Buehler <[email protected]>
* get_videos, delete_videos --> get_files, delete_files (opea-project#3)
  Signed-off-by: okhleif-IL <[email protected]>
* Updates test per review feedback
  Signed-off-by: Melanie Buehler <[email protected]>
* Fixed test
  Signed-off-by: Melanie Buehler <[email protected]>
* Add support for audio files multimodal data ingestion (opea-project#4)
  * Add support for audio files multimodal data ingestion
    Signed-off-by: dmsuehir <[email protected]>
  * Update function name
    Signed-off-by: dmsuehir <[email protected]>
  ---------
  Signed-off-by: dmsuehir <[email protected]>
* Change videos_with_transcripts to ingest_with_text
  Signed-off-by: Melanie Buehler <[email protected]>
* Add image support to video ingestion with transcript functionality
  Signed-off-by: Melanie Buehler <[email protected]>
* Update test and README
  Signed-off-by: Melanie Buehler <[email protected]>
* Updated for review suggestions
  Signed-off-by: Melanie Buehler <[email protected]>
* Add two tests for ingest_with_text
  Signed-off-by: Melanie Buehler <[email protected]>
* LVM TGI Gaudi update for prompts without images (opea-project#7)
  * LVM Gaudi TGI update for prompts without images
    Signed-off-by: dmsuehir <[email protected]>
  * Wording
    Signed-off-by: dmsuehir <[email protected]>
  * Add a test
    Signed-off-by: dmsuehir <[email protected]>
  ---------
  Signed-off-by: dmsuehir <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Change dummy image to be b64 encoded instead of the url (opea-project#9)
  Signed-off-by: dmsuehir <[email protected]>
* Updates based on review feedback (opea-project#10)
  Signed-off-by: dmsuehir <[email protected]>
* Test fix (opea-project#11)
  Signed-off-by: dmsuehir <[email protected]>

---------

Signed-off-by: Melanie Buehler <[email protected]>
Signed-off-by: okhleif-IL <[email protected]>
Signed-off-by: dmsuehir <[email protected]>
Co-authored-by: dmsuehir <[email protected]>
Co-authored-by: Omar Khleif <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abolfazl Shahbazi <[email protected]>

mhbuehler and others added 26 commits October 14, 2024 16:33

Adds an endpoint for image ingestion

a330569

Signed-off-by: Melanie Buehler <[email protected]>

Combined image and video endpoint

ac82cec

Signed-off-by: Melanie Buehler <[email protected]>

Add test and update README

83225e6

Signed-off-by: Melanie Buehler <[email protected]>

Merge branch 'main' of github.com:mhbuehler/GenAIComps into melanie/mm-rag-enhanced

eb46ff9

fixed variable name for embedding model (#1)

6324418

Signed-off-by: okhleif-IL <[email protected]>

Fixed test script

3320b48

Signed-off-by: Melanie Buehler <[email protected]>

Remove redundant function

007aba8

Signed-off-by: Melanie Buehler <[email protected]>

get_videos, delete_videos --> get_files, delete_files (#3)

7abee3e

Signed-off-by: okhleif-IL <[email protected]>

Updates test per review feedback

313b344

Signed-off-by: Melanie Buehler <[email protected]>

Merge branch 'melanie/mm-rag-enhanced' into melanie/combined_image_video_ingestion

0df7c08

Fixed test

f620193

Signed-off-by: Melanie Buehler <[email protected]>

Merge branch 'melanie/combined_image_video_ingestion' of github.com:mhbuehler/GenAIComps into melanie/combined_image_video_ingestion

89203cf

Merge pull request #2 from mhbuehler/melanie/combined_image_video_ingestion

a51f0aa

Image ingestion improvements

Merge branch 'main' of github.com:mhbuehler/GenAIComps into melanie/mm-rag-enhanced

d9cb3cf

Merge branch 'melanie/mm-rag-enhanced' of github.com:mhbuehler/GenAIComps into melanie/mm-rag-enhanced

3684a5f

Add support for audio files multimodal data ingestion (#4)

476591b

* Add support for audio files multimodal data ingestion
  Signed-off-by: dmsuehir <[email protected]>
* Update function name
  Signed-off-by: dmsuehir <[email protected]>

---------

Signed-off-by: dmsuehir <[email protected]>

Change videos_with_transcripts to ingest_with_text

689e9d8

Signed-off-by: Melanie Buehler <[email protected]>

Add image support to video ingestion with transcript functionality

01dc2e7

Signed-off-by: Melanie Buehler <[email protected]>

Update test and README

7769107

Signed-off-by: Melanie Buehler <[email protected]>

Merge branch 'main' of github.com:mhbuehler/GenAIComps into melanie/mm-rag-enhanced

f1028ca

Updated for review suggestions

c49f5d7

Signed-off-by: Melanie Buehler <[email protected]>

Merge pull request #5 from mhbuehler/melanie/images_and_text

2f9688c

Image+Text Ingestion

Add two tests for ingest_with_text

4b253ee

Signed-off-by: Melanie Buehler <[email protected]>

Merge pull request #6 from mhbuehler/melanie/negative_test

47c77d5

Add two tests for ingest_with_text

Merge branch 'main' of github.com:mhbuehler/GenAIComps into melanie/mm-rag-enhanced

56c4d09

mhbuehler requested a review from lvliang-intel as a code owner November 5, 2024 00:44

[pre-commit.ci] auto fixes from pre-commit.com hooks

ccc4be2

for more information, see https://pre-commit.ci

mhbuehler mentioned this pull request Nov 5, 2024

MultimodalQnA Image and Audio Support Phase 1 opea-project/GenAIExamples#1071

Merged

3 tasks

ashahba self-requested a review November 5, 2024 17:39

Merge branch 'main' into melanie/mm-rag-enhanced

808700d

ashahba added the r1.1 label Nov 5, 2024

ashahba added this to the v1.1 milestone Nov 5, 2024

ashahba requested changes Nov 5, 2024

ashahba and others added 2 commits November 5, 2024 22:42

Merge branch 'main' into melanie/mm-rag-enhanced

6eea8cc

Change dummy image to be b64 encoded instead of the url (#9)

97f26bf

Signed-off-by: dmsuehir <[email protected]>

mhbuehler mentioned this pull request Nov 6, 2024

Clone specific branch of GenAIComps for tests mhbuehler/GenAIExamples#16

Merged

1 task

dmsuehir added 2 commits November 6, 2024 11:42

Merge branch 'main' of github.com:mhbuehler/GenAIComps into melanie/mm-rag-enhanced

8e4f145

Updates based on review feedback (#10)

a9b4d0f

Signed-off-by: dmsuehir <[email protected]>

ashahba approved these changes Nov 6, 2024

dmsuehir and others added 3 commits November 6, 2024 13:56

Test fix (#11)

c2d1ebe

Signed-off-by: dmsuehir <[email protected]>

Merge branch 'main' into melanie/mm-rag-enhanced

c817274

Merge branch 'main' into melanie/mm-rag-enhanced

e7b43b2

lvliang-intel requested a review from Spycsh November 8, 2024 01:53

lvliang-intel approved these changes Nov 8, 2024

Spycsh approved these changes Nov 8, 2024

Spycsh merged commit 29ef642 into opea-project:main Nov 8, 2024

This was referenced Nov 8, 2024

Revert repo change opea-project/GenAIExamples#1094

Closed

Revert change of repo for tests mhbuehler/GenAIExamples#21

Merged

joshuayao linked an issue Nov 8, 2024 that may be closed by this pull request

Image and Audio Support for MultimodalityQnA #790

Closed
