LFX Proposal: Multimodal Large Model Joint Learning Algorithm: Reproduction Based on KubeEdge-Ianvs #123 #163

Closed

aryan0931 wants to merge 1 commit into kubeedge:main from aryan0931:main

Contributor

aryan0931 commented Nov 12, 2024

What type of PR is this?
/kind design

What this PR does / why we need it:

Proposal for LFX Project CNCF - Multimodal Large Model Joint Learning Algorithm: Reproduction Based on KubeEdge-Ianvs

Which issue(s) this PR fixes:

Fixes #123


          proposal

c479a6a

kubeedge-bot added the kind/design label

Collaborator

kubeedge-bot commented Nov 12, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign moorezheng after the PR has been reviewed.
You can assign the PR to them by writing /assign @moorezheng in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

kubeedge-bot requested review from jaypume and MooreZheng

November 12, 2024 03:20

Collaborator

kubeedge-bot commented Nov 12, 2024

Welcome @aryan0931! It looks like this is your first PR to kubeedge/ianvs 🎉

kubeedge-bot added the size/L label

MooreZheng requested changes

docs/proposals/algorithms/multimodal-large-model-joint-learning/proposal.md

+              **Implementation Detail**
+              ```plaintext
+              ├── testcasecontroller

Collaborator

MooreZheng Nov 14, 2024

Note that revisions to the core of ianvs, including controllers, are usually about adding new algorithm schemes, e.g., creating a scheme for lifelong learning.

In this proposal, since single-task learning already exists for large models, my suggestion is to prioritize adding examples, i.e., a new single-task learning example, instead of changing the core of ianvs. Leaving the ianvs core untouched also reduces the implementation and review burden, because other examples are not affected.

docs/proposals/algorithms/multimodal-large-model-joint-learning/proposal.md

+              │   │       ├── base.py                      # Base class for algorithms
+              │   │       └── single_task_learning.py      # Single-task learning algorithms
+              │   │           └── clip_model.py            # Implementation of the CLIP model
+              │   ├── data_collection

Collaborator

MooreZheng Nov 14, 2024

Multimodal data types are now mostly supported. We could make better use of the current data types, especially given the limited development time. Please refer to the detailed comments on dataset handling below.

docs/proposals/algorithms/multimodal-large-model-joint-learning/proposal.md

+              │   │   ├── __init__.py
+              │   │   ├── multimodal_interface.py          # Interface for multimodal data collection
+              │   │   └── preprocess.py                     # Preprocessing for text, audio, and images
+              │   ├── benchmark

Collaborator

MooreZheng Nov 14, 2024

Benchmark components such as metrics should live in examples rather than in controllers. Metrics with the same name can have different implementations in different scenarios, e.g., F1-score, BWT, etc.

docs/proposals/algorithms/multimodal-large-model-joint-learning/proposal.md

+                 **Adding New Enums in `DatasetFormat`:**
+                 ```python
+                 class DatasetFormat(Enum):

Collaborator

MooreZheng Nov 14, 2024

Structured datasets are constructed using .csv. For unstructured data:

Image and audio datasets are constructed using a data index, i.e., a .txt file of URLs.
Natural-language datasets are constructed using .jsonl.

At the current stage, it is not a good idea to add more data types that would require code changes in Sedna before ianvs. My suggestion is to make better use of the current implementation.

For your reference,

Unstructured Data implementation using .txt:

An image example is ready in Sedna federated learning
Video-input examples are ready in Sedna incremental learning and joint inference.

Unstructured Data implementation using .jsonl:

An NLP example from @IcyFeather233 is available in ianvs Single task learning for LLM with proposal and implementation.

When necessary, @aryan0931 might refer to @IcyFeather233 for more usage information on data types of ianvs LLM benchmarks. The implementation from @IcyFeather233 has already been successfully used in several members' projects merged in ianvs recently.
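To make the two unstructured formats mentioned above concrete, here is a minimal sketch of how a .txt data index and a .jsonl file could be read. It is not the Sedna or ianvs loader. The function names, the "URL followed by a label" line layout, and the `question`/`answer` keys are assumptions chosen for illustration.

```python
import io
import json


def read_txt_index(stream):
    """Parse a .txt data index with one sample per line.

    Assumed layout (hypothetical): '<data URL> <label>' separated by whitespace.
    """
    samples = []
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.rsplit(maxsplit=1)  # split off the last field as the label
        if len(parts) != 2:
            raise ValueError(f"expected '<url> <label>', got: {line!r}")
        samples.append((parts[0], parts[1]))
    return samples


def read_jsonl(stream):
    """Parse a .jsonl file: one JSON object per non-empty line."""
    return [json.loads(line) for line in stream if line.strip()]


# Hypothetical sample data standing in for real dataset files.
TXT_INDEX = "images/0001.jpg cat\nimages/0002.jpg dog\n"
JSONL_DATA = (
    '{"question": "2+2?", "answer": "4"}\n'
    '{"question": "Capital of France?", "answer": "Paris"}\n'
)

image_samples = read_txt_index(io.StringIO(TXT_INDEX))
text_records = read_jsonl(io.StringIO(JSONL_DATA))
```

A multimodal example could plausibly combine both: a .txt index pointing at image or audio files and a .jsonl file carrying the paired text, without adding a new `DatasetFormat` entry.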

docs/proposals/algorithms/multimodal-large-model-joint-learning/proposal.md

+                    paradigms: [ "all" ]  # Selects all paradigms
+                    modules: [ "all" ]     # Selects all modules
+                    hyperparameters: [ "all" ]  # Selects all hyperparameters
+                    metrics:

Collaborator

MooreZheng Nov 14, 2024

As mentioned above, metrics should be implemented in examples to avoid impacts on other examples.

Metrics are also configured in ianvs examples through testenv.yaml. An example from the ianvs documentation follows.

# testenv.yaml
testenv:
...

# metric used for model evaluation
model_metric:
  # metric name; string type;
  name: "f1_score"
  # the url address of python file
  url: "./examples/pcb-aoi/incremental_learning_bench/testenv/f1_score.py"
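For illustration, the metric file that such a `url` points to could look roughly like the sketch below. This is a plain binary F1 implementation, not the actual `f1_score.py` from the pcb-aoi example. The `positive` parameter is an assumption. My understanding is that existing examples also register the function under the configured name through Sedna's class factory; that step is left out here so the sketch runs without dependencies.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the `positive` class.

    `positive` is a hypothetical parameter; a scenario-specific metric may
    define the positive class, averaging, or matching rule differently.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels and predictions for a quick check.
score = f1_score([1, 0, 1, 1], [1, 1, 0, 1])
```

Keeping this file inside the example directory means another example can ship its own `f1_score` with a different definition, and the two never collide.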

kubeedge-bot assigned MooreZheng

MooreZheng requested changes

Collaborator

MooreZheng left a comment

We see a DCO issue, which means the author of this commit failed to include a Signed-off-by line in the commit message.

A rebase is needed to fix this issue; see this link for more information.
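As a rough local check before pushing, the sketch below tests whether a commit message carries a sign-off trailer. It is not the DCO bot's actual logic. The regular expression and the function name are assumptions. The trailer itself is what `git commit -s` (or `git commit --amend --signoff` for the latest commit) appends.

```python
import re

# Assumed trailer shape: "Signed-off-by: Name <email>" on its own line.
SIGNOFF_RE = re.compile(
    r"^Signed-off-by: .+ <[^<>\s]+@[^<>\s]+>$",
    re.MULTILINE,
)


def has_signoff(commit_message):
    """Return True if the message contains a Signed-off-by trailer line."""
    return SIGNOFF_RE.search(commit_message) is not None


# Hypothetical commit messages: the first mirrors an unsigned commit.
unsigned = "proposal\n"
signed = "proposal\n\nSigned-off-by: Jane Doe <jane@example.com>\n"

unsigned_ok = has_signoff(unsigned)
signed_ok = has_signoff(signed)
```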

MooreZheng requested review from MooreZheng and hsj576 and removed request for jaypume

November 14, 2024 12:25

Contributor Author

aryan0931 commented Nov 16, 2024

Sure, sir. I am working on it.

aryan0931 closed this by deleting the head repository

MooreZheng mentioned this pull request

Proposal for Multimodal Large Model Joint Learning Algorithm: Reproduction Based on KubeEdge-Ianvs #167

Merged

Labels

kind/design size/L