diff --git a/README.md b/README.md
index 10e3c6e..182c03e 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,3 @@
-

maestro

@@ -6,34 +5,26 @@
-
-
-
-
-
-
-
-

@@ -44,10 +35,11 @@ ## Hello
 
-**maestro** is a tool designed to streamline and accelerate the fine-tuning process for
-multimodal models. It provides ready-to-use recipes for fine-tuning popular
-vision-language models (VLMs) such as **Florence-2**, **PaliGemma 2**, and
-**Qwen2.5-VL** on downstream vision-language tasks.
+**maestro** is a streamlined tool to accelerate the fine-tuning of multimodal models.
+By encapsulating best practices from our core modules, maestro handles configuration,
+data loading, reproducibility, and training loop setup. It currently offers ready-to-use
+recipes for popular vision-language models such as **Florence-2**, **PaliGemma 2**, and
+**Qwen2.5-VL**.
 
 ![maestro](https://github.com/user-attachments/assets/3bb9ccba-b0ee-4964-bcd6-f71124a08bc2)
 
@@ -55,18 +47,21 @@ vision-language models (VLMs) such as **Florence-2**, **PaliGemma 2**, and
 ### Install
 
-To get started with maestro, you’ll need to install the dependencies specific to the model you wish to fine-tune.
+To begin, install the model-specific dependencies. Since some models may have clashing requirements,
+we recommend creating a dedicated Python environment for each model.
 
 ```bash
-pip install maestro[qwen_2_5_vl]
+pip install maestro[paligemma_2]
 ```
 
-**Note:** Some models may have clashing dependencies. We recommend creating a separate python environment for each model to avoid version conflicts.
-
 ### CLI
 
+Kick off fine-tuning with our command-line interface, which leverages the configuration
+and training routines defined in each model’s core module. Simply specify key parameters such as
+the dataset location, number of epochs, batch size, optimization strategy, and metrics.
+
 ```bash
-maestro qwen_2_5_vl train \
+maestro paligemma_2 train \
     --dataset "dataset/location" \
     --epochs 10 \
     --batch-size 4 \
@@ -76,8 +71,13 @@ maestro qwen_2_5_vl train \
 ### Python
 
+For greater control, use the Python API to fine-tune your models.
+Import the train function from the corresponding module and define your configuration
+in a dictionary. The core modules take care of reproducibility, data preparation,
+and training setup.
+
 ```python
-from maestro.trainer.models.qwen_2_5_vl.core import train
+from maestro.trainer.models.paligemma_2.core import train
 
 config = {
     "dataset": "dataset/location",
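The dedicated-environment workflow recommended in the install text above might look like the following minimal sketch; the environment name and shell are illustrative, not part of the patch:

```bash
# One isolated environment per model keeps the clashing dependencies apart;
# the ".venv-paligemma-2" name is illustrative.
python -m venv .venv-paligemma-2
source .venv-paligemma-2/bin/activate

# Quote the extra so shells like zsh don't expand the square brackets.
pip install "maestro[paligemma_2]"
```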

diff --git a/docs/assets/maestro-logo.svg b/docs/assets/maestro-logo.svg
new file mode 100644
index 0000000..4273f92
--- /dev/null
+++ b/docs/assets/maestro-logo.svg
@@ -0,0 +1,2 @@
+
+
diff --git a/docs/index.md b/docs/index.md
index e66b525..db2fb5f 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -2,50 +2,161 @@
 maestro
 
-
-coming: when it's ready...
-
+
+
+
+
+
+
+
+
-**maestro** is a tool designed to streamline and accelerate the fine-tuning process for
-multimodal models. It provides ready-to-use recipes for fine-tuning popular
-vision-language models (VLMs) such as **Florence-2**, **PaliGemma**, and
-**Qwen2-VL** on downstream vision-language tasks.
+## Hello
+
-## install
+**maestro** is a streamlined tool to accelerate the fine-tuning of multimodal models.
+By encapsulating best practices from our core modules, maestro handles configuration,
+data loading, reproducibility, and training loop setup. It currently offers ready-to-use
+recipes for popular vision-language models such as **Florence-2**, **PaliGemma 2**, and
+**Qwen2.5-VL**.
-Pip install the supervision package in a
-[**Python>=3.8**](https://www.python.org/) environment.
+## Quickstart
+
-```bash
-pip install maestro
-```
+### Install
+
-## quickstart
+To begin, install the model-specific dependencies. Since some models may have clashing requirements,
+we recommend creating a dedicated Python environment for each model.
-### CLI
+
+=== "Florence-2"
+
+    ```bash
+    pip install maestro[florence_2]
+    ```
-VLMs can be fine-tuned on downstream tasks directly from the command line with
-`maestro` command:
+
+=== "PaliGemma 2"
+
-```bash
-maestro florence2 train --dataset='' --epochs=10 --batch-size=8
-```
+    ```bash
+    pip install maestro[paligemma_2]
+    ```
-### SDK
+
+=== "Qwen2.5-VL"
+
-Alternatively, you can fine-tune VLMs using the Python SDK, which accepts the same
-arguments as the CLI example above:
+    ```bash
+    pip install maestro[qwen_2_5_vl]
+    pip install git+https://github.com/huggingface/transformers
+    ```
+
-```python
-from maestro.trainer.common import MeanAveragePrecisionMetric
-from maestro.trainer.models.florence_2 import train, Configuration
+    !!! warning
+        Support for Qwen2.5-VL in transformers is experimental.
+        For now, please install transformers from source to ensure compatibility.
-config = Configuration(
-    dataset='',
-    epochs=10,
-    batch_size=8,
-    metrics=[MeanAveragePrecisionMetric()]
-)
+### CLI
+
-train(config)
-```
+Kick off fine-tuning with our command-line interface, which leverages the configuration
+and training routines defined in each model’s core module. Simply specify key parameters such as
+the dataset location, number of epochs, batch size, optimization strategy, and metrics.
+
+=== "Florence-2"
+
+    ```bash
+    maestro florence_2 train \
+        --dataset "dataset/location" \
+        --epochs 10 \
+        --batch-size 4 \
+        --optimization_strategy "qlora" \
+        --metrics "edit_distance"
+    ```
+
+=== "PaliGemma 2"
+
+    ```bash
+    maestro paligemma_2 train \
+        --dataset "dataset/location" \
+        --epochs 10 \
+        --batch-size 4 \
+        --optimization_strategy "qlora" \
+        --metrics "edit_distance"
+    ```
+
+=== "Qwen2.5-VL"
+
+    ```bash
+    maestro qwen_2_5_vl train \
+        --dataset "dataset/location" \
+        --epochs 10 \
+        --batch-size 4 \
+        --optimization_strategy "qlora" \
+        --metrics "edit_distance"
+    ```
+
+### Python
+
+For greater control, use the Python API to fine-tune your models.
+Import the train function from the corresponding module and define your configuration
+in a dictionary. The core modules take care of reproducibility, data preparation,
+and training setup.
+
+=== "Florence-2"
+
+    ```python
+    from maestro.trainer.models.florence_2.core import train
+
+    config = {
+        "dataset": "dataset/location",
+        "epochs": 10,
+        "batch_size": 4,
+        "optimization_strategy": "qlora",
+        "metrics": ["edit_distance"],
+    }
+
+    train(config)
+    ```
+
+=== "PaliGemma 2"
+
+    ```python
+    from maestro.trainer.models.paligemma_2.core import train
+
+    config = {
+        "dataset": "dataset/location",
+        "epochs": 10,
+        "batch_size": 4,
+        "optimization_strategy": "qlora",
+        "metrics": ["edit_distance"],
+    }
+
+    train(config)
+    ```
+
+=== "Qwen2.5-VL"
+
+    ```python
+    from maestro.trainer.models.qwen_2_5_vl.core import train
+
+    config = {
+        "dataset": "dataset/location",
+        "epochs": 10,
+        "batch_size": 4,
+        "optimization_strategy": "qlora",
+        "metrics": ["edit_distance"],
+    }
+
+    train(config)
+    ```
diff --git a/mkdocs.yaml b/mkdocs.yaml
index fb0acb4..fefc77b 100644
--- a/mkdocs.yaml
+++ b/mkdocs.yaml
@@ -1,11 +1,11 @@
 site_name: maestro
-site_url: https://roboflow.github.io/multimodal-maestro/
+site_url: https://roboflow.github.io/maestro/
 site_author: Roboflow
 site_description: 'Streamline the fine-tuning process for multimodal models: PaliGemma, Florence-2, Qwen2-VL.'
-repo_name: roboflow/multimodal-maestro
-repo_url: https://github.com/roboflow/multimodal-maestro
-edit_uri: https://github.com/roboflow/multimodal-maestro/tree/main/docs
-copyright: Roboflow 2024. All rights reserved.
+repo_name: roboflow/maestro
+repo_url: https://github.com/roboflow/maestro
+edit_uri: https://github.com/roboflow/maestro/tree/main/docs
+copyright: Roboflow 2025. All rights reserved.
 
 extra:
   social:
@@ -23,34 +23,38 @@ extra:
 nav:
   - Maestro: index.md
-  - Models:
-    - Florence-2: florence-2.md
-  - Tasks: tasks.md
-  - Metrics: metrics.md
+#  - Models:
+#    - Florence-2: florence-2.md
+#  - Tasks: tasks.md
+#  - Metrics: metrics.md
 
 theme:
   name: 'material'
-  logo: https://media.roboflow.com/open-source/supervision/supervision-lenny.png
-  favicon: https://media.roboflow.com/open-source/supervision/supervision-lenny.png
+  logo: assets/maestro-logo.svg
+  favicon: assets/maestro-logo.svg
   custom_dir: docs/theme
   palette:
     # Palette for light mode
     - scheme: default
-      primary: 'custom'
+      primary: 'black'
       toggle:
         icon: material/brightness-7
        name: Switch to dark mode
     # Palette toggle for dark mode
     - scheme: slate
-      primary: 'custom'
+      primary: 'black'
       toggle:
         icon: material/brightness-4
         name: Switch to light mode
   font:
     text: Roboto
     code: Roboto Mono
+  features:
+    - content.tabs.link
+    - content.code.copy
+
 plugins:
   - search
diff --git a/pyproject.toml b/pyproject.toml
index 84cba54..26acc96 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,5 +1,5 @@
 [build-system]
-requires = ["setuptools", "setuptools-scm", "wheel"]
+requires = ["setuptools", "wheel"]
 build-backend = "setuptools.build_meta"
 
 [project]
@@ -78,7 +78,8 @@ florence_2 = [
 paligemma_2 = [
     "peft>=0.12",
     "torch>=2.4.0",
-    "transformers<4.48.0",  # does not work with 4.49.*
+    # PaliGemma 2 training does not work with 4.49.*
+    "transformers<4.48.0",
     "bitsandbytes>=0.45.0"
 ]
 qwen_2_5_vl = [
@@ -86,7 +87,8 @@ qwen_2_5_vl = [
     "peft>=0.12",
     "torch>=2.4.0",
     "torchvision>=0.20.0",
-    "transformers @ git+https://github.com/huggingface/transformers",
+    # PyPI doesn't allow git repo packages; uncomment when transformers releases support for Qwen2.5-VL
+    # "transformers @ git+https://github.com/huggingface/transformers",
     "bitsandbytes>=0.45.0",
     "qwen-vl-utils>=0.0.8"
 ]
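Because PyPI rejects direct git dependencies, the qwen_2_5_vl extra above ships without its transformers pin, and the docs instead tell users to install transformers from source. A minimal sketch of that two-step install, matching the Qwen2.5-VL tab added to docs/index.md:

```bash
# Install the Qwen2.5-VL extra, then pull transformers from source;
# no released transformers version supports Qwen2.5-VL yet.
pip install "maestro[qwen_2_5_vl]"
pip install git+https://github.com/huggingface/transformers
```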