@@ -44,10 +35,11 @@
## Hello
-**maestro** is a tool designed to streamline and accelerate the fine-tuning process for
-multimodal models. It provides ready-to-use recipes for fine-tuning popular
-vision-language models (VLMs) such as **Florence-2**, **PaliGemma 2**, and
-**Qwen2.5-VL** on downstream vision-language tasks.
+**maestro** is a streamlined tool to accelerate the fine-tuning of multimodal models.
+By encapsulating best practices from our core modules, maestro handles configuration,
+data loading, reproducibility, and training loop setup. It currently offers ready-to-use
+recipes for popular vision-language models such as **Florence-2**, **PaliGemma 2**, and
+**Qwen2.5-VL**.

@@ -55,18 +47,21 @@ vision-language models (VLMs) such as **Florence-2**, **PaliGemma 2**, and
### Install
-To get started with maestro, you’ll need to install the dependencies specific to the model you wish to fine-tune.
+To begin, install the model-specific dependencies. Since some models may have clashing requirements,
+we recommend creating a dedicated Python environment for each model.
```bash
-pip install maestro[qwen_2_5_vl]
+pip install maestro[paligemma_2]
```
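The dedicated-environment workflow mentioned above can be sketched as follows (the environment name `.venv-paligemma` is just an illustrative choice, not a project convention):

```shell
# Create an isolated environment so per-model dependencies do not clash
python3 -m venv .venv-paligemma
# Activate it (on Windows: .venv-paligemma\Scripts\activate)
. .venv-paligemma/bin/activate
```

With the environment active, run the matching `pip install` command shown above; repeat with a fresh environment for each model you fine-tune.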
-**Note:** Some models may have clashing dependencies. We recommend creating a separate python environment for each model to avoid version conflicts.
-
### CLI
+Kick off fine-tuning with our command-line interface, which leverages the configuration
+and training routines defined in each model’s core module. Simply specify key parameters such as
+the dataset location, number of epochs, batch size, optimization strategy, and metrics.
+
```bash
-maestro qwen_2_5_vl train \
+maestro paligemma_2 train \
--dataset "dataset/location" \
--epochs 10 \
--batch-size 4 \
@@ -76,8 +71,13 @@ maestro qwen_2_5_vl train \
### Python
+For greater control, use the Python API to fine-tune your models.
+Import the `train` function from the corresponding module and define your configuration
+as a dictionary. The core modules take care of reproducibility, data preparation,
+and training setup.
+
```python
-from maestro.trainer.models.qwen_2_5_vl.core import train
+from maestro.trainer.models.paligemma_2.core import train
config = {
"dataset": "dataset/location",
diff --git a/docs/assets/maestro-logo.svg b/docs/assets/maestro-logo.svg
new file mode 100644
index 0000000..4273f92
--- /dev/null
+++ b/docs/assets/maestro-logo.svg
@@ -0,0 +1,2 @@
+
+
diff --git a/docs/index.md b/docs/index.md
index e66b525..db2fb5f 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -2,50 +2,161 @@
maestro
-
coming: when it's ready...
+
+
+
+
+
+
+
+
-**maestro** is a tool designed to streamline and accelerate the fine-tuning process for
-multimodal models. It provides ready-to-use recipes for fine-tuning popular
-vision-language models (VLMs) such as **Florence-2**, **PaliGemma**, and
-**Qwen2-VL** on downstream vision-language tasks.
+## Hello
-## install
+**maestro** is a streamlined tool to accelerate the fine-tuning of multimodal models.
+By encapsulating best practices from our core modules, maestro handles configuration,
+data loading, reproducibility, and training loop setup. It currently offers ready-to-use
+recipes for popular vision-language models such as **Florence-2**, **PaliGemma 2**, and
+**Qwen2.5-VL**.
-Pip install the supervision package in a
-[**Python>=3.8**](https://www.python.org/) environment.
+## Quickstart
-```bash
-pip install maestro
-```
+### Install
-## quickstart
+To begin, install the model-specific dependencies. Since some models may have clashing requirements,
+we recommend creating a dedicated Python environment for each model.
-### CLI
+=== "Florence-2"
+
+ ```bash
+ pip install maestro[florence_2]
+ ```
-VLMs can be fine-tuned on downstream tasks directly from the command line with
-`maestro` command:
+=== "PaliGemma 2"
-```bash
-maestro florence2 train --dataset='' --epochs=10 --batch-size=8
-```
+ ```bash
+ pip install maestro[paligemma_2]
+ ```
-### SDK
+=== "Qwen2.5-VL"
-Alternatively, you can fine-tune VLMs using the Python SDK, which accepts the same
-arguments as the CLI example above:
+ ```bash
+ pip install maestro[qwen_2_5_vl]
+ pip install git+https://github.com/huggingface/transformers
+ ```
-```python
-from maestro.trainer.common import MeanAveragePrecisionMetric
-from maestro.trainer.models.florence_2 import train, Configuration
+ !!! warning
+ Support for Qwen2.5-VL in transformers is experimental.
+ For now, please install transformers from source to ensure compatibility.
-config = Configuration(
- dataset='',
- epochs=10,
- batch_size=8,
- metrics=[MeanAveragePrecisionMetric()]
-)
+### CLI
-train(config)
-```
+Kick off fine-tuning with our command-line interface, which leverages the configuration
+and training routines defined in each model’s core module. Simply specify key parameters such as
+the dataset location, number of epochs, batch size, optimization strategy, and metrics.
+
+=== "Florence-2"
+
+ ```bash
+ maestro florence_2 train \
+ --dataset "dataset/location" \
+ --epochs 10 \
+ --batch-size 4 \
+ --optimization_strategy "qlora" \
+ --metrics "edit_distance"
+ ```
+
+=== "PaliGemma 2"
+
+ ```bash
+ maestro paligemma_2 train \
+ --dataset "dataset/location" \
+ --epochs 10 \
+ --batch-size 4 \
+ --optimization_strategy "qlora" \
+ --metrics "edit_distance"
+ ```
+
+=== "Qwen2.5-VL"
+
+ ```bash
+ maestro qwen_2_5_vl train \
+ --dataset "dataset/location" \
+ --epochs 10 \
+ --batch-size 4 \
+ --optimization_strategy "qlora" \
+ --metrics "edit_distance"
+ ```
+
+### Python
+
+For greater control, use the Python API to fine-tune your models.
+Import the `train` function from the corresponding module and define your configuration
+as a dictionary. The core modules take care of reproducibility, data preparation,
+and training setup.
+
+=== "Florence-2"
+
+ ```python
+ from maestro.trainer.models.florence_2.core import train
+
+ config = {
+ "dataset": "dataset/location",
+ "epochs": 10,
+ "batch_size": 4,
+ "optimization_strategy": "qlora",
+ "metrics": ["edit_distance"],
+ }
+
+ train(config)
+ ```
+
+=== "PaliGemma 2"
+
+ ```python
+ from maestro.trainer.models.paligemma_2.core import train
+
+ config = {
+ "dataset": "dataset/location",
+ "epochs": 10,
+ "batch_size": 4,
+ "optimization_strategy": "qlora",
+ "metrics": ["edit_distance"],
+ }
+
+ train(config)
+ ```
+
+=== "Qwen2.5-VL"
+
+ ```python
+ from maestro.trainer.models.qwen_2_5_vl.core import train
+
+ config = {
+ "dataset": "dataset/location",
+ "epochs": 10,
+ "batch_size": 4,
+ "optimization_strategy": "qlora",
+ "metrics": ["edit_distance"],
+ }
+
+ train(config)
+ ```
diff --git a/mkdocs.yaml b/mkdocs.yaml
index fb0acb4..fefc77b 100644
--- a/mkdocs.yaml
+++ b/mkdocs.yaml
@@ -1,11 +1,11 @@
site_name: maestro
-site_url: https://roboflow.github.io/multimodal-maestro/
+site_url: https://roboflow.github.io/maestro/
site_author: Roboflow
site_description: 'Streamline the fine-tuning process for multimodal models: PaliGemma, Florence-2, Qwen2-VL.'
-repo_name: roboflow/multimodal-maestro
-repo_url: https://github.com/roboflow/multimodal-maestro
-edit_uri: https://github.com/roboflow/multimodal-maestro/tree/main/docs
-copyright: Roboflow 2024. All rights reserved.
+repo_name: roboflow/maestro
+repo_url: https://github.com/roboflow/maestro
+edit_uri: https://github.com/roboflow/maestro/tree/main/docs
+copyright: Roboflow 2025. All rights reserved.
extra:
social:
@@ -23,34 +23,38 @@ extra:
nav:
- Maestro: index.md
- - Models:
- - Florence-2: florence-2.md
- - Tasks: tasks.md
- - Metrics: metrics.md
+# - Models:
+# - Florence-2: florence-2.md
+# - Tasks: tasks.md
+# - Metrics: metrics.md
theme:
name: 'material'
- logo: https://media.roboflow.com/open-source/supervision/supervision-lenny.png
- favicon: https://media.roboflow.com/open-source/supervision/supervision-lenny.png
+ logo: assets/maestro-logo.svg
+ favicon: assets/maestro-logo.svg
custom_dir: docs/theme
palette:
# Palette for light mode
- scheme: default
- primary: 'custom'
+ primary: 'black'
toggle:
icon: material/brightness-7
name: Switch to dark mode
# Palette toggle for dark mode
- scheme: slate
- primary: 'custom'
+ primary: 'black'
toggle:
icon: material/brightness-4
name: Switch to light mode
font:
text: Roboto
code: Roboto Mono
+ features:
+ - content.tabs.link
+ - content.code.copy
+
plugins:
- search
diff --git a/pyproject.toml b/pyproject.toml
index 84cba54..26acc96 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,5 +1,5 @@
[build-system]
-requires = ["setuptools", "setuptools-scm", "wheel"]
+requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"
[project]
@@ -78,7 +78,8 @@ florence_2 = [
paligemma_2 = [
"peft>=0.12",
"torch>=2.4.0",
- "transformers<4.48.0", # does not work with 4.49.*
+    # PaliGemma 2 training does not work with transformers >= 4.48 (including 4.49.*)
+ "transformers<4.48.0",
"bitsandbytes>=0.45.0"
]
qwen_2_5_vl = [
@@ -86,7 +87,8 @@ qwen_2_5_vl = [
"peft>=0.12",
"torch>=2.4.0",
"torchvision>=0.20.0",
- "transformers @ git+https://github.com/huggingface/transformers",
+    # PyPI does not allow Git-based dependencies; uncomment once a transformers release supports Qwen2.5-VL
+ # "transformers @ git+https://github.com/huggingface/transformers",
"bitsandbytes>=0.45.0",
"qwen-vl-utils>=0.0.8"
]