Merge pull request #127 from roboflow/fix/packaging
packaging: 📦 update transformers dependency for Qwen2.5-VL and documentation updates
SkalskiP authored Feb 4, 2025
2 parents 4ed36d8 + 4af961e commit c983d22
Showing 5 changed files with 186 additions and 67 deletions.
38 changes: 19 additions & 19 deletions README.md
@@ -1,39 +1,30 @@

<div align="center">

<h1>maestro</h1>

<br>

<div>
<a href="https://example1.com" style="margin: 0 10px;">
<img
src="https://github.com/user-attachments/assets/c9416f1f-a2bf-4590-86da-d2fc89ba559b"
width="80"
height="40"
/>
</a>
<a href="https://example2.com" style="margin: 0 10px;">
<img
src="https://github.com/user-attachments/assets/75dc7214-e82a-498d-950e-c64d90218e49"
width="80"
height="40"
/>
</a>
<a href="https://example3.com" style="margin: 0 10px;">
<img
src="https://github.com/user-attachments/assets/5d265473-b938-4501-b894-6a44a6a28a8c"
width="80"
height="40"
/>
</a>
<a href="https://example3.com" style="margin: 0 10px;">
<img
src="https://github.com/user-attachments/assets/b7ccdf39-ac77-4dbd-8608-0fa2d9dadf0a"
width="80"
height="40"
/>
</a>
</div>

<br>
@@ -44,29 +35,33 @@

## Hello

**maestro** is a tool designed to streamline and accelerate the fine-tuning process for
multimodal models. It provides ready-to-use recipes for fine-tuning popular
vision-language models (VLMs) such as **Florence-2**, **PaliGemma 2**, and
**Qwen2.5-VL** on downstream vision-language tasks.
**maestro** is a streamlined tool to accelerate the fine-tuning of multimodal models.
By encapsulating best practices from our core modules, maestro handles configuration,
data loading, reproducibility, and training loop setup. It currently offers ready-to-use
recipes for popular vision-language models such as **Florence-2**, **PaliGemma 2**, and
**Qwen2.5-VL**.

![maestro](https://github.com/user-attachments/assets/3bb9ccba-b0ee-4964-bcd6-f71124a08bc2)

## Quickstart

### Install

To get started with maestro, you’ll need to install the dependencies specific to the model you wish to fine-tune.
To begin, install the model-specific dependencies. Since some models may have clashing requirements,
we recommend creating a dedicated Python environment for each model.

```bash
pip install maestro[qwen_2_5_vl]
pip install maestro[paligemma_2]
```

**Note:** Some models may have clashing dependencies. We recommend creating a separate Python environment for each model to avoid version conflicts.
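The per-model environment recommendation above can be sketched with Python's standard `venv` module; the environment names below are arbitrary examples, not a maestro convention:

```shell
# Create an isolated environment for the Qwen2.5-VL recipe
python3 -m venv .venv-qwen_2_5_vl
source .venv-qwen_2_5_vl/bin/activate

# Install only this model's extras into it
pip install "maestro[qwen_2_5_vl]"

# Repeat with a fresh environment for another model, e.g. PaliGemma 2
deactivate
python3 -m venv .venv-paligemma_2
```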

### CLI

Kick off fine-tuning with our command-line interface, which leverages the configuration
and training routines defined in each model’s core module. Simply specify key parameters such as
the dataset location, number of epochs, batch size, optimization strategy, and metrics.

```bash
maestro qwen_2_5_vl train \
maestro paligemma_2 train \
--dataset "dataset/location" \
--epochs 10 \
--batch-size 4 \
@@ -76,8 +71,13 @@ maestro qwen_2_5_vl train \

### Python

For greater control, use the Python API to fine-tune your models.
Import the `train` function from the corresponding module and define your configuration
in a dictionary. The core modules take care of reproducibility, data preparation,
and training setup.

```python
from maestro.trainer.models.qwen_2_5_vl.core import train
from maestro.trainer.models.paligemma_2.core import train

config = {
"dataset": "dataset/location",
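Since the config is a plain dictionary, a small guard can catch missing or mistyped keys before a long training run starts. The key names mirror the CLI flags shown above; `validate_config` is a hypothetical helper sketched here, not part of maestro:

```python
# Keys expected by the train() config, matching the CLI flags above
REQUIRED_KEYS = {"dataset", "epochs", "batch_size", "optimization_strategy", "metrics"}

def validate_config(config: dict) -> None:
    """Raise early if a required key is missing or obviously invalid."""
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise KeyError(f"config is missing required keys: {sorted(missing)}")
    if not isinstance(config["epochs"], int) or config["epochs"] <= 0:
        raise ValueError("epochs must be a positive integer")

config = {
    "dataset": "dataset/location",
    "epochs": 10,
    "batch_size": 4,
    "optimization_strategy": "qlora",
    "metrics": ["edit_distance"],
}

validate_config(config)  # passes silently for a well-formed config
# train(config) would then launch fine-tuning
```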
2 changes: 2 additions & 0 deletions docs/assets/maestro-logo.svg
175 changes: 143 additions & 32 deletions docs/index.md
@@ -2,50 +2,161 @@

<h1>maestro</h1>

<p>coming: when it's ready...</p>
<br>

<div>
<img
src="https://github.com/user-attachments/assets/c9416f1f-a2bf-4590-86da-d2fc89ba559b"
width="80"
height="40"
/>
<img
src="https://github.com/user-attachments/assets/75dc7214-e82a-498d-950e-c64d90218e49"
width="80"
height="40"
/>
<img
src="https://github.com/user-attachments/assets/5d265473-b938-4501-b894-6a44a6a28a8c"
width="80"
height="40"
/>
<img
src="https://github.com/user-attachments/assets/b7ccdf39-ac77-4dbd-8608-0fa2d9dadf0a"
width="80"
height="40"
/>
</div>

</div>

**maestro** is a tool designed to streamline and accelerate the fine-tuning process for
multimodal models. It provides ready-to-use recipes for fine-tuning popular
vision-language models (VLMs) such as **Florence-2**, **PaliGemma**, and
**Qwen2-VL** on downstream vision-language tasks.
## Hello

## install
**maestro** is a streamlined tool to accelerate the fine-tuning of multimodal models.
By encapsulating best practices from our core modules, maestro handles configuration,
data loading, reproducibility, and training loop setup. It currently offers ready-to-use
recipes for popular vision-language models such as **Florence-2**, **PaliGemma 2**, and
**Qwen2.5-VL**.

Pip install the maestro package in a
[**Python>=3.8**](https://www.python.org/) environment.
## Quickstart

```bash
pip install maestro
```
### Install

## quickstart
To begin, install the model-specific dependencies. Since some models may have clashing requirements,
we recommend creating a dedicated Python environment for each model.

### CLI
=== "Florence-2"

```bash
pip install maestro[florence_2]
```

VLMs can be fine-tuned on downstream tasks directly from the command line with the
`maestro` command:
=== "PaliGemma 2"

```bash
maestro florence2 train --dataset='<DATASET_PATH>' --epochs=10 --batch-size=8
```
```bash
pip install maestro[paligemma_2]
```

### SDK
=== "Qwen2.5-VL"

Alternatively, you can fine-tune VLMs using the Python SDK, which accepts the same
arguments as the CLI example above:
```bash
pip install maestro[qwen_2_5_vl]
pip install git+https://github.com/huggingface/transformers
```

```python
from maestro.trainer.common import MeanAveragePrecisionMetric
from maestro.trainer.models.florence_2 import train, Configuration
!!! warning
Support for Qwen2.5-VL in transformers is experimental.
For now, please install transformers from source to ensure compatibility.

config = Configuration(
dataset='<DATASET_PATH>',
epochs=10,
batch_size=8,
metrics=[MeanAveragePrecisionMetric()]
)
### CLI

train(config)
```
Kick off fine-tuning with our command-line interface, which leverages the configuration
and training routines defined in each model’s core module. Simply specify key parameters such as
the dataset location, number of epochs, batch size, optimization strategy, and metrics.

=== "Florence-2"

```bash
maestro florence_2 train \
--dataset "dataset/location" \
--epochs 10 \
--batch-size 4 \
--optimization_strategy "qlora" \
--metrics "edit_distance"
```

=== "PaliGemma 2"

```bash
maestro paligemma_2 train \
--dataset "dataset/location" \
--epochs 10 \
--batch-size 4 \
--optimization_strategy "qlora" \
--metrics "edit_distance"
```

=== "Qwen2.5-VL"

```bash
maestro qwen_2_5_vl train \
--dataset "dataset/location" \
--epochs 10 \
--batch-size 4 \
--optimization_strategy "qlora" \
--metrics "edit_distance"
```

### Python

For greater control, use the Python API to fine-tune your models.
Import the `train` function from the corresponding module and define your configuration
in a dictionary. The core modules take care of reproducibility, data preparation,
and training setup.

=== "Florence-2"

```python
from maestro.trainer.models.florence_2.core import train

config = {
"dataset": "dataset/location",
"epochs": 10,
"batch_size": 4,
"optimization_strategy": "qlora",
"metrics": ["edit_distance"],
}

train(config)
```

=== "PaliGemma 2"

```python
from maestro.trainer.models.paligemma_2.core import train

config = {
"dataset": "dataset/location",
"epochs": 10,
"batch_size": 4,
"optimization_strategy": "qlora",
"metrics": ["edit_distance"],
}

train(config)
```

=== "Qwen2.5-VL"

```python
from maestro.trainer.models.qwen_2_5_vl.core import train

config = {
"dataset": "dataset/location",
"epochs": 10,
"batch_size": 4,
"optimization_strategy": "qlora",
"metrics": ["edit_distance"],
}

train(config)
```
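The CLI and Python paths above take the same parameters, so a thin wrapper can translate CLI-style flags into the Python API's config dictionary. The mapping below is an illustrative sketch using `argparse`, not maestro's actual argument parser:

```python
import argparse

def cli_args_to_config(argv: list[str]) -> dict:
    """Translate maestro-style CLI flags into the Python API's config dict."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset", required=True)
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--batch-size", type=int, default=4)
    parser.add_argument("--optimization_strategy", default="qlora")
    parser.add_argument("--metrics", nargs="+", default=["edit_distance"])
    args = parser.parse_args(argv)
    return {
        "dataset": args.dataset,
        "epochs": args.epochs,
        "batch_size": args.batch_size,
        "optimization_strategy": args.optimization_strategy,
        "metrics": args.metrics,
    }

# The flags from the CLI example produce the dict from the Python example
config = cli_args_to_config([
    "--dataset", "dataset/location",
    "--epochs", "10",
    "--batch-size", "4",
    "--optimization_strategy", "qlora",
    "--metrics", "edit_distance",
])
```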
30 changes: 17 additions & 13 deletions mkdocs.yaml
@@ -1,11 +1,11 @@
site_name: maestro
site_url: https://roboflow.github.io/multimodal-maestro/
site_url: https://roboflow.github.io/maestro/
site_author: Roboflow
site_description: 'Streamline the fine-tuning process for multimodal models: PaliGemma, Florence-2, Qwen2-VL.'
repo_name: roboflow/multimodal-maestro
repo_url: https://github.com/roboflow/multimodal-maestro
edit_uri: https://github.com/roboflow/multimodal-maestro/tree/main/docs
copyright: Roboflow 2024. All rights reserved.
repo_name: roboflow/maestro
repo_url: https://github.com/roboflow/maestro
edit_uri: https://github.com/roboflow/maestro/tree/main/docs
copyright: Roboflow 2025. All rights reserved.

extra:
social:
@@ -23,34 +23,38 @@ extra:

nav:
- Maestro: index.md
- Models:
- Florence-2: florence-2.md
- Tasks: tasks.md
- Metrics: metrics.md
# - Models:
# - Florence-2: florence-2.md
# - Tasks: tasks.md
# - Metrics: metrics.md


theme:
name: 'material'
logo: https://media.roboflow.com/open-source/supervision/supervision-lenny.png
favicon: https://media.roboflow.com/open-source/supervision/supervision-lenny.png
logo: assets/maestro-logo.svg
favicon: assets/maestro-logo.svg
custom_dir: docs/theme
palette:
# Palette for light mode
- scheme: default
primary: 'custom'
primary: 'black'
toggle:
icon: material/brightness-7
name: Switch to dark mode

# Palette toggle for dark mode
- scheme: slate
primary: 'custom'
primary: 'black'
toggle:
icon: material/brightness-4
name: Switch to light mode
font:
text: Roboto
code: Roboto Mono
features:
- content.tabs.link
- content.code.copy


plugins:
- search
8 changes: 5 additions & 3 deletions pyproject.toml
@@ -1,5 +1,5 @@
[build-system]
requires = ["setuptools", "setuptools-scm", "wheel"]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"

[project]
@@ -78,15 +78,17 @@ florence_2 = [
paligemma_2 = [
"peft>=0.12",
"torch>=2.4.0",
"transformers<4.48.0", # does not work with 4.49.*
# PaliGemma 2 training does not work with 4.49.*
"transformers<4.48.0",
"bitsandbytes>=0.45.0"
]
qwen_2_5_vl = [
"accelerate>=1.2.1",
"peft>=0.12",
"torch>=2.4.0",
"torchvision>=0.20.0",
"transformers @ git+https://github.com/huggingface/transformers",
# PyPI doesn't allow git dependencies; uncomment once a transformers release supports Qwen2.5-VL
# "transformers @ git+https://github.com/huggingface/transformers",
"bitsandbytes>=0.45.0",
"qwen-vl-utils>=0.0.8"
]
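The `transformers<4.48.0` pin for PaliGemma 2 can be checked at runtime without extra dependencies. The comparison below handles plain `X.Y.Z` version strings only and is an illustrative sketch, not part of maestro:

```python
from importlib import metadata

def version_tuple(version: str) -> tuple[int, ...]:
    """Parse a plain X.Y.Z version string into a comparable tuple."""
    return tuple(int(part) for part in version.split(".")[:3])

def satisfies_upper_bound(installed: str, bound: str) -> bool:
    """True if installed < bound, matching a '<bound' specifier."""
    return version_tuple(installed) < version_tuple(bound)

try:
    installed = metadata.version("transformers")
    ok = satisfies_upper_bound(installed, "4.48.0")
    print(f"transformers {installed} satisfies <4.48.0: {ok}")
except metadata.PackageNotFoundError:
    print("transformers is not installed in this environment")
```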
