## Hyperparameter Optimization Project

This repository shows you how to run a hyperparameter optimization (HPO) system as an Outerbounds project.
This `README.md` will explain why you'd want to connect these concepts, and will show you how to launch HPO jobs for:
- classical ML models
- deep learning models
- end-to-end system tuning

If you have never deployed an Outerbounds project, please read [the documentation page](/outerbounds/project-setup/) before continuing.

### Local/workstation dependencies

[Install uv](https://docs.astral.sh/uv/getting-started/installation/).

From your laptop or Outerbounds workstation run:
```bash
uv sync
```

Configure your Outerbounds token. Ask in Slack if you're not sure how.
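
Configuration typically looks like the following (a sketch: the exact subcommand may vary by deployment, and the token placeholder below comes from your Outerbounds onboarding page):

```bash
# Paste the token string from your Outerbounds onboarding page.
# Check with your admins if the command differs in your deployment.
uv run outerbounds configure <YOUR_TOKEN>
```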

### Optuna integration
This system is an integration between [Optuna](https://optuna.org/), a feature-rich, open-source hyperparameter optimization framework, and Outerbounds. It leverages functionality built into your Outerbounds deployment to run a persistent relational database that tasks and applications can communicate with. The Optuna dashboard runs as an Outerbounds app, enabling sophisticated analysis of hyperparameter tuning runs.
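
Concretely, each task can point Optuna's storage API at that shared database, so trials from many parallel tasks land in one study. A minimal sketch (the connection string is illustrative; on Outerbounds the real one is supplied by your deployment rather than hard-coded):

```python
import optuna

# Hypothetical connection string for the deployment's relational database.
storage = "postgresql://user:password@db-host:5432/optuna"

# load_if_exists lets many parallel workers attach to the same study.
study = optuna.create_study(
    study_name="example-study",
    storage=storage,
    load_if_exists=True,
)
```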

### How to use this repository

#### Deploy the Optuna dashboard application

The Outerbounds app that will run your Optuna dashboard is defined in [`./deployments/optuna-dashboard/config.yml`](./deployments/optuna-dashboard/config.yml).
When you push to the main branch of this repository, the `obproject-deployer` will create the application in your Outerbounds project branch.
If you'd like to manually deploy the application:

```bash
cd deployments/optuna-dashboard
uv run outerbounds app deploy --config-file config.yml
```

#### Run a workflow

This project includes two demo templates, in `flows/tree-model` and `flows/nn`.
Each workflow template defines:
- a `flow.py` containing a `FlowSpec`,
- a single `config.json` to set system variables and hyperparameter configurations,
- an `hpo_client.py` containing entrypoints to run and trigger the flow,
- notebooks showing how to run and analyze the results of hyperparameter tuning runs, and
- a modular, fully customizable objective function.

For the rest of this section, we'll use the `flows/nn` template; everything else works the same as for `flows/tree-model`.

```bash
cd flows/nn
```

##### Setting configs
Before running or deploying the workflows, investigate the relationship between the flow and the `config.json` file.

Based on the compute pools available in your Outerbounds deployment, set the `compute_pool` variable.
If you are new to compute pools, please visit the documentation or consult your Outerbounds admins/Slack for guidance.

As long as you haven't changed anything when deploying the application hosting the Optuna dashboard, `compute_pool` is the only value you need to change in that file.
Still, it is useful to be familiar with its contents and the way the configuration files interact with Metaflow code.
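
As a rough illustration, the relevant part of `config.json` has this shape (`compute_pool` is the key you must set; the other key is illustrative only - defer to the actual file in the template):

```json
{
  "compute_pool": "your-compute-pool-name",
  "n_trials": 25
}
```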

##### Regular Metaflow usage
To run or deploy the flow directly (i.e., the standard Metaflow user experience):

```bash
# run immediately on Kubernetes
python flow.py --environment=fast-bakery run --with kubernetes

# or deploy to Argo Workflows, then trigger it
python flow.py --environment=fast-bakery argo-workflows create
python flow.py --environment=fast-bakery argo-workflows trigger
```

##### Using the HPO client
These examples also include a convenience wrapper around the workflows in `hpo_client.py`.
Its purpose is to make the flows easier to use and the abstractions more in line with typical HPO interfaces seen in the wild.

There are three client modes:
1. Blocking - `python hpo_client.py -m 1`
2. Async - `python hpo_client.py -m 2`
3. Trigger - `python hpo_client.py -m 3`
   - The trigger mode also accepts a `--namespace/-n` parameter, which determines the namespace in which this code path checks for already-deployed flows.

### Optuna 101

This implementation wraps the standard Optuna interface, aiming to balance two goals:
1. Provide full expressiveness and compatibility with open-source Optuna features.
2. Provide an opinionated and streamlined interface for launching HPO studies as Metaflow flows.

#### The objective function
Typically, Optuna programs are developed in Python scripts.
An objective function returns one value (or several, in a multi-objective study).
Its argument is a [`trial`](https://optuna.readthedocs.io/en/stable/reference/trial.html),
representing a single execution of the objective function; in other words, a sample drawn from the hyperparameter search space.

```python
def objective(trial):
    # Each suggest_* call draws one hyperparameter for this trial.
    x = trial.suggest_float("x", -100, 100)
    y = trial.suggest_categorical("y", [-1, 0, 1])
    # Returning two values makes this a multi-objective problem.
    f1 = x**2 + y
    f2 = -((x - 2) ** 2 + y)
    return f1, f2
```
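
Because this objective returns two values, the corresponding study must be created as a multi-objective study, with one optimization direction per returned value:

```python
import optuna

# Two returned values -> two directions, in the same order as the return.
study = optuna.create_study(directions=["minimize", "maximize"])
study.optimize(objective, n_trials=20)

# Multi-objective studies expose a Pareto front rather than a single best trial.
print(len(study.best_trials))
```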

To use the `HPORunner` abstraction this project provides (`from outerbounds.hpo import HPORunner`), your key task is to determine:
1. How should the objective function be defined?
2. What data, models, and code does the objective function depend on?
3. How many trials do you want to run per study?

With answers to these questions, you'll be ready to adapt your objective functions as demonstrated in the example [`flows/`](./flows/) and [`notebooks/`](./notebooks/) and call the `HPORunner` interface to automate HPO workflows.
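
As a rough sketch of how those answers map onto the interface (the constructor arguments here are hypothetical; consult the templates in [`flows/`](./flows/) for the actual signature):

```python
# Hypothetical usage sketch -- argument names are illustrative only.
from outerbounds.hpo import HPORunner

runner = HPORunner(objective=objective, n_trials=50)  # answers to questions 1 and 3
runner.run()  # data/code dependencies (question 2) are packaged as in the templates
```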

#### Note on search spaces
Notice that with Optuna, the user imperatively defines the hyperparameter space through how the `trial` object is used within the `objective` function.
The number of variables for which we call `trial.suggest_*` defines the dimensionality of the search space.
Be judicious when adding parameters: many algorithms, especially Bayesian optimization, suffer performance degradation when many more than 5-10 parameters are tuned simultaneously.
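
For instance, each `trial.suggest_*` call below adds one dimension, so this toy objective (not drawn from the templates) defines a three-dimensional search space:

```python
def objective(trial):
    # Three suggest_* calls -> a 3-dimensional search space.
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    n_layers = trial.suggest_int("n_layers", 1, 4)
    activation = trial.suggest_categorical("activation", ["relu", "tanh"])
    # Toy score standing in for a real validation metric.
    return lr * n_layers + (1.0 if activation == "relu" else 0.0)
```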

[Read more](https://optuna.readthedocs.io/en/stable/tutorial/10_key_features/002_configurations.html#configurations).

#### Studies, samplers, and pruners
To optimize the hyperparameters, we create a study.
Optuna implements many optimization algorithm families, exposed as [`optuna.samplers`](https://optuna.readthedocs.io/en/stable/reference/samplers/index.html). These include grid search, random search, tree-structured Parzen estimators, evolutionary methods (CMA-ES, NSGA-II), Gaussian processes, quasi-Monte Carlo methods, and more.

For example, if you wanted to sample the hyperparameter space purely at random - no learning throughout the study - 10 times, you'd run:
```python
import optuna

study = optuna.create_study(sampler=optuna.samplers.RandomSampler())
study.optimize(objective, n_trials=10)
```

Sometimes it is desirable to stop unpromising trials early. The mechanism for doing this in Optuna is exposed as [`optuna.pruners`](https://optuna.readthedocs.io/en/stable/reference/pruners.html): pruners use intermediate objective values reported during a trial, compared against those of previous trials, to decide whether the trial should be pruned.
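
A minimal pruning sketch (the loop is a toy stand-in for training epochs): the objective reports intermediate values, asks the pruner whether to stop, and raises `optuna.TrialPruned` if so:

```python
import optuna

def objective(trial):
    x = trial.suggest_float("x", 0.0, 10.0)
    score = 100.0
    for step in range(10):        # toy stand-in for training epochs
        score -= x                # pretend the metric improves each step
        trial.report(score, step)  # expose intermediate state to the pruner
        if trial.should_prune():   # pruner compares against earlier trials
            raise optuna.TrialPruned()
    return score

study = optuna.create_study(pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=20)
```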

#### Resuming studies
To resume a study, simply pass in the name of the previous study.
If you leverage the Metaflow versioning scheme, which uses the Metaflow Run pathspec as the study name - in other words, if you don't override the study name via configs or the CLI - then
you can set this value in the config and resume the study. You can also override it on the command line using the `hpo_client`'s `--resume-study/-r` option:

```bash
python hpo_client.py -m 1 -r TreeModelHpoFlow/argo-hposystem.prod.treemodelhpoflow-7ntvz
```
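
Under the hood, resuming maps onto Optuna's own resume mechanics: loading an existing study by name from the shared storage. A sketch (the storage URL is illustrative, as above):

```python
import optuna

def objective(trial):
    # Trivial placeholder objective for illustration.
    x = trial.suggest_float("x", -10, 10)
    return x**2

# The study name matches the Metaflow Run pathspec used at creation time.
study = optuna.load_study(
    study_name="TreeModelHpoFlow/argo-hposystem.prod.treemodelhpoflow-7ntvz",
    storage="postgresql://user:password@db-host:5432/optuna",  # illustrative URL
)
study.optimize(objective, n_trials=10)  # continues where the previous run stopped
```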

## TODO
- Benchmark gRPC vs. pure RDB scaling thresholds. When is it worth it to do gRPC? How hard is that to implement? How do costs scale in each mode?