This repository contains the official implementation of our AAAI 2026 paper: *Offline Multi-Objective Bandits: From Logged Data to Pareto-Optimal Policies*.
Repository layout:

- `synthetic_main.py` — main experiment runner.
- `data/` — synthetic datasets and generator (`data/synthetic_data.py`).
- `algorithms/` — algorithm implementations (OffMOB, MOLinLCB, MOKernLCB, neural baselines, ...).
- `core/` — experiment utilities, runner, metrics, plotting.
- `results/` — saved `.npz` experiment outputs (created by the runner).
- Python 3.8+
- Recommended packages: numpy, scipy, torch, tqdm, easydict, absl-py, matplotlib
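The recommended packages can be installed in one step (versions are not pinned here; adjust to your environment):

```bash
pip install numpy scipy torch tqdm easydict absl-py matplotlib
```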
Run a small OffMOB experiment:
```bash
python synthetic_main.py --algo_group=offmob --num_steps=100 --num_sim=2 --data_type=quadratic --results_dir=results
```

Important flags (see the top of `synthetic_main.py`):
- `--algo_group`: `all` / `offmob` / `baseline` / `ablation`
- `--data_type`: `quadratic` / `quadratic2` / `cosine`
- `--noise_variance`, `--num_steps`, `--num_sim`, `--num_actions`, `--context_dim`, `--num_objectives`
- `--results_dir`: output folder for `.npz` files
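As a further illustration, several of these flags can be combined in one invocation (the values below are arbitrary examples, not recommended settings):

```bash
python synthetic_main.py --algo_group=baseline --data_type=cosine \
    --num_steps=1000 --num_sim=10 --noise_variance=0.1 \
    --results_dir=results
```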
Runners save `.npz` files (e.g. `*_final.npz`) containing:

- `results`: dict mapping algorithm name → metrics arrays (Tchebycheff gaps, accuracies, rewards)
- `test_points`: list of evaluation steps
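A minimal sketch of inspecting a saved run, assuming the `results` dict is stored as a pickled object inside the `.npz` file and that each entry is itself a dict of metric arrays (the filename below is illustrative):

```python
import numpy as np

# Load a saved run; allow_pickle is required because `results` is a Python dict.
data = np.load("results/offmob_quadratic_final.npz", allow_pickle=True)

results = data["results"].item()   # algorithm name -> metrics arrays (assumed layout)
test_points = data["test_points"]  # evaluation steps

for algo, metrics in results.items():
    print(algo, {name: np.asarray(values).shape for name, values in metrics.items()})
```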
- Seeds and hyperparameters live in the runner; `create_hparams` configures defaults.
- To add an algorithm, implement `sample_action`, `update`, and `reset` in the algorithm class (a skeleton follows this list).
- Save backups before overwriting large `.npz` result files.
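A skeleton of that three-method interface, as a sketch only (the class name, constructor arguments, and method signatures are assumptions; match them to the existing classes in `algorithms/`):

```python
import numpy as np

class MyAlgorithm:
    """Hypothetical algorithm implementing the interface named in this README."""

    def __init__(self, num_actions, context_dim, num_objectives):
        self.num_actions = num_actions
        self.context_dim = context_dim
        self.num_objectives = num_objectives
        self.reset()

    def reset(self):
        # Clear learned state between independent simulations.
        self.history = []

    def sample_action(self, context):
        # Placeholder policy: pick a uniformly random action.
        return np.random.randint(self.num_actions)

    def update(self, context, action, rewards):
        # Record the logged interaction; a real implementation would update its model here.
        self.history.append((context, action, rewards))
```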
If you use this code, please cite the paper: