Commit e60bd2f
Implement ORB AWS EC2 Worker Adapter (#525)
* Implement ORB Worker Adapter
* Add submit_tasks example to documentation and CI skip list
- Include submit_tasks.py in examples readme and documentation.
- Implement skip_examples.txt for top-level examples in CI.
- Add submit_tasks.py to skip_examples.txt as it requires a running scheduler.
* move orb/ and ami/ to driver/
* remove superfluous checks in orb config
* move import
* add a check for self._orb before returning machines
* adjust security group rules
* move method
* rename no random worker ids to deterministic worker ids
* add comment
* run submit tasks in ci
* fix help text
* make _filter_data a static method
* refactor orb worker adapter polling to use constants
* flake8
* don't touch skip examples file
* bump minor version
* docs: add worker adapter tutorials and update ORB integration details
- Add comprehensive documentation for Native, Fixed Native, and ORB worker adapters.
- Update README.md with ORB integration and corrected command-line arguments.
- Enable autosectionlabel in Sphinx configuration.
- Update scaling and compatibility documentation to reflect recent changes.
* refactor: move orb and ami drivers to src/scaler/drivers
* Delete test.toml
Signed-off-by: magniloquency <197707854+magniloquency@users.noreply.github.com>
* Refactor ORB worker adapter to use ZMQ-based protocol
- Remove aiohttp dependency and RESTful API implementation.
- Rename ORBAdapter to ORBWorkerAdapter.
- Implement ZMQ DEALER connection to scheduler for commands and heartbeats.
- Update start/shutdown logic to return status codes consistent with the new protocol.
- Clean up configuration by removing now-unused WebConfig.
* fix type error
* fix type error
* Output ORB templates in snake_case
Replace camelcase_dict with direct use of asdict output, which already
produces snake_case keys matching the ORBTemplate dataclass field names.
* Simplify ORB driver config files
Remove unused sections (server, metrics, performance, events, naming,
circuit_breaker, etc.) to reduce config to the minimal required structure.
* Document default no-scaling policy and vanilla scaler example in worker adapters
* import documentation changes from #574
* Refactor ORB worker adapter to worker_manager_adapter naming and improve initialization
- Rename worker_adapter/orb/ to worker_manager_adapter/orb/ and worker_adapter.py to worker_manager.py
- Extract AWS/ORB setup into a lazy __initialize() method called at runtime
- Add proper Optional type annotations for deferred fields
- Add assert guards before connector usage
- Fix unlimited workers check (max_workers == -1)
- Condense multi-line imports in uv_ymq __init__, .pyi, and test file
* Refactor ORB config handling and simplify worker manager
- Move dict_utils (camelcase/snakecase) out of formatter into its own module
- Move ORB config files to worker_manager_adapter/orb/config/ and delete from drivers/orb/config/
- Inject AWS region into ORB config at runtime rather than requiring pre-configured files
- Remove allowed_ip config field; drop ingress security group rules (workers connect outbound only)
- Extract _poll_for_instance_id helper and run it in executor to avoid blocking the event loop
- Fix orb_config_path default to use package-relative path
- Update docs and entry point references to scaler_worker_manager_orb
* Rename run_worker_adapter_orb to run_worker_manager_orb
* Fix ORB template missing instance_types and broken region injection
Populate instance_types in the generated template so ORB can resolve
the EC2 instance type when requesting machines. Also fix the region
injection in ORBHelper, which was iterating the wrong key ("providers"
instead of "provider.providers") and silently leaving the region as
us-east-1 regardless of config.
* Use subnet_ids list field instead of subnet_id in ORB template
* Add name field to ORBMachine to fix TypeError on deserialization
* Filter unknown keys when deserializing ORBMachine from dict
* Fix duplicate commands sent to adapter while previous command is in-flight
When an adapter takes a long time to fulfill a command (e.g. ORB polling
for instance IDs), repeated heartbeats caused the scheduler to send new
commands before the previous response arrived. This resulted in duplicate
StartWorkerGroup commands, WorkerGroupTooMuch errors, and spurious "no
pending command found" warnings.
* upgrade to orb 1.2
* Migrate ORB worker manager from CLI subprocess to Python SDK
Replace ORBHelper (subprocess-based CLI wrapper with temp dirs and file I/O)
with direct ORBClient SDK usage, passing config entirely in-memory via
app_config dict. Removes orb_config_path config field, config/ files, and
helper.py entirely.
* Remove unused ORBMachine and ORBRequest types
* Fix WorkerAdapterConfig -> WorkerManagerConfig rename
* Work around ORB SDK app_config timing bug by writing temp config file
* Use ORB_CONFIG_DIR env var to inject config into ORB singleton
* Add template_id, image_id, provider_api to configuration dict for ORB validation
* Switch ORB storage from sql to json (SQLQueryBuilder is abstract in installed version)
* Monkey-patch ORB TemplateRepositoryImpl.get_by_id to accept plain str
* Fix ORB 1.2.2 missing add() method on TemplateRepositoryImpl
Extend monkey-patch to also alias add() -> save() since ORB 1.2.2's
template_handlers.py calls uow.templates.add() but the installed
TemplateRepositoryImpl only exposes save().
* Patch Template.get_domain_events/clear_domain_events missing in ORB 1.2.2
TemplateRepositoryImpl.save() calls template.get_domain_events() and
template.clear_domain_events() but the installed Template Pydantic model
lacks these domain event methods. Add stub implementations via monkey-patch.
* Upgrade ORB dependency to 1.3 and adopt context-manager SDK API
- Bump orb-py requirement from ~=1.2 to ~=1.3
- Replace manual ORBClient init/cleanup with async context manager usage
- Remove monkey-patches and workarounds that were only needed for ORB 1.2
- Use sdk.wait_for_request() instead of manual polling loop
- Simplify config: drop version/storage path fields no longer required
- Clean up unused imports (json, tempfile)
* Update ORB worker manager adapter for post-WorkerGroup protocol
Align the ORB adapter with two upstream refactors:
- Replace WorkerGroup abstraction with direct WorkerID tracking
(StartWorkerGroup/ShutdownWorkerGroup → StartWorkers/ShutdownWorkers,
_worker_groups now maps WorkerID → instance_id str)
- Rename max_workers → max_task_concurrency throughout
* Add opengris-scaler 1.15.0 AMI and move packer files to orb adapter directory
- Built and published ami-044265172bea55d51 (us-east-1) for v1.15.0 / Python 3.13
- Updated public AMI table in orb.rst with new entry
- Moved packer files from src/scaler/drivers/ami/ to src/scaler/worker_manager_adapter/orb/ami/
- Fixed default python_version from 3.14 to 3.13 (pycapnp does not support 3.14)
* Fix ORB create_template call to use flat kwargs instead of nested configuration dict
* Add validate_template call and logging after create_template in ORB setup
* Rename _worker_groups to _workers in ORB worker adapter
* Remove hardcoded --num-of-workers from ORB cluster launch script
* Remove inaccurate worker ID tracking comment from ORB cluster launch script
* Remove hardcoded attribute metadata from ORB create_template call
* Fix ymq import in ORB worker manager after e921fff refactor
* Update orb-py dependency to 1.5.1
* Fix import order in orb worker_manager
* Add orb worker manager support to unified entry points
- Register ORBWorkerAdapterConfig with _tag = "orb" for discriminator-based
TOML parsing in the scaler all-in-one launcher
- Add orb subcommand to scaler_worker_manager dispatcher
- Add ORBWorkerAdapterConfig to WorkerManagerUnion in scaler.py
- Remove redundant top-level event_loop and worker_io_threads fields from
ORBWorkerAdapterConfig in favour of the existing worker_config equivalents
- Update docs (commands.rst, orb.rst) and README to reflect the unified entry point
- Add tests for orb subcommand parsing, TOML config, and _run_worker_manager dispatch
* Remove dedicated scaler_worker_manager_orb entry point
The orb worker manager is now accessible via the unified
scaler_worker_manager orb subcommand, making the dedicated
entry point redundant.
* Work around ORB skipping strategy defaults when config_dict is provided
When ORBClient is initialised with app_config=, its _ensure_raw_config()
merges only default_config.json (which has provider_defaults: {}) with the
caller-supplied dict, skipping the _load_strategy_defaults() call that
normally loads aws_defaults.json. As a result get_effective_handlers()
returns {} and RunInstances is absent from supported_apis, causing:
ApplicationError: Provider does not support API 'RunInstances'. Supported APIs: []
Fix by including provider_defaults.aws.handlers explicitly in
_build_app_config() so the RunInstances handler definition is always
present regardless of how ORB loads its config.
* Remove run_worker_manager_orb script
* Suppress repeated StartWorkers requests after TooManyWorkers
When the ORB adapter is at capacity it returns TooManyWorkers, but the
scheduler's worker count (based on received heartbeats) may still be
below max_task_concurrency because newly-created instances haven't sent
their first heartbeat yet. This caused the scheduler to re-request a
worker on every heartbeat, spamming the log.
Fix: track sources that have returned TooManyWorkers and suppress new
StartWorkers requests for that source until the scheduler's own worker
count drops below max_task_concurrency (indicating a worker left and
the ORB adapter has freed up capacity).
Also fix a latent bug in all three scaling policies where the capacity
check `len(managed) >= max_task_concurrency` is always True when
max_task_concurrency == -1 (unlimited), blocking all scaling.
* Lazy-import orb to fix CI test failures
The module-level `from orb import ORBClient as orb` caused CI tests to
fail when patching ORBWorkerAdapter, because importing the module
triggered the import of `orb` which is not installed in CI. Moving the
import inside `_run()` defers it until the adapter is actually used.
* Update ORB user data to use scaler_worker_manager with --mode fixed
Replace the deprecated scaler_cluster command with scaler_worker_manager
baremetal_native, passing --mode fixed and --worker-manager-id sourced
from ec2-metadata.
* fix io threads
* Add AMI 1.26.4 to docs and fix build.sh version path
Fix incorrect version.txt path in build.sh (was two levels up, should be three), and add the newly built AMI ami-0b76605999d8f5d2b for scaler 1.26.4 / Python 3.13 to the ORB docs table.
* Fix zero-worker default on single-core machines
DEFAULT_MAX_TASK_CONCURRENCY was cpu_count() - 1, which evaluates to 0
on single-core machines. Remove the subtraction so at least one worker
is started by default.
* Fix TooManyWorkers suppression not working during EC2 boot
The _at_capacity_sources clearing condition was inverted: it cleared
suppression when managed_worker_ids < max_task_concurrency, which is
exactly the case during EC2 boot (0 workers, instance not yet registered).
This caused the scheduler to resume spamming StartWorkers on the very
next heartbeat after receiving TooManyWorkers.
Replace the Set-based approach with a baseline Dict that records the
managed worker count at the time TooManyWorkers was received. Suppression
is now held until the scheduler's view of workers grows beyond that
baseline, i.e. at least one booting instance has sent its first heartbeat.
* Rename ORB worker manager to orb_aws_ec2
Renames all identifiers, file names, directories, config tags, CLI
subcommands, docs, README, and tests from `orb` / `ORBWorkerAdapter` to
`orb_aws_ec2` / `ORBAWSEC2WorkerAdapter` to make clear this adapter is
specifically for AWS EC2 via the ORB SDK.
---------
Signed-off-by: magniloquency <197707854+magniloquency@users.noreply.github.com>
Co-authored-by: sharpener6 <1sc2l4qi@duck.com>1 parent 4afab69 commit e60bd2f
25 files changed
Lines changed: 1071 additions & 8 deletions
File tree
- .github/actions/run-test
- docs/source/tutorials
- compatibility
- worker_managers
- src/scaler
- config
- section
- entry_points
- scheduler/controllers
- policies/simple_policy/scaling
- utility
- worker_manager_adapter/orb_aws_ec2
- ami
- worker
- tests/entry_points
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
55 | 55 | | |
56 | 56 | | |
57 | 57 | | |
| 58 | + | |
58 | 59 | | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
59 | 65 | | |
60 | 66 | | |
61 | 67 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
36 | 36 | | |
37 | 37 | | |
38 | 38 | | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
39 | 42 | | |
40 | 43 | | |
41 | 44 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
279 | 279 | | |
280 | 280 | | |
281 | 281 | | |
| 282 | + | |
282 | 283 | | |
283 | 284 | | |
284 | 285 | | |
| |||
507 | 508 | | |
508 | 509 | | |
509 | 510 | | |
| 511 | + | |
| 512 | + | |
| 513 | + | |
| 514 | + | |
| 515 | + | |
| 516 | + | |
| 517 | + | |
| 518 | + | |
| 519 | + | |
| 520 | + | |
| 521 | + | |
| 522 | + | |
| 523 | + | |
| 524 | + | |
| 525 | + | |
| 526 | + | |
| 527 | + | |
| 528 | + | |
| 529 | + | |
| 530 | + | |
| 531 | + | |
| 532 | + | |
| 533 | + | |
| 534 | + | |
| 535 | + | |
| 536 | + | |
| 537 | + | |
| 538 | + | |
| 539 | + | |
| 540 | + | |
| 541 | + | |
| 542 | + | |
| 543 | + | |
| 544 | + | |
510 | 545 | | |
511 | 546 | | |
512 | 547 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
14 | 14 | | |
15 | 15 | | |
16 | 16 | | |
17 | | - | |
| 17 | + | |
18 | 18 | | |
19 | 19 | | |
20 | 20 | | |
| |||
53 | 53 | | |
54 | 54 | | |
55 | 55 | | |
| 56 | + | |
| 57 | + | |
56 | 58 | | |
57 | 59 | | |
58 | 60 | | |
| |||
352 | 354 | | |
353 | 355 | | |
354 | 356 | | |
| 357 | + | |
355 | 358 | | |
356 | 359 | | |
357 | 360 | | |
| |||
753 | 756 | | |
754 | 757 | | |
755 | 758 | | |
| 759 | + | |
| 760 | + | |
| 761 | + | |
| 762 | + | |
| 763 | + | |
| 764 | + | |
| 765 | + | |
| 766 | + | |
| 767 | + | |
| 768 | + | |
| 769 | + | |
| 770 | + | |
| 771 | + | |
| 772 | + | |
| 773 | + | |
| 774 | + | |
| 775 | + | |
| 776 | + | |
| 777 | + | |
| 778 | + | |
| 779 | + | |
| 780 | + | |
| 781 | + | |
| 782 | + | |
| 783 | + | |
| 784 | + | |
| 785 | + | |
| 786 | + | |
| 787 | + | |
| 788 | + | |
| 789 | + | |
| 790 | + | |
| 791 | + | |
| 792 | + | |
| 793 | + | |
| 794 | + | |
| 795 | + | |
| 796 | + | |
| 797 | + | |
| 798 | + | |
| 799 | + | |
| 800 | + | |
| 801 | + | |
| 802 | + | |
| 803 | + | |
| 804 | + | |
| 805 | + | |
| 806 | + | |
| 807 | + | |
| 808 | + | |
| 809 | + | |
| 810 | + | |
| 811 | + | |
| 812 | + | |
| 813 | + | |
| 814 | + | |
| 815 | + | |
| 816 | + | |
| 817 | + | |
| 818 | + | |
| 819 | + | |
| 820 | + | |
| 821 | + | |
| 822 | + | |
| 823 | + | |
| 824 | + | |
| 825 | + | |
| 826 | + | |
| 827 | + | |
| 828 | + | |
| 829 | + | |
| 830 | + | |
| 831 | + | |
756 | 832 | | |
757 | 833 | | |
758 | 834 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
6 | 6 | | |
7 | 7 | | |
8 | 8 | | |
9 | | - | |
| 9 | + | |
10 | 10 | | |
11 | 11 | | |
12 | 12 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
54 | 54 | | |
55 | 55 | | |
56 | 56 | | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
57 | 61 | | |
58 | 62 | | |
59 | 63 | | |
| |||
72 | 76 | | |
73 | 77 | | |
74 | 78 | | |
| 79 | + | |
75 | 80 | | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
50 | 50 | | |
51 | 51 | | |
52 | 52 | | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
53 | 57 | | |
54 | 58 | | |
55 | 59 | | |
56 | 60 | | |
| 61 | + | |
57 | 62 | | |
58 | 63 | | |
59 | 64 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
56 | 56 | | |
57 | 57 | | |
58 | 58 | | |
59 | | - | |
| 59 | + | |
60 | 60 | | |
61 | 61 | | |
62 | 62 | | |
| |||
0 commit comments