Commit e60bd2f

Implement ORB AWS EC2 Worker Adapter (#525)
* Implement ORB Worker Adapter
* Add submit_tasks example to documentation and CI skip list
  - Include submit_tasks.py in examples readme and documentation.
  - Implement skip_examples.txt for top-level examples in CI.
  - Add submit_tasks.py to skip_examples.txt as it requires a running scheduler.
* move orb/ and ami/ to driver/
* remove superfluous checks in orb config
* move import
* add a check for self._orb before returning machines
* adjust security group rules
* move method
* rename no random worker ids to deterministic worker ids
* add comment
* run submit tasks in ci
* fix help text
* make _filter_data a static method
* refactor orb worker adapter polling to use constants
* flake8
* don't touch skip examples file
* bump minor version
* docs: add worker adapter tutorials and update ORB integration details
  - Add comprehensive documentation for Native, Fixed Native, and ORB worker adapters.
  - Update README.md with ORB integration and corrected command-line arguments.
  - Enable autosectionlabel in Sphinx configuration.
  - Update scaling and compatibility documentation to reflect recent changes.
* refactor: move orb and ami drivers to src/scaler/drivers
* Delete test.toml

  Signed-off-by: magniloquency <197707854+magniloquency@users.noreply.github.com>
* Refactor ORB worker adapter to use ZMQ-based protocol
  - Remove aiohttp dependency and RESTful API implementation.
  - Rename ORBAdapter to ORBWorkerAdapter.
  - Implement ZMQ DEALER connection to scheduler for commands and heartbeats.
  - Update start/shutdown logic to return status codes consistent with the new protocol.
  - Clean up configuration by removing now-unused WebConfig.
* fix type error
* fix type error
* Output ORB templates in snake_case

  Replace camelcase_dict with direct use of asdict output, which already produces snake_case keys matching the ORBTemplate dataclass field names.
* Simplify ORB driver config files

  Remove unused sections (server, metrics, performance, events, naming, circuit_breaker, etc.) to reduce config to the minimal required structure.
* Document default no-scaling policy and vanilla scaler example in worker adapters
* import documentation changes from #574
* Refactor ORB worker adapter to worker_manager_adapter naming and improve initialization
  - Rename worker_adapter/orb/ to worker_manager_adapter/orb/ and worker_adapter.py to worker_manager.py
  - Extract AWS/ORB setup into a lazy __initialize() method called at runtime
  - Add proper Optional type annotations for deferred fields
  - Add assert guards before connector usage
  - Fix unlimited workers check (max_workers == -1)
  - Condense multi-line imports in uv_ymq __init__, .pyi, and test file
* Refactor ORB config handling and simplify worker manager
  - Move dict_utils (camelcase/snakecase) out of formatter into its own module
  - Move ORB config files to worker_manager_adapter/orb/config/ and delete from drivers/orb/config/
  - Inject AWS region into ORB config at runtime rather than requiring pre-configured files
  - Remove allowed_ip config field; drop ingress security group rules (workers connect outbound only)
  - Extract _poll_for_instance_id helper and run it in executor to avoid blocking the event loop
  - Fix orb_config_path default to use package-relative path
  - Update docs and entry point references to scaler_worker_manager_orb
* Rename run_worker_adapter_orb to run_worker_manager_orb
* Fix ORB template missing instance_types and broken region injection

  Populate instance_types in the generated template so ORB can resolve the EC2 instance type when requesting machines. Also fix the region injection in ORBHelper, which was iterating the wrong key ("providers" instead of "provider.providers") and silently leaving the region as us-east-1 regardless of config.
* Use subnet_ids list field instead of subnet_id in ORB template
* Add name field to ORBMachine to fix TypeError on deserialization
* Filter unknown keys when deserializing ORBMachine from dict
* Fix duplicate commands sent to adapter while previous command is in-flight

  When an adapter takes a long time to fulfill a command (e.g. ORB polling for instance IDs), repeated heartbeats caused the scheduler to send new commands before the previous response arrived. This resulted in duplicate StartWorkerGroup commands, WorkerGroupTooMuch errors, and spurious "no pending command found" warnings.
* upgrade to orb 1.2
* Migrate ORB worker manager from CLI subprocess to Python SDK

  Replace ORBHelper (subprocess-based CLI wrapper with temp dirs and file I/O) with direct ORBClient SDK usage, passing config entirely in-memory via app_config dict. Removes orb_config_path config field, config/ files, and helper.py entirely.
* Remove unused ORBMachine and ORBRequest types
* Fix WorkerAdapterConfig -> WorkerManagerConfig rename
* Work around ORB SDK app_config timing bug by writing temp config file
* Use ORB_CONFIG_DIR env var to inject config into ORB singleton
* Add template_id, image_id, provider_api to configuration dict for ORB validation
* Switch ORB storage from sql to json (SQLQueryBuilder is abstract in installed version)
* Monkey-patch ORB TemplateRepositoryImpl.get_by_id to accept plain str
* Fix ORB 1.2.2 missing add() method on TemplateRepositoryImpl

  Extend monkey-patch to also alias add() -> save() since ORB 1.2.2's template_handlers.py calls uow.templates.add() but the installed TemplateRepositoryImpl only exposes save().
* Patch Template.get_domain_events/clear_domain_events missing in ORB 1.2.2

  TemplateRepositoryImpl.save() calls template.get_domain_events() and template.clear_domain_events() but the installed Template Pydantic model lacks these domain event methods. Add stub implementations via monkey-patch.
* Upgrade ORB dependency to 1.3 and adopt context-manager SDK API
  - Bump orb-py requirement from ~=1.2 to ~=1.3
  - Replace manual ORBClient init/cleanup with async context manager usage
  - Remove monkey-patches and workarounds that were only needed for ORB 1.2
  - Use sdk.wait_for_request() instead of manual polling loop
  - Simplify config: drop version/storage path fields no longer required
  - Clean up unused imports (json, tempfile)
* Update ORB worker manager adapter for post-WorkerGroup protocol

  Align the ORB adapter with two upstream refactors:
  - Replace WorkerGroup abstraction with direct WorkerID tracking (StartWorkerGroup/ShutdownWorkerGroup → StartWorkers/ShutdownWorkers, _worker_groups now maps WorkerID → instance_id str)
  - Rename max_workers → max_task_concurrency throughout
* Add opengris-scaler 1.15.0 AMI and move packer files to orb adapter directory
  - Built and published ami-044265172bea55d51 (us-east-1) for v1.15.0 / Python 3.13
  - Updated public AMI table in orb.rst with new entry
  - Moved packer files from src/scaler/drivers/ami/ to src/scaler/worker_manager_adapter/orb/ami/
  - Fixed default python_version from 3.14 to 3.13 (pycapnp does not support 3.14)
* Fix ORB create_template call to use flat kwargs instead of nested configuration dict
* Add validate_template call and logging after create_template in ORB setup
* Rename _worker_groups to _workers in ORB worker adapter
* Remove hardcoded --num-of-workers from ORB cluster launch script
* Remove inaccurate worker ID tracking comment from ORB cluster launch script
* Remove hardcoded attribute metadata from ORB create_template call
* Fix ymq import in ORB worker manager after e921fff refactor
* Update orb-py dependency to 1.5.1
* Fix import order in orb worker_manager
* Add orb worker manager support to unified entry points
  - Register ORBWorkerAdapterConfig with _tag = "orb" for discriminator-based TOML parsing in the scaler all-in-one launcher
  - Add orb subcommand to scaler_worker_manager dispatcher
  - Add ORBWorkerAdapterConfig to WorkerManagerUnion in scaler.py
  - Remove redundant top-level event_loop and worker_io_threads fields from ORBWorkerAdapterConfig in favour of the existing worker_config equivalents
  - Update docs (commands.rst, orb.rst) and README to reflect the unified entry point
  - Add tests for orb subcommand parsing, TOML config, and _run_worker_manager dispatch
* Remove dedicated scaler_worker_manager_orb entry point

  The orb worker manager is now accessible via the unified scaler_worker_manager orb subcommand, making the dedicated entry point redundant.
* Work around ORB skipping strategy defaults when config_dict is provided

  When ORBClient is initialised with app_config=, its _ensure_raw_config() merges only default_config.json (which has provider_defaults: {}) with the caller-supplied dict, skipping the _load_strategy_defaults() call that normally loads aws_defaults.json. As a result get_effective_handlers() returns {} and RunInstances is absent from supported_apis, causing:

      ApplicationError: Provider does not support API 'RunInstances'. Supported APIs: []

  Fix by including provider_defaults.aws.handlers explicitly in _build_app_config() so the RunInstances handler definition is always present regardless of how ORB loads its config.
* Remove run_worker_manager_orb script
* Suppress repeated StartWorkers requests after TooManyWorkers

  When the ORB adapter is at capacity it returns TooManyWorkers, but the scheduler's worker count (based on received heartbeats) may still be below max_task_concurrency because newly-created instances haven't sent their first heartbeat yet. This caused the scheduler to re-request a worker on every heartbeat, spamming the log.

  Fix: track sources that have returned TooManyWorkers and suppress new StartWorkers requests for that source until the scheduler's own worker count drops below max_task_concurrency (indicating a worker left and the ORB adapter has freed up capacity).

  Also fix a latent bug in all three scaling policies where the capacity check `len(managed) >= max_task_concurrency` is always True when max_task_concurrency == -1 (unlimited), blocking all scaling.
* Lazy-import orb to fix CI test failures

  The module-level `from orb import ORBClient as orb` caused CI tests to fail when patching ORBWorkerAdapter, because importing the module triggered the import of `orb` which is not installed in CI. Moving the import inside `_run()` defers it until the adapter is actually used.
* Update ORB user data to use scaler_worker_manager with --mode fixed

  Replace the deprecated scaler_cluster command with scaler_worker_manager baremetal_native, passing --mode fixed and --worker-manager-id sourced from ec2-metadata.
* fix io threads
* Add AMI 1.26.4 to docs and fix build.sh version path

  Fix incorrect version.txt path in build.sh (was two levels up, should be three), and add the newly built AMI ami-0b76605999d8f5d2b for scaler 1.26.4 / Python 3.13 to the ORB docs table.
* Fix zero-worker default on single-core machines

  DEFAULT_MAX_TASK_CONCURRENCY was cpu_count() - 1, which evaluates to 0 on single-core machines. Remove the subtraction so at least one worker is started by default.
* Fix TooManyWorkers suppression not working during EC2 boot

  The _at_capacity_sources clearing condition was inverted: it cleared suppression when managed_worker_ids < max_task_concurrency, which is exactly the case during EC2 boot (0 workers, instance not yet registered). This caused the scheduler to resume spamming StartWorkers on the very next heartbeat after receiving TooManyWorkers.

  Replace the Set-based approach with a baseline Dict that records the managed worker count at the time TooManyWorkers was received. Suppression is now held until the scheduler's view of workers grows beyond that baseline, i.e. at least one booting instance has sent its first heartbeat.
* Rename ORB worker manager to orb_aws_ec2

  Renames all identifiers, file names, directories, config tags, CLI subcommands, docs, README, and tests from `orb` / `ORBWorkerAdapter` to `orb_aws_ec2` / `ORBAWSEC2WorkerAdapter` to make clear this adapter is specifically for AWS EC2 via the ORB SDK.

---------

Signed-off-by: magniloquency <197707854+magniloquency@users.noreply.github.com>
Co-authored-by: sharpener6 <1sc2l4qi@duck.com>
1 parent 4afab69 commit e60bd2f
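
The two scaling fixes described at the end of the commit message (the `-1` "unlimited" capacity check and the baseline-based TooManyWorkers suppression) can be read as the following sketch. This is not the scheduler's code: `is_at_capacity`, `CapacityTracker`, and their method names are hypothetical, and only the `max_task_concurrency == -1` convention and the baseline idea are taken from the message.

```python
from typing import Dict


def is_at_capacity(managed_count: int, max_task_concurrency: int) -> bool:
    """Capacity check that treats -1 as 'unlimited'.

    Without the explicit -1 guard, `managed_count >= -1` is always True,
    so an unlimited policy would never be allowed to scale up.
    """
    if max_task_concurrency == -1:
        return False
    return managed_count >= max_task_concurrency


class CapacityTracker:
    """Hypothetical helper: hold back StartWorkers after TooManyWorkers."""

    def __init__(self) -> None:
        # source id -> managed worker count seen when TooManyWorkers arrived
        self._baseline: Dict[str, int] = {}

    def on_too_many_workers(self, source: str, managed_count: int) -> None:
        self._baseline[source] = managed_count

    def may_request_worker(self, source: str, managed_count: int) -> bool:
        baseline = self._baseline.get(source)
        if baseline is None:
            return True  # this source never reported TooManyWorkers
        if managed_count > baseline:
            # At least one booting instance has sent its first heartbeat,
            # so the suppression is lifted for this source.
            del self._baseline[source]
            return True
        return False  # still booting: do not re-request on every heartbeat
```

A caller would still apply `is_at_capacity` after `may_request_worker` returns True; the tracker only decides whether asking again is worthwhile.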

25 files changed

Lines changed: 1071 additions & 8 deletions

.github/actions/run-test/action.yml

Lines changed: 6 additions & 0 deletions
@@ -55,7 +55,13 @@ runs:
       run: |
         uv pip install --system -r examples/applications/requirements_applications.txt
         uv pip install --system -r examples/ray_compat/requirements.txt
+        readarray -t skip_examples < examples/skip_examples.txt
         for example in "./examples"/*.py; do
+          filename=$(basename "$example")
+          if [[ " ${skip_examples[*]} " =~ [[:space:]]${filename}[[:space:]] ]]; then
+            echo "Skipping $example"
+            continue
+          fi
           echo "Running $example"
           python $example
         done
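
For readers less familiar with the bash idiom, the same skip-list filtering can be sketched in Python. This is an illustration only, not part of the commit: `select_examples` and the file name `simple_client.py` are hypothetical. The sketch compares file names exactly, whereas the unquoted `${filename}` in the bash `=~` test is interpreted as a regular expression, so a `.` in a name matches any character there.

```python
from pathlib import Path
from typing import Iterable, List


def select_examples(example_paths: Iterable[str], skip_names: Iterable[str]) -> List[str]:
    """Return the example paths whose file name is not in the skip list."""
    # Strip whitespace/newlines so lines read straight from a text file work.
    skip = {name.strip() for name in skip_names if name.strip()}
    return [path for path in example_paths if Path(path).name not in skip]


# Illustrative inputs; the real skip list lives in examples/skip_examples.txt.
examples = ["./examples/simple_client.py", "./examples/submit_tasks.py"]
for path in select_examples(examples, ["submit_tasks.py\n"]):
    print(f"Running {path}")
```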

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -36,6 +36,9 @@ CMakeFiles/
 src/scaler/protocol/capnp/*.c++
 src/scaler/protocol/capnp/*.h
 
+orb/logs/
+orb/metrics/
+
 # AWS HPC test-generated files
 .scaler_aws_batch_config.json
 .scaler_aws_hpc.env

README.md

Lines changed: 35 additions & 0 deletions
@@ -279,6 +279,7 @@ The following table maps each Scaler command to its corresponding section name i
 | `scaler_worker_manager symphony` | `[[worker_manager]]` + `type = "symphony"` |
 | `scaler_worker_manager aws_raw_ecs` | `[[worker_manager]]` + `type = "aws_raw_ecs"` |
 | `scaler_worker_manager aws_hpc` | `[[worker_manager]]` + `type = "aws_hpc"` |
+| `scaler_worker_manager orb_aws_ec2` | `[[worker_manager]]` + `type = "orb_aws_ec2"` |
 
 ### Practical Scenarios & Examples
 
@@ -507,6 +508,40 @@ where `deepest_nesting_level` is the deepest nesting level a task has in your wo
 workload that has
 a base task that calls a nested task that calls another nested task, then the deepest nesting level is 2.
 
+## ORB AWS EC2 integration
+
+A Scaler scheduler can interface with ORB (Open Resource Broker) to dynamically provision and manage workers on AWS EC2 instances.
+
+```bash
+$ scaler_worker_manager orb_aws_ec2 tcp://127.0.0.1:2345 --image-id ami-0528819f94f4f5fa5
+```
+
+This will start an ORB AWS EC2 worker adapter that connects to the Scaler scheduler at `tcp://127.0.0.1:2345`. The scheduler can then request new workers from this adapter, which will be launched as EC2 instances.
+
+The ORB AWS EC2 worker manager can also be included in a `scaler` all-in-one TOML config:
+
+```toml
+[scheduler]
+scheduler_address = "tcp://127.0.0.1:2345"
+
+[[worker_manager]]
+type = "orb_aws_ec2"
+scheduler_address = "tcp://127.0.0.1:2345"
+image_id = "ami-0528819f94f4f5fa5"
+instance_type = "t3.medium"
+aws_region = "us-east-1"
+```
+
+### Configuration
+
+The ORB AWS EC2 adapter requires `orb-py` and `boto3` to be installed. You can install them with:
+
+```bash
+$ pip install "opengris-scaler[orb_aws_ec2]"
+```
+
+For more details on configuring ORB AWS EC2, including AWS credentials and instance templates, please refer to the [ORB AWS EC2 Worker Adapter documentation](https://finos.github.io/opengris-scaler/tutorials/worker_manager_adapter/orb_aws_ec2.html).
+
 ## Worker Manager usage
 
 > **Note**: This feature is experimental and may change in future releases.

docs/source/tutorials/commands.rst

Lines changed: 77 additions & 1 deletion
@@ -14,7 +14,7 @@ After installing ``opengris-scaler``, the following CLI commands are available f
    * - :ref:`scaler_scheduler <cmd-scaler-scheduler>`
      - Start only the scheduler process (and auto-start object storage when needed).
    * - :ref:`scaler_worker_manager <cmd-scaler-worker-manager>`
-     - Start one worker manager using a subcommand (``baremetal_native``, ``symphony``, ``aws_raw_ecs``, ``aws_hpc``).
+     - Start one worker manager using a subcommand (``baremetal_native``, ``symphony``, ``aws_raw_ecs``, ``aws_hpc``, ``orb_aws_ec2``).
    * - :ref:`scaler_object_storage_server <cmd-scaler-object-storage-server>`
      - Start only the object storage server.
    * - :ref:`scaler_top <cmd-scaler-top>`
@@ -53,6 +53,8 @@ All commands support ``--config``/``-c``. In practice, most deployments use TOML
      - ``[[worker_manager]]`` + ``type = "aws_raw_ecs"``
    * - ``scaler_worker_manager aws_hpc``
      - ``[[worker_manager]]`` + ``type = "aws_hpc"``
+   * - ``scaler_worker_manager orb_aws_ec2``
+     - ``[[worker_manager]]`` + ``type = "orb_aws_ec2"``
 
 
 .. _cmd-scaler:
@@ -352,6 +354,7 @@ Available subcommands:
 - ``symphony``
 - ``aws_raw_ecs``
 - ``aws_hpc``
+- ``orb_aws_ec2``
 
 When ``--config``/``-c`` is supplied, ``scaler_worker_manager`` reads the ``[[worker_manager]]``
 array from the TOML file and picks the entry whose ``type`` field matches the subcommand.
@@ -753,6 +756,79 @@ AWS Batch worker manager.
      - ``60``
      - Timeout for each submitted job.
 
+Subcommand: ``orb_aws_ec2``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ORB (Open Resource Broker) worker manager — dynamically provisions workers on AWS EC2 instances.
+
+.. code-block:: bash
+
+    $ scaler_worker_manager orb_aws_ec2 [options] <scheduler_address>
+
+.. tabs::
+
+    .. group-tab:: command line
+
+        .. code-block:: bash
+
+            $ scaler_worker_manager orb_aws_ec2 tcp://127.0.0.1:6378 \
+                --object-storage-address tcp://127.0.0.1:6379 \
+                --image-id ami-0528819f94f4f5fa5 \
+                --instance-type t3.medium \
+                --aws-region us-east-1
+
+    .. group-tab:: config.toml
+
+        .. code-block:: toml
+
+            [[worker_manager]]
+            type = "orb_aws_ec2"
+            scheduler_address = "tcp://127.0.0.1:6378"
+            object_storage_address = "tcp://127.0.0.1:6379"
+            image_id = "ami-0528819f94f4f5fa5"
+            instance_type = "t3.medium"
+            aws_region = "us-east-1"
+
+        Run command:
+
+        .. code-block:: bash
+
+            $ scaler config.toml
+
+.. list-table::
+   :header-rows: 1
+
+   * - Argument
+     - Required
+     - Default
+     - Description
+   * - ``--image-id``
+     - Yes
+     - -
+     - AMI ID for the worker EC2 instances.
+   * - ``--instance-type``
+     - No
+     - ``t2.micro``
+     - EC2 instance type.
+   * - ``--aws-region``
+     - No
+     - ``us-east-1``
+     - AWS region.
+   * - ``--key-name``
+     - No
+     - ``None``
+     - AWS key pair name. A temporary key pair is created if omitted.
+   * - ``--subnet-id``
+     - No
+     - ``None``
+     - AWS subnet ID. Defaults to the default subnet in the default VPC.
+   * - ``--security-group-ids``
+     - No
+     - ``[]``
+     - Comma-separated AWS security group IDs. A temporary group is created if omitted.
+
+For full details, see :doc:`worker_managers/orb_aws_ec2`.
+
 
 .. _cmd-scaler-object-storage-server:
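
The selection rule stated in the third hunk (read the ``[[worker_manager]]`` array and pick the entry whose ``type`` matches the subcommand) can be sketched as follows. This is an assumption-laden illustration rather than Scaler's parser: `pick_worker_manager` is a hypothetical helper, the sample config is shortened, and `tomllib` requires Python 3.11 or newer (the `tomli` fallback is for older interpreters).

```python
from typing import Any, Dict

try:
    import tomllib  # standard library in Python 3.11+
except ImportError:  # older interpreters: third-party backport with the same API
    import tomli as tomllib

# Shortened, illustrative config with two worker manager entries.
CONFIG_TEXT = """
[[worker_manager]]
type = "baremetal_native"
scheduler_address = "tcp://127.0.0.1:6378"

[[worker_manager]]
type = "orb_aws_ec2"
scheduler_address = "tcp://127.0.0.1:6378"
image_id = "ami-0528819f94f4f5fa5"
"""


def pick_worker_manager(config_text: str, subcommand: str) -> Dict[str, Any]:
    """Return the first [[worker_manager]] table whose type equals subcommand."""
    config = tomllib.loads(config_text)
    for entry in config.get("worker_manager", []):
        if entry.get("type") == subcommand:
            return entry
    raise KeyError(f"no [[worker_manager]] entry with type = {subcommand!r}")


selected = pick_worker_manager(CONFIG_TEXT, "orb_aws_ec2")
print(selected["image_id"])
```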

docs/source/tutorials/compatibility/ray.rst

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ Ray
 Scaler is a lightweight distributed computation engine similar to Ray. Scaler supports many of the same concepts as Ray including
 remote functions (known as tasks in Scaler), futures, cluster object storage, labels (known as capabilities in Scaler), and it comes with comparable monitoring tools.
 
-Unlike Ray, Scaler supports both local clusters and also easily integrates with multiple cloud providers out of the box, including AWS EC2 and IBM Symphony,
+Unlike Ray, Scaler supports both local clusters and also easily integrates with multiple cloud providers out of the box, including ORB (AWS EC2) and IBM Symphony,
 with more providers planned for the future. You can view our `roadmap on GitHub <https://github.com/finos/opengris-scaler/discussions/333>`_
 for details on upcoming cloud integrations.

docs/source/tutorials/worker_managers/index.rst

Lines changed: 5 additions & 0 deletions
@@ -54,6 +54,10 @@ Worker Managers Overview
      - Offloads tasks to IBM Spectrum Symphony via the SOAM API.
      - Concurrency-limited
      - IBM Symphony
+   * - :doc:`ORB AWS EC2 <orb_aws_ec2>`
+     - Dynamically provisions workers on AWS EC2 instances using the ORB system.
+     - Dynamic (scheduler-driven)
+     - AWS EC2
 
 Although worker managers target different infrastructures, many configuration options are shared.
 See :doc:`Common Worker Manager Parameters <common_parameters>` for these shared settings.
@@ -72,4 +76,5 @@ The :ref:`scaler <cmd-scaler>` command boots the full stack from a single TOML c
    aws_hpc_batch
    aws_raw_ecs
    symphony
+   orb_aws_ec2
    common_parameters
Lines changed: 145 additions & 0 deletions
@@ -0,0 +1,145 @@
+ORB AWS EC2 Worker Adapter
+==========================
+
+The ORB AWS EC2 worker adapter allows Scaler to dynamically provision workers on AWS EC2 instances using the ORB (Open Resource Broker) system. This is particularly useful for scaling workloads that require significant compute resources or specialized hardware available in the cloud.
+
+This tutorial describes the steps required to get up and running with the ORB AWS EC2 adapter.
+
+Requirements
+------------
+
+Before using the ORB AWS EC2 worker adapter, ensure the following requirements are met on the machine that will run the adapter:
+
+1. **orb-py and boto3**: The ``orb-py`` and ``boto3`` packages must be installed. These can be installed using the ``orb_aws_ec2`` optional dependency of Scaler:
+
+   .. code-block:: bash
+
+       pip install "opengris-scaler[orb_aws_ec2]"
+
+2. **AWS CLI**: The AWS Command Line Interface must be installed and configured with a default profile that has permissions to launch, describe, and terminate EC2 instances.
+
+3. **Network Connectivity**: The adapter must be able to communicate with AWS APIs and the Scaler scheduler.
+
+Getting Started
+---------------
+
+To start the ORB AWS EC2 worker adapter, use the ``scaler_worker_manager orb_aws_ec2`` subcommand:
+
+.. code-block:: bash
+
+    scaler_worker_manager orb_aws_ec2 tcp://<SCHEDULER_EXTERNAL_IP>:8516 \
+        --object-storage-address tcp://<OSS_EXTERNAL_IP>:8517 \
+        --image-id ami-0528819f94f4f5fa5 \
+        --instance-type t3.medium \
+        --aws-region us-east-1 \
+        --logging-level INFO \
+        --task-timeout-seconds 60
+
+Equivalent configuration using a TOML file with ``scaler``:
+
+.. code-block:: toml
+
+    # stack.toml
+
+    [scheduler]
+    scheduler_address = "tcp://<SCHEDULER_EXTERNAL_IP>:8516"
+
+    [[worker_manager]]
+    type = "orb_aws_ec2"
+    scheduler_address = "tcp://<SCHEDULER_EXTERNAL_IP>:8516"
+    object_storage_address = "tcp://<OSS_EXTERNAL_IP>:8517"
+    image_id = "ami-0528819f94f4f5fa5"
+    instance_type = "t3.medium"
+    aws_region = "us-east-1"
+    logging_level = "INFO"
+    task_timeout_seconds = 60
+
+.. code-block:: bash
+
+    scaler stack.toml
+
+* ``tcp://<SCHEDULER_EXTERNAL_IP>:8516`` is the address workers will use to connect to the scheduler.
+* ``tcp://<OSS_EXTERNAL_IP>:8517`` is the address workers will use to connect to the object storage server.
+* New workers will be launched using the specified AMI and instance type.
+
+Networking Configuration
+------------------------
+
+Workers launched by the ORB AWS EC2 adapter are EC2 instances and require an externally-reachable IP address for the scheduler.
+
+* **Internal Communication**: If the machine running the scheduler is another EC2 instance in the same VPC, you can use EC2 private IP addresses.
+* **Public Internet**: If communicating over the public internet, it is highly recommended to set up robust security rules and/or a VPN to protect the cluster.
+
+Publicly Available AMIs
+-----------------------
+
+We regularly publish publicly available Amazon Machine Images (AMIs) with Python and ``opengris-scaler`` pre-installed.
+
+.. list-table:: Available Public AMIs
+   :widths: 15 15 20 20 30
+   :header-rows: 1
+
+   * - Scaler Version
+     - Python Version
+     - Amazon Linux 2023 Version
+     - Date (MM/DD/YYYY)
+     - AMI ID (us-east-1)
+   * - 1.14.2
+     - 3.13
+     - 2023.10.20260120
+     - 01/30/2026
+     - ``ami-0528819f94f4f5fa5``
+   * - 1.15.0
+     - 3.13
+     - 2023.10.20260302.1
+     - 03/16/2026
+     - ``ami-044265172bea55d51``
+   * - 1.26.4
+     - 3.13
+     - 2023.10.20260302.1
+     - 03/26/2026
+     - ``ami-0b76605999d8f5d2b``
+
+New AMIs will be added to this list as they become available.
+
+Supported Parameters
+--------------------
+
+.. note::
+    For more details on how to configure Scaler, see the :doc:`../configuration` section.
+
+The ORB AWS EC2 worker adapter supports ORB-specific configuration parameters as well as common worker adapter parameters.
+
+ORB AWS EC2 Template Configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+* ``--image-id`` (Required): AMI ID for the worker instances.
+* ``--instance-type``: EC2 instance type (default: ``t2.micro``).
+* ``--aws-region``: AWS region (default: ``us-east-1``).
+* ``--key-name``: AWS key pair name for the instances. If not provided, a temporary key pair will be created and deleted on cleanup.
+* ``--subnet-id``: AWS subnet ID where the instances will be launched. If not provided, it attempts to discover the default subnet in the default VPC.
+* ``--security-group-ids``: Comma-separated list of AWS security group IDs.
+* ``--allowed-ip``: IP address to allow in the security group (if created automatically). Defaults to the adapter's external IP.
+* ``--orb-config-path``: Path to the ORB root directory (default: ``src/scaler/drivers/orb``).
+
+Common Parameters
+~~~~~~~~~~~~~~~~~
+
+For a full list of common parameters including networking, worker configuration, and logging, see :doc:`common_parameters`.
+
+Cleanup
+-------
+
+The ORB AWS EC2 worker adapter is designed to be self-cleaning, but it is important to be aware of the resources it manages:
+
+* **Key Pairs**: If a ``--key-name`` is not provided, the adapter creates a temporary AWS key pair.
+* **Security Groups**: If ``--security-group-ids`` are not provided, the adapter creates a temporary security group to allow communication.
+* **Launch Templates**: ORB may additionally create EC2 Launch Templates as part of the machine provisioning process.
+
+The adapter attempts to delete these temporary resources and terminate all launched EC2 instances when it shuts down gracefully. However, in the event of an ungraceful crash or network failure, some resources may persist in your AWS account.
+
+.. tip::
+    It is recommended to periodically check your AWS console for any orphaned resources (instances, security groups, key pairs, or launch templates) and clean them up manually if necessary to avoid unexpected costs.
+
+.. warning::
+    **Subnet and Security Groups**: Currently, specifying ``--subnet-id`` or ``--security-group-ids`` via configuration might not have the intended effect as the adapter is designed to auto-discover or create these resources. Specifically, the adapter may still attempt to use default subnets or create its own temporary security groups regardless of these parameters.
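
The tip in the Cleanup section about checking for orphaned resources could be partly automated. The sketch below is a suggestion, not something the adapter provides: `find_live_instances` is a hypothetical helper, and matching on the AMI ID is an assumption, since the diff does not say how the adapter tags its instances. It covers instances only (not key pairs, security groups, or launch templates) and will also list unrelated instances launched from the same AMI.

```python
from typing import Any, Dict, List


def find_live_instances(ec2_client: Any, image_id: str) -> List[Dict[str, str]]:
    """List non-terminated EC2 instances that were launched from image_id.

    ec2_client is expected to behave like a boto3 EC2 client, i.e. expose
    describe_instances(Filters=..., NextToken=...).
    """
    kwargs: Dict[str, Any] = {
        "Filters": [
            {"Name": "image-id", "Values": [image_id]},
            {"Name": "instance-state-name", "Values": ["pending", "running", "stopping", "stopped"]},
        ]
    }
    found: List[Dict[str, str]] = []
    while True:
        response = ec2_client.describe_instances(**kwargs)
        for reservation in response.get("Reservations", []):
            for instance in reservation.get("Instances", []):
                found.append({"InstanceId": instance["InstanceId"], "State": instance["State"]["Name"]})
        token = response.get("NextToken")
        if not token:
            return found
        kwargs["NextToken"] = token  # fetch the next page of results
```

With credentials configured, a call might look like `find_live_instances(boto3.client("ec2", region_name="us-east-1"), "ami-0528819f94f4f5fa5")`; review the result before terminating anything.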

pyproject.toml

Lines changed: 5 additions & 0 deletions
@@ -50,10 +50,15 @@ graphblas = [
 aws = [
     "boto3",
 ]
+orb_aws_ec2 = [
+    "orb-py~=1.5.1; python_version >= '3.10'",
+    "boto3; python_version >= '3.10'",
+]
 all = [
     "opengris-scaler[aws]",
     "opengris-scaler[graphblas]",
     "opengris-scaler[gui]",
+    "opengris-scaler[orb_aws_ec2]",
     "opengris-scaler[uvloop]",
 ]
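
Because `orb-py` is only an optional extra, code that uses it has to tolerate its absence at import time; the commit message describes moving the `orb` import inside `_run()` for this reason. A generic sketch of that deferred-import pattern follows. `LazyOptionalDependency` is a hypothetical helper, not the adapter's implementation; only the module name `orb` and the extra name `orb_aws_ec2` come from the commit.

```python
import importlib
from types import ModuleType
from typing import Optional


class LazyOptionalDependency:
    """Import an optional package on first use instead of at module import time."""

    def __init__(self, module_name: str, extra: str) -> None:
        self._module_name = module_name
        self._extra = extra
        self._module: Optional[ModuleType] = None

    def load(self) -> ModuleType:
        if self._module is None:
            try:
                self._module = importlib.import_module(self._module_name)
            except ImportError as error:
                # Point the user at the extra that provides the dependency.
                raise ImportError(
                    f"{self._module_name!r} is required; install it with "
                    f'pip install "opengris-scaler[{self._extra}]"'
                ) from error
        return self._module


# Creating the handle imports nothing, so tests that only patch the adapter
# class can run on machines where the package is not installed.
orb_dependency = LazyOptionalDependency("orb", "orb_aws_ec2")
```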

src/scaler/config/defaults.py

Lines changed: 1 addition & 1 deletion
@@ -56,7 +56,7 @@
 # WORKER SPECIFIC OPTIONS
 
 # number of workers, echo worker use 1 process
-DEFAULT_MAX_TASK_CONCURRENCY = os.cpu_count() - 1
+DEFAULT_MAX_TASK_CONCURRENCY = os.cpu_count()
 
 # number of seconds that worker agent send heartbeat to scheduler
 DEFAULT_HEARTBEAT_INTERVAL_SECONDS = 2
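
The arithmetic behind this one-line change is easy to check. In the sketch below, `default_before` and `default_after` mirror the removed and added expressions, while `guarded_default` is a hypothetical variant that is not part of the commit: it additionally handles `os.cpu_count()` returning `None`, which the standard library allows when the count cannot be determined.

```python
import os
from typing import Optional


def default_before(cpu_count: int) -> int:
    """Old expression: one core is held back, so a single-core host gets 0 workers."""
    return cpu_count - 1


def default_after(cpu_count: int) -> int:
    """New expression: use every core, so a single-core host gets 1 worker."""
    return cpu_count


def guarded_default(cpu_count: Optional[int]) -> int:
    """Hypothetical variant: never below 1, even if the count is unknown (None)."""
    return max(1, cpu_count or 1)


print(default_before(1), default_after(1), guarded_default(os.cpu_count()))
```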
