Merge native adapters by magniloquency · Pull Request #21 · magniloquency/scaler

magniloquency · 2026-03-05T04:01:59Z

No description provided.

Adds a `NativeWorkerAdapterMode` enum (DYNAMIC / FIXED) to `NativeWorkerAdapter`, replacing the separate `FixedNativeWorkerAdapter`. Key changes: - `NativeWorkerAdapterConfig`: new `mode` field, `-n`/`--num-of-workers` alias, and FIXED-mode validation (`max_workers >= 0`) - `NativeWorkerAdapter`: unified `run()` entry point with lazy ZMQ setup (`_setup_zmq()`); FIXED mode pre-spawns workers via `_spawn_initial_workers()`, rejects dynamic scale commands, and self-exits via `_monitor_workers()` when all workers have died - `cluster.py` entry point: now constructs `NativeWorkerAdapter` directly in FIXED mode instead of importing the deleted fixed-native entry point - `combo.py`: replaces `FixedNativeWorkerAdapter` with `NativeWorkerAdapter(mode=FIXED)` launched as a subprocess - Tests: updated to use the subprocess pattern (`Process(target=adapter.run)`) - Docs: `native.rst` extended to cover `--mode`; `fixed_native.rst` deleted; index, configuration, and README updated accordingly - Deleted: `fixed_native.py`, `fixed_native_worker_adapter.py`, `worker_manager_baremetal_fixed_native.py`, `run_worker_manager_baremetal_fixed_native.py`, `fixed_native.rst`

- Replace stale Cluster/ClusterConfig imports with NativeWorkerAdapter/NativeWorkerAdapterConfig - Access adapter config via cluster._worker_adapter instead of removed cluster._cluster - Launch GPU adapter as a subprocess (Process(target=adapter.run)) consistent with current API - Add explicit cleanup of gpu_adapter_process before cluster.shutdown() - Extend isort/black/pflake8/mypy checks to include examples/ directory - Document examples/task_capabilities.py in the Files Modified table

Adds amendment-1.md documenting three changes to the merge plan: simplified FIXED mode run path, config/TOML section rename from NativeWorkerAdapterConfig to NativeWorkerManagerConfig, and new --worker-type parameter. Implements all three changes across source, tests, examples, and docs.

Fixed mode has no event loop or scheduler connector; workers connect directly to the scheduler themselves.

Resolved conflicts by applying PR finos#583 intent (merge FixedNativeWorkerAdapter into NativeWorkerAdapter) on top of main's renames: - NativeWorkerAdapter → NativeWorkerManager - worker_adapter_config → worker_manager_config (WorkerManagerConfig) - Import path: native_worker_adapter.py → native_worker_manager.py - combo.py uses NativeWorkerManager with NativeWorkerManagerMode.FIXED and multiprocessing.Process pattern; stored as _worker_manager - Deleted fixed_native files (fixed_native.py, fixed_native_worker_manager.py, worker_manager_baremetal_fixed_native.py, fixed_native.rst) as intended by finos#583 - Updated cluster.py, all affected tests and examples accordingly - Fixed mypy/pflake8: missing multiprocessing import in test_client.py, stale NativeWorkerAdapter reference in cluster.py

- Signal handling and register_event_loop in _run_fixed() to prevent orphaned worker subprocesses on shutdown - Initialize _ident to None in __init__ so it exists before _setup_zmq() - Type hint on configure_parser parser parameter - Start scheduler before worker manager process in combo.py - Clarifying comment on WorkerGroupInfo.task_arn in ecs.py

…ath_timeout

Merged from remove-cluster-class. When the main task completes, background tasks created via loop.create_task() (the create_async_loop_routine coroutines) may still be running as orphans. Previously, loop.close() was called immediately, causing those tasks to receive RuntimeError: Event loop is closed. Now all pending tasks are cancelled and awaited before the cleanup callback and loop.close(), following the standard asyncio shutdown pattern.

* Remove Cluster class, replace with FixedNativeWorkerAdapter Cluster was a multiprocessing.Process subclass that wrapped FixedNativeWorkerAdapter in a subprocess with an asyncio event loop solely to handle signals. This intermediate process layer is removed; workers are now direct children of SchedulerClusterCombo. - Delete Cluster, ClusterConfig, and the cluster entry point module - Redirect scaler_cluster and run_cluster.py to the fixed native worker adapter entry point (with --num-of-workers alias for compat) - Add SIGINT/SIGTERM handling to the fixed native adapter entry point - Update SchedulerClusterCombo, tests, and examples to use FixedNativeWorkerAdapter directly - Update ECS adapter to use scaler_cluster with updated parameters (--max-workers replaces --num-of-workers/--worker-names; worker IDs are no longer pre-announced since workers self-assign UUIDs) * bump version * Add -n as backward compat alias for scaler_cluster * scaler_cluster uses [cluster] TOML section for backward compat Refactor fixed native entry point to accept a configurable section name, so scaler_cluster reads [cluster] while scaler_worker_adapter_fixed_native continues to read [fixed_native_worker_adapter]. * Update docs to remove stale Cluster class references - Remove ClusterProcess/Worker startup console output from quickstart (no longer produced by FixedNativeWorkerAdapter) - Update worker_adapters/index: Fixed Native is used by SchedulerClusterCombo, not Cluster - Replace --num-of-workers/num_of_workers with canonical --max-workers/max_workers in configuration examples * Fix stale imports in run_* scripts Entry points were renamed from worker_adapter_* to worker_manager_*, and run_cluster.py now imports from the cluster entry point directly. * Fix regression in task_capabilities.py introduced by 9668545 9668545 accidentally reverted the task_capabilities.py changes from b4ebe01, re-introducing Cluster/ClusterConfig which this branch removes. Restore the FixedNativeWorkerAdapter-based version from b4ebe01. * fix: resolve mypy error and apply black/isort formatting Update combo.py to use renamed FixedNativeWorkerManager/FixedNativeWorkerManagerConfig/WorkerManagerConfig (replacing stale FixedNativeWorkerAdapter/FixedNativeWorkerAdapterConfig/WorkerAdapterConfig names). Apply black formatting to three files. * fix: update tests to use renamed FixedNativeWorkerManager classes Update 5 test files to reflect the renaming of FixedNativeWorkerAdapter to FixedNativeWorkerManager and FixedNativeWorkerAdapterConfig to FixedNativeWorkerManagerConfig, including updated import paths and keyword argument names. * fix: update example to use renamed FixedNativeWorkerManager classes * fix: update tests to use renamed WorkerManagerConfig Replace references to the removed worker_adapter module and WorkerAdapterConfig with the renamed worker_manager module and WorkerManagerConfig. * fix: rename adapter variables and identifiers to manager * fix: update test_death_timeout import path from worker_manager_manager to worker_manager_adapter * fix: update task_capabilities example to use WorkerManagerConfig Replace non-existent WorkerAdapterConfig from scaler.config.common.worker_adapter with WorkerManagerConfig from scaler.config.common.worker_manager. * fix: cancel pending tasks before closing event loop in run_task_forever When the main task completes, background tasks created via loop.create_task() (the create_async_loop_routine coroutines) may still be running as orphans. Previously, loop.close() was called immediately, causing those tasks to receive RuntimeError: Event loop is closed when their pending awaits tried to interact with the loop. Now all pending tasks are cancelled and awaited before the cleanup callback and loop.close(), following the standard asyncio shutdown pattern. --------- Co-authored-by: sharpener6 <1sc2l4qi@duck.com>

Resolve trivial conflicts from merging origin/main. All conflicts were between the HEAD NativeWorkerManager/NativeWorkerManagerConfig approach and the intermediate FixedNativeWorkerManager added during the remove-cluster-class merge. Kept HEAD versions throughout.

Signed-off-by: magniloquency <197707854+magniloquency@users.noreply.github.com>

magniloquency and others added 16 commits March 4, 2026 22:16

Rename design doc to match PR number (583)

58cd951

Rename NativeWorkerAdapterMode to NativeWorkerManagerMode

1ee5862

Merge branch 'main' into merge-native-adapters

ee303fa

Merge branch 'remove-cluster-class' into merge-native-adapters

55bbc42

docs: correct fixed mode description in native adapter tutorial

8d7a3cf

Fixed mode has no event loop or scheduler connector; workers connect directly to the scheduler themselves.

fix: merge remove-cluster-class, resolving import conflict in test_de…

68c37ca

…ath_timeout

Remove design/ directory

af6b33c

Update version to 1.20.0

d8d6d80

Signed-off-by: magniloquency <197707854+magniloquency@users.noreply.github.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merge native adapters#21

Merge native adapters#21
magniloquency wants to merge 16 commits intoremove-cluster-classfrom
merge-native-adapters

magniloquency commented Mar 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

magniloquency commented Mar 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant