Open
Conversation
Adds a `NativeWorkerAdapterMode` enum (DYNAMIC / FIXED) to `NativeWorkerAdapter`, replacing the separate `FixedNativeWorkerAdapter`. Key changes: - `NativeWorkerAdapterConfig`: new `mode` field, `-n`/`--num-of-workers` alias, and FIXED-mode validation (`max_workers >= 0`) - `NativeWorkerAdapter`: unified `run()` entry point with lazy ZMQ setup (`_setup_zmq()`); FIXED mode pre-spawns workers via `_spawn_initial_workers()`, rejects dynamic scale commands, and self-exits via `_monitor_workers()` when all workers have died - `cluster.py` entry point: now constructs `NativeWorkerAdapter` directly in FIXED mode instead of importing the deleted fixed-native entry point - `combo.py`: replaces `FixedNativeWorkerAdapter` with `NativeWorkerAdapter(mode=FIXED)` launched as a subprocess - Tests: updated to use the subprocess pattern (`Process(target=adapter.run)`) - Docs: `native.rst` extended to cover `--mode`; `fixed_native.rst` deleted; index, configuration, and README updated accordingly - Deleted: `fixed_native.py`, `fixed_native_worker_adapter.py`, `worker_manager_baremetal_fixed_native.py`, `run_worker_manager_baremetal_fixed_native.py`, `fixed_native.rst`
- Replace stale Cluster/ClusterConfig imports with NativeWorkerAdapter/NativeWorkerAdapterConfig - Access adapter config via cluster._worker_adapter instead of removed cluster._cluster - Launch GPU adapter as a subprocess (Process(target=adapter.run)) consistent with current API - Add explicit cleanup of gpu_adapter_process before cluster.shutdown() - Extend isort/black/pflake8/mypy checks to include examples/ directory - Document examples/task_capabilities.py in the Files Modified table
Adds amendment-1.md documenting three changes to the merge plan: simplified FIXED mode run path, config/TOML section rename from NativeWorkerAdapterConfig to NativeWorkerManagerConfig, and new --worker-type parameter. Implements all three changes across source, tests, examples, and docs.
Fixed mode has no event loop or scheduler connector; workers connect directly to the scheduler themselves.
Resolved conflicts by applying PR finos#583 intent (merge FixedNativeWorkerAdapter into NativeWorkerAdapter) on top of main's renames: - NativeWorkerAdapter → NativeWorkerManager - worker_adapter_config → worker_manager_config (WorkerManagerConfig) - Import path: native_worker_adapter.py → native_worker_manager.py - combo.py uses NativeWorkerManager with NativeWorkerManagerMode.FIXED and multiprocessing.Process pattern; stored as _worker_manager - Deleted fixed_native files (fixed_native.py, fixed_native_worker_manager.py, worker_manager_baremetal_fixed_native.py, fixed_native.rst) as intended by finos#583 - Updated cluster.py, all affected tests and examples accordingly - Fixed mypy/pflake8: missing multiprocessing import in test_client.py, stale NativeWorkerAdapter reference in cluster.py
- Signal handling and register_event_loop in _run_fixed() to prevent orphaned worker subprocesses on shutdown - Initialize _ident to None in __init__ so it exists before _setup_zmq() - Type hint on configure_parser parser parameter - Start scheduler before worker manager process in combo.py - Clarifying comment on WorkerGroupInfo.task_arn in ecs.py
Merged from remove-cluster-class. When the main task completes, background tasks created via loop.create_task() (the create_async_loop_routine coroutines) may still be running as orphans. Previously, loop.close() was called immediately, causing those tasks to receive RuntimeError: Event loop is closed. Now all pending tasks are cancelled and awaited before the cleanup callback and loop.close(), following the standard asyncio shutdown pattern.
* Remove Cluster class, replace with FixedNativeWorkerAdapter Cluster was a multiprocessing.Process subclass that wrapped FixedNativeWorkerAdapter in a subprocess with an asyncio event loop solely to handle signals. This intermediate process layer is removed; workers are now direct children of SchedulerClusterCombo. - Delete Cluster, ClusterConfig, and the cluster entry point module - Redirect scaler_cluster and run_cluster.py to the fixed native worker adapter entry point (with --num-of-workers alias for compat) - Add SIGINT/SIGTERM handling to the fixed native adapter entry point - Update SchedulerClusterCombo, tests, and examples to use FixedNativeWorkerAdapter directly - Update ECS adapter to use scaler_cluster with updated parameters (--max-workers replaces --num-of-workers/--worker-names; worker IDs are no longer pre-announced since workers self-assign UUIDs) * bump version * Add -n as backward compat alias for scaler_cluster * scaler_cluster uses [cluster] TOML section for backward compat Refactor fixed native entry point to accept a configurable section name, so scaler_cluster reads [cluster] while scaler_worker_adapter_fixed_native continues to read [fixed_native_worker_adapter]. * Update docs to remove stale Cluster class references - Remove ClusterProcess/Worker startup console output from quickstart (no longer produced by FixedNativeWorkerAdapter) - Update worker_adapters/index: Fixed Native is used by SchedulerClusterCombo, not Cluster - Replace --num-of-workers/num_of_workers with canonical --max-workers/max_workers in configuration examples * Fix stale imports in run_* scripts Entry points were renamed from worker_adapter_* to worker_manager_*, and run_cluster.py now imports from the cluster entry point directly. * Fix regression in task_capabilities.py introduced by 9668545 9668545 accidentally reverted the task_capabilities.py changes from b4ebe01, re-introducing Cluster/ClusterConfig which this branch removes. Restore the FixedNativeWorkerAdapter-based version from b4ebe01. * fix: resolve mypy error and apply black/isort formatting Update combo.py to use renamed FixedNativeWorkerManager/FixedNativeWorkerManagerConfig/WorkerManagerConfig (replacing stale FixedNativeWorkerAdapter/FixedNativeWorkerAdapterConfig/WorkerAdapterConfig names). Apply black formatting to three files. * fix: update tests to use renamed FixedNativeWorkerManager classes Update 5 test files to reflect the renaming of FixedNativeWorkerAdapter to FixedNativeWorkerManager and FixedNativeWorkerAdapterConfig to FixedNativeWorkerManagerConfig, including updated import paths and keyword argument names. * fix: update example to use renamed FixedNativeWorkerManager classes * fix: update tests to use renamed WorkerManagerConfig Replace references to the removed worker_adapter module and WorkerAdapterConfig with the renamed worker_manager module and WorkerManagerConfig. * fix: rename adapter variables and identifiers to manager * fix: update test_death_timeout import path from worker_manager_manager to worker_manager_adapter * fix: update task_capabilities example to use WorkerManagerConfig Replace non-existent WorkerAdapterConfig from scaler.config.common.worker_adapter with WorkerManagerConfig from scaler.config.common.worker_manager. * fix: cancel pending tasks before closing event loop in run_task_forever When the main task completes, background tasks created via loop.create_task() (the create_async_loop_routine coroutines) may still be running as orphans. Previously, loop.close() was called immediately, causing those tasks to receive RuntimeError: Event loop is closed when their pending awaits tried to interact with the loop. Now all pending tasks are cancelled and awaited before the cleanup callback and loop.close(), following the standard asyncio shutdown pattern. --------- Co-authored-by: sharpener6 <1sc2l4qi@duck.com>
Resolve trivial conflicts from merging origin/main. All conflicts were between the HEAD NativeWorkerManager/NativeWorkerManagerConfig approach and the intermediate FixedNativeWorkerManager added during the remove-cluster-class merge. Kept HEAD versions throughout.
Signed-off-by: magniloquency <197707854+magniloquency@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.