Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace DFK and JobStatusPoller monitoring zmq channels with Queue radio #3338

Merged
merged 2 commits into from
Apr 11, 2024

Conversation

benclifford
Copy link
Collaborator

Before this PR, the DataFlowKernel and each JobStatusPoller PolledExecutorFacade each opened a ZMQ connection to the monitoring router. These connections are not threadsafe, but (especially in the DFK case) no reasoning or checking stops the DFK's connection being used in multiple threads.

Before this PR, the MonitoringHub presented a 'send' method and stored the DFK's ZMQ connection in self._dfk_channel, and each PolledExecutorFacade contained a copy of the ZMQ channel code to open its own channel, configured using parameters from the DFK passed in at construction.

This PR:

  • moves the above uses to use the MonitoringRadio interface (which was originally designed for remote workers to send monitoring information, but seems ok here too)

  • has MonitoringHub construct an appropriate MonitoringRadio instance for use on the submit side, exposed as self.radio;

  • replaces the implementation of send with a new MultiprocessingQueueRadio which is thread safe but only works in the same multiprocessing environment as the launched monitoring database manager process (which is true on the submit side, but for example means this radio cannot be used on most remote workers)

This work aligns with the prototype in #3315 (which pushes on monitoring radio configuration for remote workers) and pushes in the direction (without getting there) of allowing other submit-side hooks.

This work removes some monitoring specific code from the JobStatusPoller, replacing it with a dependency injection style.
This is part of work to move JobStatusPoller facade state into other classes, as part of job handling rearrangements in PR #3293

This PR will change how monitoring messages are delivered from the submitting process, and the most obvious thing I can think of that will change is how this will behave under load: heavily loaded messaging causing full buffers and other heavy-load symptoms will now behave as multiprocessing Queues do, rather than as ZMQ connections do. I have not attempted to characterise either of those modes.

Description

Please include a summary of the change and (optionally) which issue is fixed. Please also include
relevant motivation and context.

Changed Behaviour

Which existing user workflows or functionality will behave differently after this PR?

Fixes

Fixes # (issue)

Type of change

Choose which options apply, and delete the ones which do not apply.

  • Bug fix
  • New feature
  • Update to human readable text: Documentation/error messages/comments
  • Code maintenance/cleanup

…ugin

Before this PR, the DataFlowKernel and each JobStatusPoller
PolledExecutorFacade each opened a ZMQ connection to the monitoring router.
These connections are not threadsafe, but (especially in the DFK case) no
reasoning or checking stops the DFK's connection being used in multiple
threads.

Before this PR, the MonitoringHub presented a 'send' method and stored the
DFK's ZMQ connection in self._dfk_channel, and each PolledExecutorFacade
contained a copy of the ZMQ channel code to open its own channel, configured
using parameters from the DFK passed in at construction.

This PR:

* moves the above uses to use the MonitoringRadio interface (which was
originally designed for remote workers to send monitoring information, but
seems ok here too)

* has MonitoringHub construct an appropriate MonitoringRadio instance for
use on the submit side, exposed as self.radio;

* replaces the implementation of send with a new MultiprocessingQueueRadio
which is thread safe but only works in the same multiprocessing environment
as the launched monitoring database manager process (which is true on the
submit side, but for example means this radio cannot be used on most
remote workers)

This work aligns with the prototype in #3315 (which pushes on monitoring
radio configuration for remote workers) and pushes in the direction
(without getting there) of allowing other submit-side hooks.

This work removes some monitoring specific code from the JobStatusPoller,
replacing it with a dependency injection style.
This is part of work to move JobStatusPoller facade state into other classes,
as part of job handling rearrangements in PR #3293

This PR will change how monitoring messages are delivered from the submitting
process, and the most obvious thing I can think of that will change is how
this will behave under load: heavily loaded messaging causing full buffers and
other heavy-load symptoms will now behave as multiprocessing Queues do, rather
than as ZMQ connections do. I have not attempted to characterise either of
those modes.
Comment on lines +186 to +187
def __init__(self, queue: Queue) -> None:
self.queue = queue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion to leave the namespace on the type declaration. Would have saved this reviewer at least one (more) look at the top of the file (and later on when rereading this in N months) to confirm that this is, in fact, an MP queue. No biggie, but a suggestion.

parsl/monitoring/radios.py Show resolved Hide resolved
@benclifford benclifford merged commit 3684766 into master Apr 11, 2024
6 checks passed
@benclifford benclifford deleted the benc-radio-in-dfk branch April 11, 2024 15:52
rjmello added a commit to globus/globus-compute that referenced this pull request Jun 27, 2024
As of Parsl PR Parsl/parsl#3338, the `dfk`
argument for `JobStatusPoller` initialization is no longer supported.
rjmello added a commit to globus/globus-compute that referenced this pull request Jun 28, 2024
As of Parsl PR Parsl/parsl#3338, the `dfk`
argument for `JobStatusPoller` initialization is no longer supported.
rjmello added a commit to globus/globus-compute that referenced this pull request Jun 28, 2024
As of Parsl PR Parsl/parsl#3338, the `dfk`
argument for `JobStatusPoller` initialization is no longer supported.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants