Is your feature request related to a problem? Please describe
In some cases, I want to use two different schedulers on the same remote machine, e.g. hyperqueue for small jobs that I want to run on partial nodes, but Slurm for bigger jobs where I need multiple nodes and a solid chunk of walltime. Currently, this means I have to set up two computers with different schedulers. However, if a calculation needs to copy or symlink files from a previous calculation that ran with a different scheduler (i.e. on a different computer), it fails with a `NotImplementedError`, because the `execmanager` compares the computer UUIDs:
```python
# The remote copy is only attempted when source and target computer share the same UUID
if remote_computer_uuid == computer.uuid:
    logger.debug(
        f'[submission of calculation {node.pk}] copying {dest_rel_path} '
        f'remotely, directly on the machine {computer.label}'
    )
    try:
        transport.copy(remote_abs_path, dest_rel_path)
    except FileNotFoundError:
        logger.warning(
            f'[submission of calculation {node.pk}] Unable to copy remote '
            f'resource from {remote_abs_path} to {dest_rel_path}! NOT Stopping but just ignoring!.'
        )
    except (IOError, OSError):
        logger.warning(
            f'[submission of calculation {node.pk}] Unable to copy remote '
            f'resource from {remote_abs_path} to {dest_rel_path}! Stopping.'
        )
        raise
else:
    raise NotImplementedError(
        f'[submission of calculation {node.pk}] Remote copy between two different machines is '
        'not implemented yet'
    )
```
Describe the solution you'd like
One solution I've been running with locally is to compare the `hostname` of the computers instead, which seemed sensible at first glance. Could there be cases where this breaks, however?
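A minimal sketch contrasting the current UUID rule with the proposed hostname rule. It assumes a hypothetical `ComputerInfo` stand-in carrying only `uuid`, `label` and `hostname`; the helpers `same_computer` and `same_host` are illustrative and not part of aiida-core.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputerInfo:
    """Hypothetical stand-in holding only the computer attributes used below."""

    uuid: str
    label: str
    hostname: str


def same_computer(source: ComputerInfo, target: ComputerInfo) -> bool:
    """Current rule: remote copy is only allowed within the very same computer."""
    return source.uuid == target.uuid


def same_host(source: ComputerInfo, target: ComputerInfo) -> bool:
    """Proposed rule: allow remote copy if both computers point at the same host.

    Hostnames are compared ignoring case and surrounding whitespace. SSH aliases
    and IP addresses are not resolved, so two computers that reach one machine
    under different names still count as different hosts.
    """
    return source.hostname.strip().lower() == target.hostname.strip().lower()


# Two computers for the same cluster, differing only in the scheduler they use.
slurm = ComputerInfo(uuid='uuid-slurm', label='cluster-slurm', hostname='cluster.example.org')
hyperqueue = ComputerInfo(uuid='uuid-hq', label='cluster-hq', hostname='cluster.example.org')

print(same_computer(slurm, hyperqueue))  # False: today this raises NotImplementedError
print(same_host(slurm, hyperqueue))      # True: the copy could be done on the remote
```

A plain hostname match may be too permissive in some setups, e.g. if the two computers log in with different usernames and one account cannot read the other's files, and too strict if the same machine is configured under two different aliases.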
Describe alternatives you've considered
A computer can clearly be used with multiple schedulers. Besides the hyperqueue case, you might want to run e.g. an aiida-shell job directly on the login node. Instead of setting up multiple computers, could a computer be configured with multiple schedulers, one of which is the default, while the others are selected by setting an option?
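A rough sketch of what such a configuration could look like. Everything here is hypothetical and not an existing aiida-core API: the `MultiSchedulerComputer` class, its `default_scheduler` and `extra_schedulers` fields, and `resolve_scheduler`, which stands in for whatever option a calculation would set. Scheduler names are plain strings, not AiiDA entry points.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MultiSchedulerComputer:
    """Hypothetical computer that knows several schedulers, one of them the default."""

    label: str
    hostname: str
    default_scheduler: str
    extra_schedulers: set[str] = field(default_factory=set)

    def available_schedulers(self) -> set[str]:
        return {self.default_scheduler} | self.extra_schedulers

    def resolve_scheduler(self, requested: Optional[str] = None) -> str:
        """Return the scheduler for a job, falling back to the default if none is requested."""
        if requested is None:
            return self.default_scheduler
        if requested not in self.available_schedulers():
            raise ValueError(
                f'scheduler {requested!r} is not configured on computer {self.label!r}; '
                f'available: {sorted(self.available_schedulers())}'
            )
        return requested


cluster = MultiSchedulerComputer(
    label='cluster',
    hostname='cluster.example.org',
    default_scheduler='slurm',
    extra_schedulers={'hyperqueue', 'direct'},
)

print(cluster.resolve_scheduler())              # slurm: default, for multi-node jobs
print(cluster.resolve_scheduler('hyperqueue'))  # hyperqueue: small jobs on partial nodes
print(cluster.resolve_scheduler('direct'))      # direct: e.g. an aiida-shell job on the login node
```

Since both jobs would then run on one computer with a single UUID, the existing UUID check in the `execmanager` would let the remote copy through unchanged.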
Note: this is also important if you e.g. share a work chain with another user, and the work chain has files stashed on the remote that need to be copied in a subsequent step.
The quoted code is from `aiida-core/aiida/engine/daemon/execmanager.py`, lines 250 to 272 at commit 8a2fece.
Additional context
Related to #5084