Describe the bug
See LSST PR lsst/ctrl_bps_parsl#36.
PBS at IPMU requires a user to specify how many cores they want on a node, and allocates nodes fractionally based on this.
This LSST PR tries to do that by setting tasks_per_node on the PBS provider, which is not the right thing to do: setting that does indeed request more cores, but it also causes the launcher layer to run that many copies of the worker pool.
This value is hard-coded to 1 in htex (parsl/parsl/executors/status_handling.py, line 251 at commit dd9150d), and the htex code mostly assumes that it will be allocated an entire node (cf. the Slurm provider code, which has a default-on exclusive flag to get an entire node).
The current provider/launcher abstraction isn't able to deal with this value being different for these two different use cases: we want to request many cores per node, but then only have the launcher layer launch one process worker pool per node.
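To make the conflation concrete, here is a toy model of the two behaviours. It is only a sketch and not Parsl code: `ToyJob`, `submit_coupled`, `submit_decoupled`, the `start_worker_pool` command, and the Torque-style `nodes=1:ppn=N` request string are all names I made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyJob:
    """Hypothetical stand-in for what a provider + launcher pair produces."""
    resource_request: str      # what the batch scheduler is asked for
    launched_commands: tuple   # what the launcher layer starts on the node


def submit_coupled(command: str, tasks_per_node: int) -> ToyJob:
    """One knob drives both the core request and the launcher fan-out."""
    request = f"nodes=1:ppn={tasks_per_node}"  # illustrative request syntax
    # The launcher repeats the command once per "task", so asking for more
    # cores also starts more copies of the worker pool.
    return ToyJob(request, (command,) * tasks_per_node)


def submit_decoupled(command: str, cores_per_node: int,
                     pools_per_node: int = 1) -> ToyJob:
    """Two separate knobs: cores for the scheduler, copies for the launcher."""
    request = f"nodes=1:ppn={cores_per_node}"
    return ToyJob(request, (command,) * pools_per_node)


if __name__ == "__main__":
    print(submit_coupled("start_worker_pool", 12))
    print(submit_decoupled("start_worker_pool", 12))
```

With the coupled version, asking for 12 cores also starts 12 worker pools. The decoupled version asks for the same 12 cores but starts a single pool, which is the behaviour wanted here.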
cc @ryanchard who knows the most about parsl+pbs