After_connectivity process get stuck in parallel #887

Open
drodarie opened this issue Sep 23, 2024 · 1 comment
Labels
bug Something isn't working

Comments

Contributor

drodarie commented Sep 23, 2024

from bsb import Scaffold, Configuration, AfterConnectivityHook, config, options

options.verbosity = 4
options.debug_pool = True

@config.node
class TestAfterConn(AfterConnectivityHook):
    def postprocess(self):
        with open("test.txt", "a") as f:
            f.write("in report\n")


cfg = Configuration.default(
    storage={"engine": "hdf5", "root": "network.hdf5"},
    after_connectivity={"test_after_conn": TestAfterConn()}
)
network = Scaffold(cfg)
network.compile(redo=True)

Running this in parallel produces the following stack trace, and then the process gets stuck:

Exception in thread Thread-6:
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/mpipool/_futures.py", line 90, in run
    MPI.COMM_WORLD.send((self._task, (self._args, self._kwargs)), dest=self._worker)
  File "mpi4py/MPI/Comm.pyx", line 1406, in mpi4py.MPI.Comm.send
  File "mpi4py/MPI/msgpickle.pxi", line 210, in mpi4py.MPI.PyMPI_send
  File "mpi4py/MPI/msgpickle.pxi", line 144, in mpi4py.MPI.pickle_dump
  File "mpi4py/MPI/msgpickle.pxi", line 132, in mpi4py.MPI.cdumps
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/mpipool/_futures.py", line 30, in _dill_dumps
    ser = dill.dumps(obj, *args, **kwargs)
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 280, in dumps
    dump(obj, file, protocol, byref, fmode, recurse, **kwds)#, strictio)
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 252, in dump
    Pickler(file, protocol, **_kwds).dump(obj)
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 420, in dump
    StockPickler.dump(self, obj)
  File "/usr/lib/python3.10/pickle.py", line 487, in dump
    self.save(obj)
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/usr/lib/python3.10/pickle.py", line 887, in save_tuple
    save(element)
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/usr/lib/python3.10/pickle.py", line 887, in save_tuple
    save(element)
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/usr/lib/python3.10/pickle.py", line 887, in save_tuple
    save(element)
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/usr/lib/python3.10/pickle.py", line 887, in save_tuple
    save(element)
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/usr/lib/python3.10/pickle.py", line 887, in save_tuple
    save(element)
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 1985, in save_function
    _save_with_postproc(pickler, (_create_function, (
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 1117, in _save_with_postproc
    pickler.save_reduce(*reduction)
  File "/usr/lib/python3.10/pickle.py", line 692, in save_reduce
    save(args)
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/usr/lib/python3.10/pickle.py", line 887, in save_tuple
    save(element)
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 603, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python3.10/pickle.py", line 717, in save_reduce
    save(state)
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 1217, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/lib/python3.10/pickle.py", line 972, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.10/pickle.py", line 998, in _batch_setitems
    save(v)
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 603, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python3.10/pickle.py", line 717, in save_reduce
    save(state)
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 1217, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/lib/python3.10/pickle.py", line 972, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.10/pickle.py", line 998, in _batch_setitems
    save(v)
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 603, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python3.10/pickle.py", line 717, in save_reduce
    save(state)
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 1217, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/lib/python3.10/pickle.py", line 972, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.10/pickle.py", line 998, in _batch_setitems
    save(v)
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 603, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python3.10/pickle.py", line 717, in save_reduce
    save(state)
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 1217, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/lib/python3.10/pickle.py", line 972, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.10/pickle.py", line 998, in _batch_setitems
    save(v)
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 603, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python3.10/pickle.py", line 717, in save_reduce
    save(state)
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 1217, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/lib/python3.10/pickle.py", line 972, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.10/pickle.py", line 998, in _batch_setitems
    save(v)
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 603, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python3.10/pickle.py", line 717, in save_reduce
    save(state)
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 1217, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/lib/python3.10/pickle.py", line 972, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.10/pickle.py", line 998, in _batch_setitems
    save(v)
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 603, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python3.10/pickle.py", line 717, in save_reduce
    save(state)
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 1217, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/lib/python3.10/pickle.py", line 972, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.10/pickle.py", line 998, in _batch_setitems
    save(v)
  File "/home/toromis/workspace/venv/lib/python3.10/site-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 578, in save
    rv = reduce(self.proto)
TypeError: cannot pickle 'mpi4py.MPI.Intracomm' object
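
Reading the trace from the bottom up: the sending thread in mpipool serializes the task tuple with dill, dill pickles a function (save_function) and then walks through several nested state dictionaries (save_module_dict) until it reaches an mpi4py.MPI.Intracomm, which cannot be pickled. The failure mode can be sketched with the standard library alone. In this sketch a threading.Lock is a hypothetical stand-in for the communicator, and the task layout and key names are made up for illustration; they are not taken from bsb or mpipool.

```python
import pickle
import threading


def try_pickle(obj):
    """Return None if obj pickles cleanly, else the TypeError message."""
    try:
        pickle.dumps(obj)
    except TypeError as exc:
        return str(exc)
    return None


# Plain data (names, ints, tuples) crosses the process boundary fine.
safe_task = ("postprocess", ("test_after_conn",), {})

# A task that drags along nested state holding an unpicklable handle.
# The lock stands in for the MPI communicator (assumption for illustration).
unsafe_task = (
    "postprocess",
    ({"scaffold": {"storage": {"comm": threading.Lock()}}},),
    {},
)

safe_error = try_pickle(safe_task)
unsafe_error = try_pickle(unsafe_task)
print("safe task:", safe_error)
print("unsafe task:", unsafe_error)
```

Since the exception is raised in a helper thread (Thread-6), the main thread presumably keeps waiting for a result that never arrives, which would match the observed hang.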

drodarie added the bug label on Sep 23, 2024
Contributor

Helveg commented Sep 28, 2024

Hmm, there might be a discrepancy between how Placement/Connectivity jobs are serialized and how AfterConnectivity and AfterPlacement jobs are serialized, because the latter were added in somewhat of a hurry afterwards. Since we have a lot of non-picklable objects, the serialization works as follows:

  • Every scaffold and pool registers an ID
  • Every Job class has a static execute handler
  • Submitting a job to the underlying MPIPool happens via the dispatcher function.
    • This function takes each scheduled job's class name, pool ID, and arguments.
    • The dispatcher (function), pool ID (int), job class name (string), and job arguments (which must be serializable) are all serialized and sent to the worker.
    • The worker runs the dispatcher, which fetches the scaffold and pool from the worker's own registry. The worker should have executed the same code as the main process so far, and should therefore have an identical scaffold/pool registry.
    • Then, the class is fetched by its name from the current module, and its static execute method is retrieved.
    • The execute method contains the logic needed to reconstruct the job from the arguments, e.g., a PlacementJob will take the scaffold.placement[node_name_from_dispatcher_args].place method and run it.
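
The steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual bsb code: the registries, the constructor signatures, and the `PlacementNode` helper are hypothetical, and the dispatcher function itself is not serialized here (only the plain-data payload is), to keep the example within the standard library.

```python
import pickle

# Hypothetical registries. Every process is assumed to fill them identically
# by running the same script up to the point where jobs are submitted.
SCAFFOLDS = {}
POOLS = {}


class PlacementNode:
    """Hypothetical configuration node with a `place` method."""

    def __init__(self, name):
        self.name = name

    def place(self):
        return f"placed {self.name}"


class Scaffold:
    def __init__(self, scaffold_id, placement):
        self.placement = placement
        SCAFFOLDS[scaffold_id] = self  # register under an ID


class Pool:
    def __init__(self, pool_id, scaffold_id):
        self.scaffold_id = scaffold_id
        POOLS[pool_id] = self  # register under an ID


class PlacementJob:
    @staticmethod
    def execute(scaffold, args):
        # Rebuild the actual work from plain arguments on the worker side.
        (node_name,) = args
        return scaffold.placement[node_name].place()


def dispatcher(pool_id, job_classname, args):
    # Look up the live objects locally instead of receiving them pickled.
    pool = POOLS[pool_id]
    scaffold = SCAFFOLDS[pool.scaffold_id]
    job_class = globals()[job_classname]  # fetch the class by name
    return job_class.execute(scaffold, args)


# Both "main" and "worker" are assumed to have run this setup.
Scaffold(0, {"granule_placement": PlacementNode("granule_placement")})
Pool(0, scaffold_id=0)

# Main side: only an int, a string, and a tuple of plain arguments are sent.
message = pickle.dumps((0, "PlacementJob", ("granule_placement",)))

# Worker side: unpack the payload and let the dispatcher resolve the rest.
pool_id, job_classname, args = pickle.loads(message)
result = dispatcher(pool_id, job_classname, args)
print(result)
```

The point of this design is that the scaffold and pool never cross the process boundary; if a job's arguments end up referencing one of them (or anything holding a communicator), pickling fails as in the trace above.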

I suspect that either there is an After*-specific difference here, or that it is due to:

cfg = Configuration.default(
    storage={"engine": "hdf5", "root": "network.hdf5"},
    after_connectivity={"test_after_conn": TestAfterConn()}
)

the object being passed directly in the configuration, but I feel like we would then have noticed this in other places too.

PS: Since options.debug_pool is on, shouldn't there be a lot more logging before the error?
