H5py issue during simulation #836

Open
drodarie opened this issue May 2, 2024 · 1 comment

Comments

@drodarie
Contributor

drodarie commented May 2, 2024

When running a simulation with NEST under MPI, one of the processes fails to open the HDF5 file.
Command used:

mpirun -n 6 bsb -v=4 simulate cerebellum.hdf5 basal_activity

Stack trace:

Traceback (most recent call last):
  File "/home/toromis/Workspace/venv/bin/bsb", line 8, in <module>
    sys.exit(handle_cli())
  File "/home/toromis/Workspace/dbbs/bsb/bsb-core/bsb/cli/__init__.py", line 11, in handle_cli
    handle_command(sys.argv[1:], exit=True)
  File "/home/toromis/Workspace/dbbs/bsb/bsb-core/bsb/cli/__init__.py", line 31, in handle_command
    namespace.handler(namespace, dryrun=dryrun)
  File "/home/toromis/Workspace/dbbs/bsb/bsb-core/bsb/cli/commands/__init__.py", line 99, in execute_handler
    self.handler(context)
  File "/home/toromis/Workspace/dbbs/bsb/bsb-core/bsb/cli/commands/_commands.py", line 208, in handler
    result = network.run_simulation(sim_name)
  File "/home/toromis/Workspace/dbbs/bsb/bsb-core/bsb/profiling.py", line 159, in decorated
    return f(*args, **kwargs)
  File "/home/toromis/Workspace/dbbs/bsb/bsb-core/bsb/core.py", line 443, in run_simulation
    return adapter.simulate(simulation)[0]
  File "/home/toromis/Workspace/dbbs/bsb/bsb-nest/bsb_nest/adapter.py", line 53, in simulate
    return super().simulate(simulation)
  File "/home/toromis/Workspace/dbbs/bsb/bsb-core/bsb/simulation/adapter.py", line 76, in simulate
    data = self.prepare(simulation)
  File "/home/toromis/Workspace/dbbs/bsb/bsb-nest/bsb_nest/adapter.py", line 58, in prepare
    self.simdata[simulation] = SimulationData(
  File "/home/toromis/Workspace/dbbs/bsb/bsb-core/bsb/simulation/adapter.py", line 52, in __init__
    self.placement: dict["CellModel", "PlacementSet"] = {
  File "/home/toromis/Workspace/dbbs/bsb/bsb-core/bsb/simulation/adapter.py", line 53, in <dictcomp>
    model: model.get_placement_set() for model in simulation.cell_models.values()
  File "/home/toromis/Workspace/dbbs/bsb/bsb-core/bsb/simulation/cell.py", line 35, in get_placement_set
    return self.cell_type.get_placement_set(chunks=chunks)
  File "/home/toromis/Workspace/dbbs/bsb/bsb-core/bsb/cell_types.py", line 103, in get_placement_set
    return self.scaffold.get_placement_set(self, *args, **kwargs)
  File "/home/toromis/Workspace/dbbs/bsb/bsb-core/bsb/core.py", line 553, in get_placement_set
    return self.storage.get_placement_set(
  File "/home/toromis/Workspace/dbbs/bsb/bsb-core/bsb/storage/__init__.py", line 286, in get_placement_set
    ps = self._PlacementSet(self._engine, type)
  File "/home/toromis/Workspace/dbbs/bsb/bsb-hdf5/bsb_hdf5/placement_set.py", line 87, in __init__
    if not self.exists(engine, cell_type):
  File "/home/toromis/Workspace/dbbs/bsb/bsb-hdf5/bsb_hdf5/placement_set.py", line 108, in exists
    with engine._handle("r") as h:
  File "/home/toromis/Workspace/dbbs/bsb/bsb-hdf5/bsb_hdf5/__init__.py", line 141, in _handle
    return h5py.File(self._root, mode)
  File "/home/toromis/Workspace/venv/lib/python3.10/site-packages/h5py/_hl/files.py", line 562, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
  File "/home/toromis/Workspace/venv/lib/python3.10/site-packages/h5py/_hl/files.py", line 235, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 102, in h5py.h5f.open
BlockingIOError: [Errno 11] Unable to synchronously open file (unable to lock file, errno = 11, error message = 'Resource temporarily unavailable')

At the moment, I do not have a simple way to reproduce this issue, since it occurs intermittently.
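
For readers unfamiliar with errno 11 (EAGAIN on Linux): it is what a non-blocking lock request returns when the lock is already held elsewhere. The sketch below is not a reproduction of this bug and does not use h5py or bsb. It only illustrates the error class, assuming (unverified for this HDF5 build) that the lock is requested in a non-blocking, `flock`-like way. The function name `try_conflicting_lock` is hypothetical, and the example needs a Unix-like system.

```python
import errno
import fcntl
import os
import tempfile


def try_conflicting_lock():
    """Return the errno raised when a non-blocking lock request conflicts."""
    with tempfile.NamedTemporaryFile() as tmp:
        # Two independent opens of the same file stand in for two processes.
        holder = os.open(tmp.name, os.O_RDWR)
        reader = os.open(tmp.name, os.O_RDONLY)
        try:
            # The first descriptor takes an exclusive lock, as a writer would.
            fcntl.flock(holder, fcntl.LOCK_EX | fcntl.LOCK_NB)
            try:
                # A shared, non-blocking request now fails instead of waiting.
                fcntl.flock(reader, fcntl.LOCK_SH | fcntl.LOCK_NB)
            except BlockingIOError as exc:
                return exc.errno
            return None
        finally:
            os.close(holder)
            os.close(reader)


if try_conflicting_lock() == errno.EAGAIN:
    print("conflicting non-blocking lock raised BlockingIOError (EAGAIN)")
```

If the assumption holds, a read-only open would only hit this error while another handle holds a conflicting lock on the same file, which would be consistent with the failure appearing only occasionally.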

@Helveg
Contributor

Helveg commented May 2, 2024

Please confirm this, but during simulation the storage engine should be operating in "readonly" mode, so it should be safe to set HDF5_USE_FILE_LOCKING=FALSE as a workaround.

This may, however, indicate a problem with your MPI installation, since an MPI-Window based lock should be active to prevent these issues.
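
A minimal sketch of applying the HDF5_USE_FILE_LOCKING workaround from a Python launcher script, reusing the command from the report. The helper names `env_without_hdf5_locking` and `run_simulation` are hypothetical and not part of bsb. This assumes the file really is only read during simulation, as noted above, and that the launcher passes the environment on to every rank.

```python
import os
import subprocess

# Command copied from the original report.
SIMULATE_COMMAND = [
    "mpirun", "-n", "6",
    "bsb", "-v=4", "simulate", "cerebellum.hdf5", "basal_activity",
]


def env_without_hdf5_locking(base_env=None):
    """Return a copy of the environment with HDF5 file locking disabled."""
    env = dict(os.environ if base_env is None else base_env)
    # HDF5 reads this variable when it opens a file.
    env["HDF5_USE_FILE_LOCKING"] = "FALSE"
    return env


def run_simulation():
    """Launch the simulation with file locking disabled (hypothetical helper)."""
    return subprocess.run(
        SIMULATE_COMMAND, env=env_without_hdf5_locking(), check=True
    )
```

Calling `run_simulation()` should be equivalent to exporting the variable in the shell before running `mpirun`. Whether the variable also reaches ranks on other hosts depends on the MPI launcher, so that is worth checking on multi-node runs.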

2 participants