Iterate readIterations/read_iterations Multiple Times #1418

Open
ax3l opened this issue Apr 5, 2023 · 7 comments

Comments

@ax3l
Member

ax3l commented Apr 5, 2023

Describe the bug
Currently, read_iterations() cannot be looped over more than once. The second loop fails with this error message:

openpmd_api.openpmd_api_cxx.ErrorWrongAPIUsage:
  Wrong API usage: Trying to call Series::readIterations() on a (partially) read Series.

This is a bit unusual, since a second call should start over on the same open series, at least in regular/random-access (non-streaming) mode.

To Reproduce
Python:

import openpmd_api as io

# ...
series = io.Series(filename, io.Access_Type.read_only)

# start-to-end read:
for count, iteration in enumerate(series.read_iterations()):
    pass

# another start-to-end read:
for count, iteration in enumerate(series.read_iterations()):
    pass

Expected behavior
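In Python, each call that creates a new generator/iterator usually starts iterating from the beginning again, so a second read_iterations() loop over a file read this way should start over at the first iteration.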
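For reference, this is the ordinary Python behavior the expectation relies on: each call to a generator function returns a new generator that starts from the beginning, whereas a single generator object can be consumed only once. A minimal pure-Python sketch; iterate_indices is a hypothetical helper for illustration and has nothing to do with openPMD-api internals:

```python
# Hypothetical helper for illustration; unrelated to openPMD-api internals.
def iterate_indices(indices):
    """Generator function: every call returns a brand-new generator object."""
    for index in indices:
        yield index


indices = [0, 1, 2, 3, 4, 5]

# Two separate calls create two independent generators,
# so each loop starts over at the first element.
first_pass = list(iterate_indices(indices))
second_pass = list(iterate_indices(indices))

# Reusing one generator object is different: it is exhausted after one pass.
gen = iterate_indices(indices)
one_shot_first = list(gen)
one_shot_second = list(gen)  # [] because this generator cannot restart

print(first_pass == second_pass)  # True
print(one_shot_second)            # []
```

By that convention, two separate calls to series.read_iterations() would each be expected to behave like first_pass and second_pass.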

Software Environment

  • version of openPMD-api: 0.15.0 & 0.15.1
  • installed openPMD-api via: conda
  • operating system: Linux
  • machine: any
  • name and version of Python implementation: any
  • version of HDF5: hdf5 1.12.2 (nompi_ha7af310_101)
  • version of ADIOS1: N/A
  • version of ADIOS2: N/A
  • name and version of MPI: none

Additional context
First seen by @s-sajid-ali.

https://github.com/fnalacceleratormodeling/synergia2/blob/231d3dff97c0a2bb64db49584c626ec15f7b24b4/src/analysis_tools/diag_plot_openpmd.py

@ax3l
Member Author

ax3l commented Apr 5, 2023

A workaround is to use the traditional API (series.iterations), which can be looped over repeatedly:

series = io.Series(filename, io.Access_Type.read_only)
# ...

for k_i, i in series.iterations.items():
    pass
for k_i, i in series.iterations.items():
    pass

@ax3l ax3l added this to the 0.15.2 milestone Apr 5, 2023
@franzpoeschel
Contributor

I would say that this currently has the status of a feature request rather than a bug ;) If anything, it was a bug that this workflow somehow worked in 0.14.
Series::readIterations() is currently intended for workflows that would also be usable in streaming.
Looping with for it in series.read_iterations() is not a lightweight operation: it goes through the different IO steps in the backend.
For lightweight access such as reading attributes, for k_i, i in series.iterations.items() is not a workaround but the better choice of API.
Supporting the workflow of calling read_iterations() multiple times is one of my goals for the 0.16 release cycle, but it will have the character of an API addition and will require new internal workflows and additions in the backend, rather than a quick adaptation of the public API.
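The distinction drawn here, between a read-once loop over backend steps and the random-access series.iterations container, can be pictured with a toy model. StepSeries below is hypothetical plain Python, not openPMD-api code; it only mimics the observable behavior discussed in this issue: read_iterations() hands out a single in-order pass and refuses a second call, while iterations is an ordinary mapping that can be looped over as often as needed.

```python
# Hypothetical toy model for illustration only; NOT the openPMD-api implementation.
class StepSeries:
    """Mimics a step-based reader: read_iterations() yields one in-order pass."""

    def __init__(self, indices):
        self._indices = list(indices)
        self._read_started = False
        # Random-access view: an ordinary dict, which can be looped over any number of times.
        self.iterations = {index: f"iteration {index}" for index in self._indices}

    def read_iterations(self):
        # The check runs on the call itself, not lazily on the first next().
        if self._read_started:
            raise RuntimeError(
                "Wrong API usage: read_iterations() called on a (partially) read series"
            )
        self._read_started = True
        return self._steps()

    def _steps(self):
        for index in self._indices:
            # A real step-based backend would begin and end an IO step here.
            yield index, self.iterations[index]


series = StepSeries(range(6))

# The first start-to-end read works.
first_read = [index for index, _ in series.read_iterations()]

# A second call is refused, similar to the error reported in this issue.
try:
    series.read_iterations()
    second_call_refused = False
except RuntimeError:
    second_call_refused = True

# The mapping-based loop can be repeated freely.
pass_a = [index for index, _ in series.iterations.items()]
pass_b = [index for index, _ in series.iterations.items()]

print(first_read)           # [0, 1, 2, 3, 4, 5]
print(second_call_refused)  # True
print(pass_a == pass_b)     # True
```

The design point the model captures is that the expensive, stateful part (walking steps) lives behind read_iterations(), while attribute-style access only needs the cheap, repeatable mapping.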

@ax3l
Member Author

ax3l commented Apr 5, 2023

@s-sajid-ali just checking: I remember you wrote the file with HDF5. When you wrote the file, did you use the groupBased or the fileBased encoding for iterations?

@ax3l ax3l modified the milestones: 0.15.2, 0.16.0 Apr 5, 2023
@ax3l
Member Author

ax3l commented Apr 5, 2023

@s-sajid-ali just checking where you found read_iterations in the docs/examples - we just want to make sure we don't accidentally advertise it outside of streaming (read-once) workflows yet :)

@ax3l ax3l changed the title 0.15: read_iterations twice Iterate readIterations/read_iterations Multiple Times Apr 5, 2023
@s-sajid-ali
Member

.. I remember you wrote the file with HDF5. When you wrote the file, did you use for iterations the groupBased encoding, the fileBased encoding?

I used groupBased encoding:

sasyed@MAC-140753 ~/D/p/s/b/e/fodo_cxx (sajid/openpmd_python_api_fixes)> h5glance diag.h5
diag.h5 (10 attributes)
└data
  ├0 (20 attributes)
  ├1 (20 attributes)
  ├2 (20 attributes)
  ├3 (20 attributes)
  ├4 (20 attributes)
  └5 (20 attributes)

... where you found read_iterations in the docs/examples - we just want to make sure we don't accidentally advertise it outside of streaming (read-once) workflows yet :)

Likely from this example: https://openpmd-api.readthedocs.io/en/0.15.1/usage/parallel.html#id2 or from inspecting the available methods for a Series object in a Jupyter notebook and realizing that read_iterations worked for the use case I had (at least with openpmd-api@:0.15.0).

@ax3l
Member Author

ax3l commented Apr 10, 2023

I see; yes, the comment

    # In parallel contexts, it's important to explicitly open iterations.
    # This is done automatically when using `Series.write_iterations()`,
    # or in read mode `Series.read_iterations()`.

is misleading; we need to update it.

@franzpoeschel
Contributor

#1592 takes a first step in this direction. It supports reopening closed Iterations and going back to earlier Iterations in Series.readIterations(). This is currently restricted to Series that do not use ADIOS2 steps, since those will require closing and reopening in the backend.
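The restriction described for #1592 can be pictured with another toy model. LinearReader and its seek() method are hypothetical names chosen for illustration; they are not the API introduced in #1592. The sketch only encodes the rule stated above: an in-order reader may go back to an earlier iteration when no backend steps are involved, but refuses once steps have already been consumed.

```python
# Hypothetical toy model for illustration only; NOT the code from #1592.
class LinearReader:
    """Walks iterations in order; going back is allowed only without backend steps."""

    def __init__(self, indices, uses_steps):
        self._indices = list(indices)
        self._uses_steps = uses_steps
        self._position = 0  # position of the next iteration to return

    def __iter__(self):
        return self

    def __next__(self):
        if self._position >= len(self._indices):
            raise StopIteration
        index = self._indices[self._position]
        self._position += 1
        return index

    def seek(self, index):
        """Move so that the next returned iteration is `index`."""
        target = self._indices.index(index)
        if target < self._position and self._uses_steps:
            # Steps already consumed cannot be revisited without
            # closing and reopening in the backend.
            raise RuntimeError("cannot go back to an earlier step")
        self._position = target


# No steps (e.g. plain file access): going back to an earlier iteration works.
file_reader = LinearReader(range(6), uses_steps=False)
first_three = [next(file_reader) for _ in range(3)]
file_reader.seek(0)
replayed = list(file_reader)

# Step-based access: going back is refused.
step_reader = LinearReader(range(6), uses_steps=True)
next(step_reader)
next(step_reader)
try:
    step_reader.seek(0)
    rewind_refused = False
except RuntimeError:
    rewind_refused = True

print(first_three)     # [0, 1, 2]
print(replayed)        # [0, 1, 2, 3, 4, 5]
print(rewind_refused)  # True
```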
