Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move store backing calculation out of dataset writer #441

Merged
merged 6 commits into from
Sep 12, 2024

Conversation

yousefmoazzam
Copy link
Collaborator

@yousefmoazzam yousefmoazzam commented Sep 12, 2024

This contains some changes that are a preliminary to integrating padding into the task runner.

See the message of 519dc3a for details on the motivation for moving the store-backing calculation out of the writer. A short summary would be:

  • it's difficult to account for the two chunk copies in the store-backing calculation if it's kept in the writer
  • it feels more explicit/less hidden where the store-backing calculation occurs now that is has been moved out of the writer's write_block() method

Main changes:

  • add DataSetStoreBacking enum for describing the backing of the dataset store
  • move calculation of store-backing out of writer's write_block() method
  • add function for calculating the backing of the dataset store + tests

Checklist

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have made corresponding changes to the documentation

For any section that is not the last section, approximately two chunk
copies need to be accounted for:
- the unpadded chunk that the writer of section `n` creates
- the padded chunk that the reader of section `n+1` creates

For the last section in a pipeline, only one chunk copy needs to be
accounted for: the unpadded chunk that the writer of that section
creates.
Prior to this change, the decision of whether the dataset store was
backed by RAM or an hdf5 file was made by the writer. Specifically, the
writer was deciding whether the store should be backed by RAM or an hdf5
file when the first block was written to the store.

This made a lot of sense when the determining of the store-backing was
done by:
- attempting to create a numpy array with the necessary chunk shape
  (thus allocating the required memory)
- potentially catching a `MemoryError` being raised by python

However, some changes were made in #218 to accommodate the fact that,
running under SLURM, the job would be OOM killed by SLURM without giving
python a chance to raise a `MemoryError`. Among these changes were that,
instead of allocating a numpy array and then catching if a `MemoryError`
was raised, the required number of bytes was calculated to determine if
the numpy array should be created or not. This essentially made the
determining of the store backing be an operation purely involving some
simple arithmetic.

The arithmetic could conceivably be done outside of the writer (ie,
prior to the first block being written to the store). One benefit of
this would be that the decision of the store backing wouldn't be so
hidden as to where it was happening.

Furthermore, the requirement of needing to account for two copies of the
chunk in memory (see the following comment and the linked thread in the
comment for more details
#401 (comment)),
was difficult to fulfil if the writer was the object deciding what the
backing of the store should be (ie, getting the chunk information about
the next section, to the writer of the current section, appeared to be
tricky).

With the above in mind, the calculation of if the store should be backed
by RAM or an hdf5 file has been moved out of the writer, and is now
performed by the task runner. The runner determines the backing of the
store, and passes that information to the writer's constructor. The
writer now simply uses whichever backing it is told to use by the
runner.
@yousefmoazzam yousefmoazzam merged commit 836d1b6 into main Sep 12, 2024
5 of 6 checks passed
@yousefmoazzam yousefmoazzam deleted the move-store-backing-calculation branch September 12, 2024 11:30
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant