Prior to this change, the writer decided whether the dataset store was
backed by RAM or an hdf5 file. It made that decision when the first
block was written to the store.
This made a lot of sense when the store backing was determined by:
- attempting to create a numpy array with the necessary chunk shape
(thus allocating the required memory)
- catching any `MemoryError` raised by Python
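
For illustration, the allocate-and-catch approach described in the list
above might have looked roughly like the sketch below. The names
`backing_by_allocation` and `failing_allocator` are hypothetical, and
the allocator is passed in as a parameter (in the real code it would
have been a numpy array creation) so that the `MemoryError` path can be
shown without actually exhausting memory.

```python
from typing import Callable, Tuple


def backing_by_allocation(
    chunk_shape: Tuple[int, ...],
    allocate: Callable[[Tuple[int, ...]], object],
) -> str:
    """Return "ram" if the chunk could be allocated, otherwise "hdf5"."""
    try:
        # The allocation itself is the test: success means the chunk fits.
        allocate(chunk_shape)
    except MemoryError:
        # Python reported that the allocation failed, so fall back to a file.
        return "hdf5"
    return "ram"


def failing_allocator(shape: Tuple[int, ...]) -> object:
    # Stand-in for an allocation that exceeds the available memory.
    raise MemoryError


# A tiny allocation succeeds, so the store would be RAM-backed.
small = backing_by_allocation((2, 3, 4), lambda shape: bytearray(2 * 3 * 4))
# A failed allocation is caught, so the store would be hdf5-backed.
too_big = backing_by_allocation((2, 3, 4), failing_allocator)
```

This only works if the process survives long enough for the exception to
be raised and caught.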
However, some changes were made in #218 to accommodate the fact that,
when running under SLURM, the job would be OOM-killed by SLURM before
Python had a chance to raise a `MemoryError`. As part of those changes,
instead of allocating a numpy array and catching any `MemoryError`, the
required number of bytes was calculated up front to decide whether the
numpy array should be created. This reduced determining the store
backing to some simple arithmetic.
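
A minimal sketch of that arithmetic, under stated assumptions: the
function names, the example chunk shape, and the way the memory limit is
supplied (as a plain `available_bytes` argument) are all hypothetical
and not taken from the codebase.

```python
import math
from typing import Tuple


def required_bytes(chunk_shape: Tuple[int, ...], itemsize: int) -> int:
    """Bytes needed to hold one chunk: element count times bytes per element."""
    return math.prod(chunk_shape) * itemsize


def fits_in_ram(
    chunk_shape: Tuple[int, ...], itemsize: int, available_bytes: int
) -> bool:
    """Decide by arithmetic alone; nothing is allocated."""
    return required_bytes(chunk_shape, itemsize) <= available_bytes


# Example: a (180, 128, 160) chunk of 4-byte floats.
needed = required_bytes((180, 128, 160), itemsize=4)  # 14_745_600 bytes
```

Because nothing is allocated, the check cannot trigger an OOM kill, and
it needs only the chunk shape, the element size and a memory limit.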
The arithmetic could therefore be done outside of the writer (i.e.,
before the first block is written to the store). One benefit of doing so
is that the store-backing decision would be easier to find, rather than
being hidden inside the first write.
Furthermore, the requirement to account for two copies of the chunk in
memory (see the following comment, and the thread linked from it, for
more details:
#401 (comment))
was difficult to fulfil while the writer was the object deciding the
backing of the store: getting the chunk information for the next
section to the writer of the current section appeared to be tricky.
With the above in mind, the calculation of whether the store should be
backed by RAM or an hdf5 file has been moved out of the writer and is
now performed by the task runner. The runner determines the backing of
the store and passes that information to the writer's constructor. The
writer now simply uses whichever backing the runner tells it to use.
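
The resulting split of responsibilities could be sketched as follows.
This is an illustration only: `Backing`, `choose_backing` and `Writer`
are hypothetical names, and modelling the "two copies" requirement as a
list of chunk shapes that must be resident at the same time is an
assumption about what the linked thread describes.

```python
import math
from dataclasses import dataclass
from enum import Enum
from typing import Sequence, Tuple


class Backing(Enum):
    RAM = "ram"
    HDF5 = "hdf5"


def choose_backing(
    resident_chunk_shapes: Sequence[Tuple[int, ...]],
    itemsize: int,
    memory_limit_bytes: int,
) -> Backing:
    """Runner-side decision: sum the bytes of every chunk held at once."""
    needed = sum(math.prod(shape) * itemsize for shape in resident_chunk_shapes)
    return Backing.RAM if needed <= memory_limit_bytes else Backing.HDF5


@dataclass
class Writer:
    # The writer no longer decides anything; it is told which backing to use.
    backing: Backing


# The runner can see both the current and the next section's chunk shapes.
current_chunk = (10, 100, 100)  # 400_000 bytes at 4 bytes per element
next_chunk = (10, 100, 100)     # another 400_000 bytes
backing = choose_backing([current_chunk, next_chunk], 4, 500_000)
writer = Writer(backing=backing)
```

In this example a single chunk would fit within the limit, but the two
chunks together do not, so the writer is constructed with the hdf5
backing.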