Model split perf #1686
Conversation
In a nutshell: make sure DIS is in RAM, avoid re-loading data over and over, and avoid unnecessary work.

Refactoring of the model splitter: create a class to re-use data where possible.

* DO NOT PURGE AFTER PARTITION MODEL CREATION. If the data is spatially unchunked, purging forces loading the entire dataset into memory again for every partition, so performance degrades linearly with the number of partitions, unless all data is loaded into memory at the start (not feasible for large models).
* This implementation instead runs a groupby on the unpartitioned data (i.e. a single load) and counts the number of active elements; if a package has zero active elements in a partition, it is omitted. Some dispatching is needed for point and line data (matching the clipping logic); a sketch of the idea follows this list.
* Avoid searching for paired transport models: the mapping is fully known a priori since flow and transport share the same partitioning labels; keep them together in a NamedTuple (with some helpers).
* Added some trailing returns (matter of taste).
* Seems to reduce the runtime from 2 hours (Joost) to 1 minute (me, locally).

Unfortunately, performance still gets worse with each partition when writing the model. In principle, chunking fixes this, but that requires data locality for unstructured meshes (doable, see the xugrid issues); alternatively, we might get dask to optimize the writes. Needs research.
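A minimal sketch of the single-pass active-element count, with hypothetical names (not the actual implementation): group a package's notnull mask by the partition labels once, then skip the package for any partition where the count is zero.

import xarray as xr

def active_count_per_partition(da: xr.DataArray, labels: xr.DataArray) -> xr.DataArray:
    # notnull() stays lazy; the grouped sum triggers a single load of ``da``.
    return da.notnull().groupby(labels.rename("partition")).sum()

# A package with a zero count for a given partition label can then be
# omitted from that partition model.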
This script will re-order the data to a locality-preserving order, and then dump everything to TOML + Zarr:
# %%
import numpy as np
import imod
import xarray as xr
# %%
def morton_encode_2d(x, y):
"""
Encode 2D coordinates to Morton codes.
Uses 32 bits per dimension for 64-bit Morton codes.
Parameters
----------
x: np.ndarray of floats
y: np.ndarray of floats
Returns
-------
codes: np.ndarray
Morton codes as uint64
"""
x = np.asarray(x, dtype=np.float64)
y = np.asarray(y, dtype=np.float64)
min_x, max_x = np.min(x), np.max(x)
min_y, max_y = np.min(y), np.max(y)
# Add small padding to avoid edge cases
x_range = max_x - min_x
y_range = max_y - min_y
padding = 1e-12 # Smaller padding for maximum resolution
if x_range == 0:
x_range = 1.0
padding = 0.5
if y_range == 0:
y_range = 1.0
padding = 0.5
min_x -= x_range * padding
max_x += x_range * padding
min_y -= y_range * padding
max_y += y_range * padding
# Scale coordinates to [0, 2^32 - 1] for maximum resolution
max_coord = 0xFFFFFFFF # 2^32 - 1
x_scaled = ((x - min_x) / (max_x - min_x) * max_coord).astype(np.uint64)
y_scaled = ((y - min_y) / (max_y - min_y) * max_coord).astype(np.uint64)
# Clamp to valid range
x_scaled = np.clip(x_scaled, 0, max_coord)
y_scaled = np.clip(y_scaled, 0, max_coord)
# Dilate bits for 32-bit values - maximum resolution bit interleaving
def dilate_bits_32(vals):
"""Dilate 32-bit values by inserting zeros between bits"""
vals = vals & 0xFFFFFFFF # Ensure 32-bit
vals = (vals | (vals << 32)) & 0x00000000FFFFFFFF
vals = (vals | (vals << 16)) & 0x0000FFFF0000FFFF
vals = (vals | (vals << 8)) & 0x00FF00FF00FF00FF
vals = (vals | (vals << 4)) & 0x0F0F0F0F0F0F0F0F
vals = (vals | (vals << 2)) & 0x3333333333333333
vals = (vals | (vals << 1)) & 0x5555555555555555
return vals
# Dilate and interleave: x ends up in the even bit positions, y in the odd bit positions
x_dilated = dilate_bits_32(x_scaled)
y_dilated = dilate_bits_32(y_scaled)
# Interleave by shifting y left by 1 and OR-ing with x
morton_codes = x_dilated | (y_dilated << 1)
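# Worked example (illustrative): for x = 0b11 and y = 0b10,
# dilate_bits_32 gives x_dilated = 0b0101 and y_dilated = 0b0100,
# so morton = 0b0101 | (0b0100 << 1) = 0b1101 = 13.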
return morton_codes
def reorder_morton(data):
grid = data.ugrid.grid
x, y = grid.centroids.T
codes = morton_encode_2d(x, y)
order = np.argsort(codes)
return data.isel({grid.face_dimension: order})
def reorder_and_rechunk_simulation(
sim: imod.mf6.Modflow6Simulation,
dimension: str = "mesh2d_nFaces",
chunks: dict[str, int] | None = None,
):
if chunks is None:
chunks = {"time": 10, dimension: 20_000}
for model in sim.values():
if isinstance(model, imod.mf6.model.Modflow6Model):
for package in model.values():
dims = package.dataset.dims
if dimension in dims:
package_chunks = {k: v for k, v in chunks.items() if k in dims}
package.dataset = reorder_morton(package.dataset)
# Set encoding chunks which will end up in the zarr file.
if package_chunks:
package.dataset.encoding = {"chunks": package_chunks}
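# With the defaults above this is e.g. {"chunks": {"time": 10, "mesh2d_nFaces": 20_000}}
# for a time-dependent package; only dims present in the package are kept.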
return
# %%
path = "Z:/tmp/joost/v5_composiet_v1b_v2_nodsp_aut/v5_composiet_v1b_v2_nodsp.toml"
sim = imod.mf6.Modflow6Simulation.from_file(path)
reorder_and_rechunk_simulation(sim)
sim.dump("rechunked_zarrsim", validate=False, engine="zarr")
Annoyingly, I get an error inside the function calls that
# %%
import imod
# %%
def load_non_spatial_data(sim, dimension: str = "mesh2d_nFaces"):
for model in sim.values():
if isinstance(model, imod.mf6.model.Modflow6Model):
for package in model.values():
if dimension not in package.dataset.dims:
package.dataset.load()
for var in package.dataset.data_vars:
if dimension not in package.dataset[var].dims:
package.dataset[var].load()
else:
model.dataset.load()
return
# %%
path = "rechunked-simulation/v5_composiet_v1b_v2_nodsp.toml"
sim = imod.mf6.Modflow6Simulation.from_file(path)
load_non_spatial_data(sim)
sim._validation_context = imod.mf6.ValidationSettings(strict_well_validation=False, ignore_time=True)
# %%
# Serial test
# sim.write("serial_simulation", validate=False)
submodel_labels = sim.create_partition_labels(npartitions=8)
sim_partitions = sim.split(submodel_labels, ignore_time_purge_empty=True)
sim_partitions._validation_context = imod.mf6.ValidationSettings(strict_well_validation=False, ignore_time=True)
sim_partitions.write("partitioned_simulation", validate=False)
# %%

For future to-dos: the Morton curve ordering should become part of xugrid. See: Deltares/xugrid#389. It may make sense to wrap the ordering at the simulation level, as a method instead of a free function.
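Purely as a sketch of that last point (the class extension and method name are assumptions, not existing API), the free function above could be wrapped like this:

class Modflow6Simulation(imod.mf6.Modflow6Simulation):  # hypothetical extension
    def reorder_morton(self, dimension: str = "mesh2d_nFaces", chunks: dict[str, int] | None = None) -> None:
        # Delegates to the free function defined in the script above.
        reorder_and_rechunk_simulation(self, dimension=dimension, chunks=chunks)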
Some general thoughts (I've had a little time to mull it over): we could make a minor optimization in the ... Secondly, in the get_unfiltered_options, we should probably use ...
For @jdelsman
Sample script:
The sim.split() takes about 70 seconds here, which seems reasonable (compared to 2 hours before?).