Model split perf #1686
Conversation
In a nutshell: make sure DIS is in RAM, avoid re-loading data over and over, and avoid unnecessary work.

Refactoring of the model splitter: create a class to re-use data where possible.

* DO NOT PURGE AFTER PARTITION MODEL CREATION. If the data is spatially unchunked, purging forces loading the entire dataset into memory again for every partition, so performance degrades linearly with the number of partitions, unless all data is loaded into memory at the start (not feasible for large models).
* This implementation instead runs a groupby on the unpartitioned data (i.e. a single load) and counts the number of active elements; if a package has zero active elements in a partition, it is omitted. Some dispatching is needed for point and line data (matching the clipping logic); a sketch of the idea follows this list.
* Avoid searching for paired transport models: the mapping is fully known a priori since flow and transport share the same partitioning labels; keep them together in a NamedTuple (with some helpers).
* Added some trailing returns (matter of taste).
* Seems to reduce the runtime from 2 hours (Joost) to 1 minute (me, locally).

Unfortunately, performance still gets worse with each partition when writing the model. In principle, chunking fixes this, but that requires data locality for unstructured meshes (doable, see the xugrid issues); alternatively, we might get dask to optimize the writes. Needs research.
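A minimal sketch of the single-pass active-element count, with hypothetical names (not the actual implementation): group a package's notnull mask by the partition labels once, then skip the package for any partition where the count is zero.

import xarray as xr

def active_count_per_partition(da: xr.DataArray, labels: xr.DataArray) -> xr.DataArray:
    # notnull() stays lazy; the grouped sum triggers a single load of ``da``.
    return da.notnull().groupby(labels.rename("partition")).sum()

# A package with a zero count for a given partition label can then be
# omitted from that partition model.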
This script will re-order the data to a locality-preserving order, and then dump everything to TOML + Zarr:
# %%
import numpy as np
import imod
import xarray as xr
# %%
def morton_encode_2d(x, y):
"""
Encode 2D coordinates to Morton codes.
Uses 32 bits per dimension for 64-bit Morton codes.
Parameters
----------
x: np.ndarray of floats
y: np.ndarray of floats
Returns
-------
codes: np.ndarray
Morton codes as uint64
"""
x = np.asarray(x, dtype=np.float64)
y = np.asarray(y, dtype=np.float64)
min_x, max_x = np.min(x), np.max(x)
min_y, max_y = np.min(y), np.max(y)
# Add small padding to avoid edge cases
x_range = max_x - min_x
y_range = max_y - min_y
padding = 1e-12 # Smaller padding for maximum resolution
if x_range == 0:
x_range = 1.0
padding = 0.5
if y_range == 0:
y_range = 1.0
padding = 0.5
min_x -= x_range * padding
max_x += x_range * padding
min_y -= y_range * padding
max_y += y_range * padding
# Scale coordinates to [0, 2^32 - 1] for maximum resolution
max_coord = 0xFFFFFFFF # 2^32 - 1
x_scaled = ((x - min_x) / (max_x - min_x) * max_coord).astype(np.uint64)
y_scaled = ((y - min_y) / (max_y - min_y) * max_coord).astype(np.uint64)
# Clamp to valid range
x_scaled = np.clip(x_scaled, 0, max_coord)
y_scaled = np.clip(y_scaled, 0, max_coord)
# Dilate bits for 32-bit values - maximum resolution bit interleaving
def dilate_bits_32(vals):
"""Dilate 32-bit values by inserting zeros between bits"""
vals = vals & 0xFFFFFFFF # Ensure 32-bit
vals = (vals | (vals << 32)) & 0x00000000FFFFFFFF
vals = (vals | (vals << 16)) & 0x0000FFFF0000FFFF
vals = (vals | (vals << 8)) & 0x00FF00FF00FF00FF
vals = (vals | (vals << 4)) & 0x0F0F0F0F0F0F0F0F
vals = (vals | (vals << 2)) & 0x3333333333333333
vals = (vals | (vals << 1)) & 0x5555555555555555
return vals
# Dilate and interleave: x ends up in the even bit positions, y in the odd bit positions
x_dilated = dilate_bits_32(x_scaled)
y_dilated = dilate_bits_32(y_scaled)
# Interleave by shifting y left by 1 and OR-ing with x
morton_codes = x_dilated | (y_dilated << 1)
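# Worked example (illustrative): for x = 0b11 and y = 0b10,
# dilate_bits_32 gives x_dilated = 0b0101 and y_dilated = 0b0100,
# so morton = 0b0101 | (0b0100 << 1) = 0b1101 = 13.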
return morton_codes
def reorder_morton(data):
grid = data.ugrid.grid
x, y = grid.centroids.T
codes = morton_encode_2d(x, y)
order = np.argsort(codes)
return data.isel({grid.face_dimension: order})
def reorder_and_rechunk_simulation(
sim: imod.mf6.Modflow6Simulation,
dimension: str = "mesh2d_nFaces",
chunks: dict[str, int] | None = None,
):
if chunks is None:
chunks = {"time": 10, dimension: 20_000}
for model in sim.values():
if isinstance(model, imod.mf6.model.Modflow6Model):
for package in model.values():
dims = package.dataset.dims
if dimension in dims:
package_chunks = {k: v for k, v in chunks.items() if k in dims}
package.dataset = reorder_morton(package.dataset)
# Set encoding chunks which will end up in the zarr file.
if package_chunks:
package.dataset.encoding = {"chunks": package_chunks}
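# With the defaults above this is e.g. {"chunks": {"time": 10, "mesh2d_nFaces": 20_000}}
# for a time-dependent package; only dims present in the package are kept.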
return
# %%
path = "Z:/tmp/joost/v5_composiet_v1b_v2_nodsp_aut/v5_composiet_v1b_v2_nodsp.toml"
sim = imod.mf6.Modflow6Simulation.from_file(path)
reorder_and_rechunk_simulation(sim)
sim.dump("rechunked_zarrsim", validate=False, engine="zarr")
Annoyingly, I get an error inside the function calls that
# %%
import imod
# %%
def load_non_spatial_data(sim, dimension: str = "mesh2d_nFaces"):
for model in sim.values():
if isinstance(model, imod.mf6.model.Modflow6Model):
for package in model.values():
if dimension not in package.dataset.dims:
package.dataset.load()
for var in package.dataset.data_vars:
if dimension not in package.dataset[var].dims:
package.dataset[var].load()
else:
model.dataset.load()
return
# %%
path = "rechunked-simulation/v5_composiet_v1b_v2_nodsp.toml"
sim = imod.mf6.Modflow6Simulation.from_file(path)
load_non_spatial_data(sim)
sim._validation_context = imod.mf6.ValidationSettings(strict_well_validation=False, ignore_time=True)
# %%
# Serial test
# sim.write("serial_simulation", validate=False)
submodel_labels = sim.create_partition_labels(npartitions=8)
sim_partitions = sim.split(submodel_labels, ignore_time_purge_empty=True)
sim_partitions._validation_context = imod.mf6.ValidationSettings(strict_well_validation=False, ignore_time=True)
sim_partitions.write("partitioned_simulation", validate=False)
# %%

For future to-dos: the Morton curve ordering should become part of xugrid. See: Deltares/xugrid#389. It may make sense to wrap the ordering at the simulation level, as a method instead of a free function.
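Purely as a sketch of that last point (the class extension and method name are assumptions, not existing API), the free function above could be wrapped like this:

class Modflow6Simulation(imod.mf6.Modflow6Simulation):  # hypothetical extension
    def reorder_morton(self, dimension: str = "mesh2d_nFaces", chunks: dict[str, int] | None = None) -> None:
        # Delegates to the free function defined in the script above.
        reorder_and_rechunk_simulation(self, dimension=dimension, chunks=chunks)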
Some general thoughts (I've had a little time to mull it over): we could make a minor optimization in the ... Secondly, in the get_unfiltered_options, we should probably use ...
For @jdelsman
Sample script:
The sim.split() takes about 70 seconds here, which seems reasonable (compared to 2 hours before?).