observing the chunk layout for an array #4082

d-v-b · 2026-06-18T15:38:39Z

d-v-b
Jun 18, 2026
Maintainer

Some recent changes (adding rectilinear chunks) and upcoming ones (lazy slicing?) are straining the APIs that tell users how an array is partitioned. I think this is really important information to get right, and we would benefit from thinking through the design, hence this discussion. For background, we had a related discussion prior to the 3.x release.

Here is a quick summary of our current situation:

Arrays have a chunks attribute, and a shards attribute. Neither chunks nor shards are array metadata fields in the v3 spec. We use these fields so that users could do create_array(chunks=(10,), shards=(20,)) to create a sensibly sharded array without threading the inner chunk shape through the codecs.

We kept chunks for backwards compatibility with zarr-python 2.x.
We chose chunks to denote "smallest readable unit" in this context to ensure that readers consuming zarr arrays (like dask) would pick the right granularity for reading by checking the chunks attribute.

Rectilinear chunking breaks the chunks attribute. The introduction of rectilinear chunking means chunks is not a plain tuple but potentially something large, as each individual chunk can have a unique shape. Rather than widen the type of this attribute, which might be a breaking change for consumers that expect tuple[int, ...], array.chunks raises a NotImplementedError when the rectilinear chunk grid is used:

```
import zarr
zarr.config.set({'array.rectilinear_chunks': True})

z = zarr.create_array(
    "memory:///foo",
    shape=(18,),
    shards=((10, 8),),
    chunks=(2,),
    dtype='uint8',
)

print(z.read_chunk_sizes)
# ((2, 2, 2, 2, 2, 2, 2, 2, 2),)
print(z.write_chunk_sizes)
# ((10, 8),)
print(z.chunks)
# NotImplementedError: The `chunks` attribute is only defined for arrays using regular chunk grids. This array has a rectilinear chunk grid. Use `read_chunk_sizes` for general access.
```
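For consumers that currently read `.chunks`, one way to cope with the raise is a small fallback helper. This is only a sketch: it is duck-typed against the three attributes shown in the snippet above (`shape`, `chunks`, `read_chunk_sizes`), the helper name `per_axis_chunk_sizes` is made up, and the two `Fake*` classes are hypothetical stand-ins so the example runs without zarr.

```python
from math import ceil


def per_axis_chunk_sizes(arr):
    """Return per-axis chunk sizes for regular or rectilinear arrays.

    `arr` is any object exposing `shape`, `chunks`, and `read_chunk_sizes`
    with the behavior described above (hypothetical helper, not a zarr API).
    """
    try:
        regular = arr.chunks  # raises NotImplementedError on rectilinear grids
    except NotImplementedError:
        return tuple(tuple(sizes) for sizes in arr.read_chunk_sizes)
    # Regular grid: expand the single chunk shape into clipped per-axis sizes.
    return tuple(
        tuple(min(c, n - i * c) for i in range(ceil(n / c)))
        for n, c in zip(arr.shape, regular)
    )


class FakeRectilinear:
    """Hypothetical stand-in mimicking the array in the snippet above."""
    shape = (18,)
    read_chunk_sizes = ((2,) * 9,)

    @property
    def chunks(self):
        raise NotImplementedError("only defined for regular chunk grids")


class FakeRegular:
    """Hypothetical stand-in for a regular-grid array."""
    shape = (18,)
    chunks = (7,)
    read_chunk_sizes = ((7, 7, 4),)


print(per_axis_chunk_sizes(FakeRectilinear()))
print(per_axis_chunk_sizes(FakeRegular()))
```

The downside is visible in the helper itself: every consumer has to know about the exception and re-derive the clipped sizes, which is the kind of duplicated logic this discussion is trying to avoid.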

With rectilinear chunking we got two new array attributes: read_chunk_sizes and write_chunk_sizes, which you can see in the code snippet above. But by focusing on abstract "read size" and "write size", these two attributes obscure important information about the array, like the actual layout of each chunk. The "read size" and "write size" are instructions to the reader / writer about the granularity of that operation, but an array user might also care about the stored layout of a chunk. For example, these two arrays have the same "read" and "write" sizes, but different physical chunks:

Array A
```
import zarr
zarr.config.set({'array.rectilinear_chunks': True})
z = zarr.create_array(
    "memory:///foo",
    shape=(18,),
    chunks=([7,7,4],),
    dtype='uint8',
)

print(z.read_chunk_sizes)
# ((7, 7, 4),)
print(z.write_chunk_sizes)
# ((7, 7, 4),)
print(z.chunks)
# (7,)
```
Array B
```
import zarr
zarr.config.set({'array.rectilinear_chunks': True})

z = zarr.create_array(
    "memory:///foo",
    shape=(18,),
    chunks=(7,),
    dtype='uint8',
)

print(z.read_chunk_sizes)
# ((7, 7, 4),)
print(z.write_chunk_sizes)
# ((7, 7, 4),)
print(z.chunks)
# (7,)
```
In the above examples, array A uses the rectilinear chunk grid, so its stored chunks are subarrays with sizes (7, 7, 4). Array B uses the regular chunk grid, and its stored chunks are subarrays with sizes (7, 7, 7).

From an array indexing POV these two chunk grids behave identically, but they have different chunks, and I think we want to ensure that users can easily distinguish these two cases with methods or attributes on the Array class.
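To make the distinction concrete, here is a minimal sketch of the arithmetic for one axis. The function names are hypothetical and nothing here calls zarr; the only assumption taken from the examples above is that a regular grid stores every chunk at the full chunk shape, while a rectilinear grid stores exactly the declared sizes.

```python
from math import ceil


def regular_grid_sizes(length, chunk):
    """Return (logical, stored) chunk sizes along one axis of a regular grid.

    Logical sizes are clipped to the array bounds. Stored sizes are not,
    because every chunk of a regular grid has the full chunk shape.
    """
    n = ceil(length / chunk)
    logical = tuple(min(chunk, length - i * chunk) for i in range(n))
    stored = (chunk,) * n
    return logical, stored


def rectilinear_grid_sizes(sizes):
    """For a rectilinear grid the declared sizes are both logical and stored."""
    sizes = tuple(sizes)
    return sizes, sizes


a_logical, a_stored = rectilinear_grid_sizes([7, 7, 4])  # array A
b_logical, b_stored = regular_grid_sizes(18, 7)          # array B

print(a_logical == b_logical)  # same logical partition
print(a_stored, b_stored)      # different stored chunks
```

An attribute pair that only reports the logical sizes cannot tell these two arrays apart, which is the gap described above.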

And here are some complications:

Lazy slicing will affect how we map a zarr.Array to stored chunks. Lazy slicing will create views of subsets of chunks. So for a lazily sliced array, we need to enumerate the projection of that array's selection onto the underlying chunks, which isn't the same as the size of the underlying chunks. That means for a lazy indexing operation like subset_2 = array.lazy[::2], subset_2.read_chunk_sizes isn't well defined as a collection of chunk sizes that sum to subset_2.shape, since subset_2 isn't defined from whole chunks.
The effective write / read granularity depends on more than just the presence / configuration of the sharding codec. We need to evaluate the read / write granularity over the entire codec pipeline, given the capabilities of a storage backend. Users can theoretically combine sharding with a bytes-bytes codec that negates the ability to read or write subchunks. It is not a good idea, but it's expressible in metadata. And on local storage, or memory storage, with no compression, individual scalars can be written if the store supports byte-range writes. Hopefully we get this feature in zarr-python soon! It's very important!
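The lazy slicing point is easier to see with numbers. Below is a rough sketch (plain Python, hypothetical function name, positive steps only) that projects a strided 1-D selection like `[::2]` onto chunks of sizes (7, 7, 4). It does not model zarr's indexers or the proposed lazy API; it just does the index arithmetic.

```python
from itertools import accumulate


def project_strided(chunk_sizes, step):
    """Project the 1-D selection `[::step]` onto chunks with the given sizes.

    Returns one (chunk_index, selected_indices) pair per touched chunk, where
    selected_indices are absolute array indices falling inside that chunk.
    Assumes step >= 1.
    """
    stops = list(accumulate(chunk_sizes))
    starts = [0] + stops[:-1]
    out = []
    for k, (lo, hi) in enumerate(zip(starts, stops)):
        # First multiple of `step` at or after the chunk start.
        first = -(-lo // step) * step
        picked = range(first, hi, step)
        if len(picked):
            out.append((k, picked))
    return out


for k, picked in project_strided((7, 7, 4), step=2):
    print(k, list(picked))
```

Each chunk contributes 4, 3, and 2 elements to the view, which sums to the view's length of 9, but the chunks that actually have to be read are still 7, 7, and 4 elements long. Neither tuple alone describes the situation.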

With all that said, @maxrjones has a PR that outlines a new chunk_layout data structure. I think this will help convey some of the information array consumers need. I'd also like to use this discussion as a venue to enumerate exactly what kind of information we think array consumers need, given the complexity in the array API.

Given some potentially lazy-sliced array A, I think users need easy access to the following info:

Which chunk files will I read / write if I collect the values of A? How big is each of these chunks? How is each chunk selected to produce A?
Which chunk files will I read / write if I write new values to A? (recall that partial chunk writes do a read first) How big is each of these chunks? How is each chunk selected when I write new values?
Which regions of A should I iterate over if I want to read 1 chunk per region? How big is the chunk I have to read, per region?
Which regions of A should I iterate over if I want to write 1 chunk per region? How big is that chunk (this is not necessarily the same size as the region!)?
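As a sketch of the last question for the simplest case (an unsliced array on a regular grid, no sharding), the iteration below yields each region together with the shape of the chunk that backs it. `write_regions` is a hypothetical name, and the assumption that edge chunks are stored at full size comes from the array B example above.

```python
from itertools import product
from math import ceil


def write_regions(shape, chunk_shape):
    """Yield (region, stored_chunk_shape) for each chunk of a regular grid.

    `region` is a tuple of slices into the array, clipped to the array
    bounds. The stored chunk shape is always the full `chunk_shape`.
    """
    counts = [ceil(n / c) for n, c in zip(shape, chunk_shape)]
    for coord in product(*(range(k) for k in counts)):
        region = tuple(
            slice(i * c, min((i + 1) * c, n))
            for i, c, n in zip(coord, chunk_shape, shape)
        )
        yield region, tuple(chunk_shape)


pairs = list(write_regions((18, 10), (7, 5)))
print(len(pairs))
print(pairs[-1])
```

The last pair covers rows 14 to 18, a region of shape (4, 5), while the chunk behind it has shape (7, 5). A lazily sliced or sharded array would need more than this, which is why the questions above are phrased per region rather than per grid.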

Some questions for participants:

Did I miss anything in this enumeration? Are there any other chunked operations we need to support, given lazy slicing and the subtleties of a logical chunk size vs physical chunk size?
Should we revisit our use of the terms "shards" and "chunks"?
Does the proposal in poc: ChunkLayout for chunk and shard inspection #4040 get us there?
If not, what are we missing?

I'm especially keen to hear from zarr array consumers, @psobolewskiPhD.

maxrjones · 2026-06-25T22:51:32Z

maxrjones
Jun 25, 2026
Maintainer

> Did I miss anything in this enumeration? Are there any other chunked operations we need to support, given lazy slicing and the subtleties of a logical chunk size vs physical chunk size? Does the proposal in #4040 get us there?

I think there are two broad sets of operations: operations that depend on the chunk layout (the discussion title) and operations that relate to both the chunk layout and an access pattern (the question at the end of the discussion). Here's an expanded framing based on this layering where complexity is mapped based on how much information needs to be provided with a question (e.g., nothing, a coordinate, a selection, or a batch of selections):

L0 - scalar facts about the grid (no input)
L1 - coordinate ↔ region maps (input: one chunk coordinate or point)
L2 - whole-grid iteration: enumerate every chunk or shard (no input; this is L1 applied across the whole grid)
L3 - alignment tests on a region (input: one region)
L4 - partition of a single selection (input: one selection)
L5 - partition of a batch of selections (input: a set of selections)

L0–L2 are access-independent (answerable from a static layout object); L3–L5 take an access and need a partitioner.

The Asked by column lists the consumers that need each answer and how they get it today: public (a public zarr API), private (reaches into zarr.core.*), self (reimplements it with no zarr API at all).

Access-independent — a layout object can answer these (pure description of the grid):

| Level | Question | Why you need it | Asked by — API used |
| --- | --- | --- | --- |
| L0 | What are the read-chunk sizes (uniform, or per-axis variable for a rectilinear grid)? | Lay out dask/task chunks so each task reads whole chunks; size a read buffer. | cubed, xarray, dask, anndata — public `.chunks`; cubed falls back to public `read_chunk_sizes` when `.chunks` raises on rectilinear |
| L0 | What is the write unit (shard, else chunk)? | Pick the smallest region writable without clobbering a neighbor under concurrency; align before writing. | cubed, xarray, dask — public `.shards` ∥ `.chunks` |
| L0 | Is the array sharded? | Branch logic: decide whether the write unit differs from the read unit at all. | cubed, xarray, dask — public (`hasattr` / `is not None`) |
| L0 | Is the grid regular (uniform chunks)? | Branch to the fast aligned path, or fall back to handling variable per-axis sizes (the value is the read-chunk-sizes question). | dask — self (`_check_regular_chunks`); cubed — private (catches `.chunks` raise) ∥ #4040 `RegularChunkLayout` |
| L0 | What is the grid shape (chunk count per dimension)? | Enumerate chunk coordinates and split work per axis; you cannot iterate coords from a total alone. | cubed (`ChunkKeys`), anndata (`_get_chunk_indices`) — self (could use public `cdata_shape`) |
| L0 | How many chunks in total? | Progress bars, work/pool sizing, and resume checks (`nchunks_initialized != nchunks`). | cubed — public `nchunks` |
| L1 | What array region does chunk `c` cover? | Write one chunk per block index; map a task's block to its slice. | zagg, cubed — self (`get_item` / block math) |
| L1 | Which chunk holds index `i`? | Route a point/row to its chunk; group requested rows by chunk before reading. | annbatch — private (`chunk_grid` + `BasicIndexer`); zagg — self |
| L2 | Iterate all chunks, one read region per chunk. | Copy/convert an array chunk-by-chunk without holding it in memory; emit one task per chunk. | anndata — public `iter_chunks()`; cubed — self (`ChunkKeys`) |
| L2 | Iterate all write units, one region per shard. | Drive parallel writers, one worker per safely-writable region. | zagg — self (per-shard workers) |
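To illustrate why L0 and L1 are "tuples and divisions", here is a minimal sketch of a static grid description that answers a few of the rows above from per-axis chunk sizes alone. `GridDescription` and its method names are hypothetical and are not the `ChunkLayout` from #4040; the input is the same tuple-of-tuples form that `read_chunk_sizes` returns.

```python
from bisect import bisect_right
from itertools import accumulate
from math import prod


class GridDescription:
    """Hypothetical static description of a chunk grid (not a zarr class)."""

    def __init__(self, chunk_sizes):
        self.chunk_sizes = tuple(tuple(s) for s in chunk_sizes)
        # Cumulative end offsets per axis, e.g. (7, 7, 4) -> [7, 14, 18].
        self._stops = [list(accumulate(s)) for s in self.chunk_sizes]

    @property
    def grid_shape(self):
        """L0: chunk count per dimension."""
        return tuple(len(s) for s in self.chunk_sizes)

    @property
    def nchunks(self):
        """L0: total number of chunks."""
        return prod(self.grid_shape)

    def chunk_of(self, index):
        """L1: which chunk holds this (non-negative) index?"""
        coord = []
        for i, stops in zip(index, self._stops):
            if not 0 <= i < stops[-1]:
                raise IndexError(f"index {i} out of bounds")
            coord.append(bisect_right(stops, i))
        return tuple(coord)

    def region_of(self, coord):
        """L1: what array region does this chunk cover?"""
        return tuple(
            slice(stops[c] - sizes[c], stops[c])
            for c, sizes, stops in zip(coord, self.chunk_sizes, self._stops)
        )


grid = GridDescription(((7, 7, 4), (5, 5)))
print(grid.grid_shape, grid.nchunks)
print(grid.chunk_of((13, 5)))
print(grid.region_of((2, 1)))
```

Nothing here needs a selection or a store, which is the sense in which these questions are access-independent.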

Access-parameterized — these require the partitioner (an algorithm run over a selection):

| Level | Question | Why you need it | Asked by — API used |
| --- | --- | --- | --- |
| L3 | Does this region align to write-unit boundaries? | Refuse or rechunk before a parallel write so workers don't clobber a shared shard. | dask, xarray (`validate_grid_chunks_alignment`), cubed, zagg — all self |
| L3 | Does writing this region require a read-modify-write? | Decide whether a partial write must read the chunk first, and whether that is concurrency-safe (item 2). | xarray, dask, cubed — self / implicit |
| L4 | Which chunks does this selection (read or write) touch? | Know which files/chunks to read or which tasks to spawn, without the within-chunk mapping (cubed extracts just `chunk_coords`; annbatch unions touched chunks). | cubed, annbatch — private `OrthogonalIndexer` / `BasicIndexer` |
| L4 | For each touched chunk, the read projection `(chunk_selection → out_selection)`? | Gather an arbitrary slice/fancy index and place each chunk's contribution into the result (item 1). | cubed — private `OrthogonalIndexer` |
| L4 | For each touched chunk, the write projection (which are partial → read-first, where each value lands)? | Scatter a write across chunks safely (item 2). | cubed — private `OrthogonalIndexer`; zagg — self (block math) |
| L5 | For many disjoint selections: which chunks does the union touch, and how do I read each once? | Turn a shuffled mini-batch of scattered rows into the fewest chunk reads: L4 over a set, plus per-chunk coalescing. | annbatch — private `MultiBasicIndexer` (#3175) |
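For readers less familiar with the projection terminology, here is a rough 1-D sketch of what an L4 answer looks like for a contiguous slice. It borrows the names `chunk_selection` and `out_selection` from the table, but `partition_slice` is a hypothetical function written for illustration and not zarr's `BasicIndexer`. It only handles unit-step, in-bounds slices.

```python
from itertools import accumulate


def partition_slice(chunk_sizes, start, stop):
    """Partition the 1-D selection [start:stop] across chunks.

    Returns a list of (chunk_index, chunk_selection, out_selection) triples:
    chunk_selection indexes into the chunk, out_selection into the result.
    """
    stops = list(accumulate(chunk_sizes))
    starts = [0] + stops[:-1]
    parts = []
    for k, (lo, hi) in enumerate(zip(starts, stops)):
        a, b = max(lo, start), min(hi, stop)
        if a < b:  # the selection overlaps chunk k
            parts.append((k, slice(a - lo, b - lo), slice(a - start, b - start)))
    return parts


# Gather data[5:16] from three in-memory "chunks" using the partition.
data = list(range(18))
chunks = [data[0:7], data[7:14], data[14:18]]
out = [None] * (16 - 5)
for k, chunk_sel, out_sel in partition_slice((7, 7, 4), 5, 16):
    out[out_sel] = chunks[k][chunk_sel]
print(out == data[5:16])
```

The slice case is easy. The correctness risk mentioned in this thread comes from strided, fancy, and orthogonal selections, where a reimplementation can silently disagree with zarr's own indexing.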

Four things follow from the ladder:

L0–L2 give the same answer regardless of access, so a static layout object suffices. L3–L5 take an access as input, so no static description can answer them. This answers your question "Does the proposal in poc: ChunkLayout for chunk and shard inspection #4040 get us there?": #4040 answers only the questions that depend solely on a static layout object. L3–L5 questions would require a different API surface.
Complexity tracks correctness risk, not just effort. L0–L2 are cheap and safe (tuples and divisions). The risk concentrates at L4, whose decomposition must match zarr's indexing semantics exactly. L4 must be zarr-owned rather than reimplemented per consumer (the silent-divergence point). The levels below it are mostly formalization of things consumers already get right.
L4 is the building block for L3 and L5. L3 is a cheap special case of L4 (only "is this selection a union of whole cells," not the full projection); L5 is L4 over a set plus a coalescing pass. So exposing L4 well makes L3 and L5 thin wrappers, which argues for prioritizing L4 and deriving the others rather than shipping three APIs.
One need sits off the ladder. Chunk existence state (nchunks_initialized, cubed's need) is not a complexity level of layout: it is the only need that touches the store (it lists keys) rather than being pure metadata + math. L0–L5 are all answerable offline from metadata; existence requires I/O, so it should be scoped separately from chunk layout questions.
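A small sketch of the "thin wrappers" claim, under the same simplifying assumptions as before (1-D, unit-step, in-bounds slices). All three function names are hypothetical; `partition` is a stand-in for whatever L4 primitive zarr would own, and the other two only call it.

```python
from collections import defaultdict
from itertools import accumulate


def partition(chunk_sizes, start, stop):
    """L4 stand-in: (chunk_index, chunk_selection) for the slice [start:stop]."""
    stops = list(accumulate(chunk_sizes))
    starts = [0] + stops[:-1]
    return [
        (k, slice(max(lo, start) - lo, min(hi, stop) - lo))
        for k, (lo, hi) in enumerate(zip(starts, stops))
        if max(lo, start) < min(hi, stop)
    ]


def is_aligned(chunk_sizes, start, stop):
    """L3: true when every touched chunk is covered in full."""
    return all(
        sel.start == 0 and sel.stop == chunk_sizes[k]
        for k, sel in partition(chunk_sizes, start, stop)
    )


def coalesce(chunk_sizes, selections):
    """L5: group a batch of slices by chunk so each chunk is read once."""
    by_chunk = defaultdict(list)
    for start, stop in selections:
        for k, sel in partition(chunk_sizes, start, stop):
            by_chunk[k].append(sel)
    return dict(by_chunk)


sizes = (7, 7, 4)
print(is_aligned(sizes, 7, 18))   # covers chunks 1 and 2 exactly
print(is_aligned(sizes, 5, 14))   # cuts into chunk 0
print(coalesce(sizes, [(0, 2), (15, 17), (3, 9)]))
```

Neither wrapper contains any indexing logic of its own, so if the L4 primitive matches zarr's semantics, the wrappers inherit that for free.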

1 reply

maxrjones Jun 25, 2026
Maintainer

#321 might also be a use-case. I'm not 100% sure
