
Commit a052695

Merge pull request #1892 from OceanParcels/removing_explicit_field_chunking
Removing explicit field chunking
2 parents 4d31dbb + ced2559 commit a052695

16 files changed: +54 -1230 lines changed

docs/documentation/additional_examples.rst

Lines changed: 0 additions & 7 deletions

@@ -8,13 +8,6 @@ example_brownian.py
    :language: python
    :linenos:
 
-example_dask_chunk_OCMs.py
---------------------------
-
-.. literalinclude:: ../examples/example_dask_chunk_OCMs.py
-   :language: python
-   :linenos:
-
 example_decaying_moving_eddy.py
 -------------------------------

docs/examples/documentation_MPI.ipynb

Lines changed: 1 addition & 232 deletions

@@ -5,7 +5,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Parallelisation with MPI and Field chunking with dask\n"
+    "# Parallelisation with MPI\n"
    ]
   },
   {
@@ -159,237 +159,6 @@
     "For small projects, the above instructions are sufficient. If your project is large, then it is helpful to combine the `proc*` directories into a single zarr dataset and to optimise the chunking for your analysis. What is \"large\"? If you find yourself running out of memory while doing your analysis, saving the results, or sorting the dataset, or if reading the data is taking longer than you can tolerate, your problem is \"large.\" Another rule of thumb is if the size of your output directory is 1/3 or more of the memory of your machine, your problem is large. Chunking and combining the `proc*` data in order to speed up analysis is discussed [in the documentation on runs with large output](https://docs.oceanparcels.org/en/latest/examples/documentation_LargeRunsOutput.html).\n"
    ]
   },
-  {
-   "attachments": {},
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Chunking the FieldSet with dask\n"
-   ]
-  },
-  {
-   "attachments": {},
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "The basic idea of field-chunking in Parcels is that we use the `dask` library to load only those regions of the `Field` that are occupied by `Particles`. The advantage is that if different MPI processors take care of Particles in different parts of the domain, each only needs to load a small section of the full `FieldSet` (although note that this load-balancing functionality is still in [development](#Future-developments:-load-balancing)). Furthermore, the field-chunking in principle makes the `indices` keyword superfluous, as Parcels will determine which part of the domain to load itself.\n"
-   ]
-  },
-  {
-   "attachments": {},
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "The default behaviour is for `dask` to control the chunking, via a call to `da.from_array(data, chunks='auto')`. The chunksizes are then determined by the layout of the NetCDF files.\n",
-    "\n",
-    "However, in tests we have experienced that this may not necessarily be the most efficient chunking. Therefore, Parcels provides control over the chunksize via the `chunksize` keyword in `Field` creation, which requires a dictionary that sets the typical size of chunks for each dimension.\n",
-    "\n",
-    "It is strongly encouraged to explore what the best value for chunksize is for your experiment, which will depend on the `FieldSet`, the `ParticleSet` and the type of simulation (2D versus 3D). As guidance, we have found that chunksizes in the zonal and meridional direction of approximately 128 to 512 are typically most effective. The binning relates to the size of the model and its data size, so power-of-two values are advantageous but not required.\n"
-   ]
-  },
-  {
-   "attachments": {},
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "The notebook below shows an example script to explore the scaling of the time taken for `pset.execute()` as a function of zonal and meridional `chunksize` for a dataset from the [Copernicus Marine Service](http://marine.copernicus.eu/) portal.\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "pycharm": {
-     "is_executing": true
-    }
-   },
-   "outputs": [],
-   "source": [
-    "%pylab inline\n",
-    "import os\n",
-    "import time\n",
-    "from datetime import timedelta\n",
-    "from glob import glob\n",
-    "\n",
-    "import matplotlib.pyplot as plt\n",
-    "import numpy as np\n",
-    "import psutil\n",
-    "\n",
-    "import parcels\n",
-    "\n",
-    "\n",
-    "def set_cms_fieldset(cs):\n",
-    "    data_dir_head = \"/data/oceanparcels/input_data\"\n",
-    "    data_dir = os.path.join(data_dir_head, \"CMS/GLOBAL_REANALYSIS_PHY_001_030/\")\n",
-    "    files = sorted(glob(data_dir + \"mercatorglorys12v1_gl12_mean_201607*.nc\"))\n",
-    "    variables = {\"U\": \"uo\", \"V\": \"vo\"}\n",
-    "    dimensions = {\"lon\": \"longitude\", \"lat\": \"latitude\", \"time\": \"time\"}\n",
-    "\n",
-    "    if cs not in [\"auto\", False]:\n",
-    "        cs = {\"time\": (\"time\", 1), \"lat\": (\"latitude\", cs), \"lon\": (\"longitude\", cs)}\n",
-    "    return parcels.FieldSet.from_netcdf(files, variables, dimensions, chunksize=cs)\n",
-    "\n",
-    "\n",
-    "func_time = []\n",
-    "mem_used_GB = []\n",
-    "chunksize = [128, 256, 512, 768, 1024, 1280, 1536, 1792, 2048, 2610, \"auto\", False]\n",
-    "for cs in chunksize:\n",
-    "    fieldset = set_cms_fieldset(cs)\n",
-    "    pset = parcels.ParticleSet(\n",
-    "        fieldset=fieldset,\n",
-    "        pclass=parcels.Particle,\n",
-    "        lon=[0],\n",
-    "        lat=[0],\n",
-    "        repeatdt=timedelta(hours=1),\n",
-    "    )\n",
-    "\n",
-    "    tic = time.time()\n",
-    "    pset.execute(parcels.AdvectionRK4, dt=timedelta(hours=1))\n",
-    "    func_time.append(time.time() - tic)\n",
-    "    process = psutil.Process(os.getpid())\n",
-    "    mem_B_used = process.memory_info().rss\n",
-    "    mem_used_GB.append(mem_B_used / (1024 * 1024))"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "pycharm": {
-     "is_executing": true
-    }
-   },
-   "outputs": [],
-   "source": [
-    "fig, ax = plt.subplots(1, 1, figsize=(15, 7))\n",
-    "\n",
-    "ax.plot(chunksize[:-2], func_time[:-2], \"o-\")\n",
-    "ax.plot([0, 2800], [func_time[-2], func_time[-2]], \"--\", label=chunksize[-2])\n",
-    "ax.plot([0, 2800], [func_time[-1], func_time[-1]], \"--\", label=chunksize[-1])\n",
-    "plt.xlim([0, 2800])\n",
-    "plt.legend()\n",
-    "ax.set_xlabel(\"chunksize\")\n",
-    "ax.set_ylabel(\"Time spent in pset.execute() [s]\")\n",
-    "plt.show()"
-   ]
-  },
-  {
-   "attachments": {},
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "The plot above shows that in this case, `chunksize='auto'` and `chunksize=False` (the two dashed lines) are roughly the same speed, but that the fastest run is for `chunksize=(1, 256, 256)`. But note that the actual thresholds and numbers depend on the `FieldSet` used and the specifics of your experiment.\n",
-    "\n",
-    "Furthermore, one of the major advantages of field chunking is the efficient utilization of memory. This permits the distribution of the particle advection to many cores, as otherwise the processing unit (e.g. a CPU core; a node in a cluster) would exhaust the memory rapidly. This is shown in the following plot of the memory behaviour.\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "pycharm": {
-     "is_executing": true
-    }
-   },
-   "outputs": [],
-   "source": [
-    "fig, ax = plt.subplots(1, 1, figsize=(15, 12))\n",
-    "ax.plot(chunksize[:-2], mem_used_GB[:-2], \"--\", label=\"memory_blocked [MB]\")\n",
-    "ax.plot([0, 2800], [mem_used_GB[-2], mem_used_GB[-2]], \"x-\", label=\"auto [MB]\")\n",
-    "ax.plot([0, 2800], [mem_used_GB[-1], mem_used_GB[-1]], \"--\", label=\"no chunking [MB]\")\n",
-    "plt.legend()\n",
-    "ax.set_xlabel(\"chunksize\")\n",
-    "ax.set_ylabel(\"Memory blocked in pset.execute() [MB]\")\n",
-    "plt.show()"
-   ]
-  },
-  {
-   "attachments": {},
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "It can clearly be seen that with `chunksize=False` (green line), all Field data are loaded in full directly into memory, which can lead to MPI-run simulations being aborted for memory reasons. Furthermore, one can see that even the automatic method is not optimal, and optimizing the chunksize for a specific hydrodynamic dataset can make a large difference to the memory used.\n",
-    "\n",
-    "It may - depending on your simulation goal - be necessary to tweak the chunksize to leave more memory space for additional particles that are being simulated. Since particles and fields share the same memory space, lower memory utilisation by the `FieldSet` means more memory available for a larger `ParticleSet`.\n",
-    "\n",
-    "Also note that the above example is for a 2D application. For 3D applications, `chunksize=False` will almost always be slower than `chunksize='auto'` or any dictionary, and is likely to run into insufficient memory issues, raising a `MemoryError`. The plot below shows the same analysis as above, but this time for a set of simulations using the full 3D Copernicus Marine Service data. In this case, the `chunksize='auto'` is about two orders of magnitude faster than running without chunking, and about 7.5 times faster than with minimal chunk capacity (i.e. `chunksize=(1, 128, 128)`).\n",
-    "\n",
-    "Choosing too small chunksizes can make the code slower, again highlighting that it is wise to explore which chunksize is best for your experiment before you perform it.\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "pycharm": {
-     "is_executing": true
-    }
-   },
-   "outputs": [],
-   "source": [
-    "from parcels import AdvectionRK4_3D\n",
-    "\n",
-    "\n",
-    "def set_cms_fieldset_3D(cs):\n",
-    "    data_dir_head = \"/data/oceanparcels/input_data\"\n",
-    "    data_dir = os.path.join(data_dir_head, \"CMS/GLOBAL_REANALYSIS_PHY_001_030/\")\n",
-    "    files = sorted(glob(data_dir + \"mercatorglorys12v1_gl12_mean_201607*.nc\"))\n",
-    "    variables = {\"U\": \"uo\", \"V\": \"vo\"}\n",
-    "    dimensions = {\n",
-    "        \"lon\": \"longitude\",\n",
-    "        \"lat\": \"latitude\",\n",
-    "        \"depth\": \"depth\",\n",
-    "        \"time\": \"time\",\n",
-    "    }\n",
-    "\n",
-    "    if cs not in [\"auto\", False]:\n",
-    "        cs = {\n",
-    "            \"time\": (\"time\", 1),\n",
-    "            \"depth\": (\"depth\", 1),\n",
-    "            \"lat\": (\"latitude\", cs),\n",
-    "            \"lon\": (\"longitude\", cs),\n",
-    "        }\n",
-    "    return parcels.FieldSet.from_netcdf(files, variables, dimensions, chunksize=cs)\n",
-    "\n",
-    "\n",
-    "chunksize_3D = [128, 256, 512, 768, 1024, 1280, 1536, 1792, 2048, 2610, \"auto\", False]\n",
-    "func_time3D = []\n",
-    "for cs in chunksize_3D:\n",
-    "    fieldset = set_cms_fieldset_3D(cs)\n",
-    "    pset = parcels.ParticleSet(\n",
-    "        fieldset=fieldset,\n",
-    "        pclass=parcels.Particle,\n",
-    "        lon=[0],\n",
-    "        lat=[0],\n",
-    "        repeatdt=timedelta(hours=1),\n",
-    "    )\n",
-    "\n",
-    "    tic = time.time()\n",
-    "    pset.execute(AdvectionRK4_3D, dt=timedelta(hours=1))\n",
-    "    func_time3D.append(time.time() - tic)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "pycharm": {
-     "is_executing": true
-    }
-   },
-   "outputs": [],
-   "source": [
-    "fig, ax = plt.subplots(1, 1, figsize=(15, 7))\n",
-    "\n",
-    "ax.plot(chunksize[:-2], func_time3D[:-2], \"o-\")\n",
-    "ax.plot([0, 2800], [func_time3D[-2], func_time3D[-2]], \"--\", label=chunksize_3D[-2])\n",
-    "plt.xlim([0, 2800])\n",
-    "plt.legend()\n",
-    "ax.set_xlabel(\"chunksize\")\n",
-    "ax.set_ylabel(\"Time spent in pset.execute() [s]\")\n",
-    "plt.show()"
-   ]
-  },
   {
    "attachments": {},
    "cell_type": "markdown",

docs/examples/example_nemo_curvilinear.py

Lines changed: 1 addition & 5 deletions

@@ -28,11 +28,7 @@ def run_nemo_curvilinear(outfile, advtype="RK4"):
     }
     variables = {"U": "U", "V": "V"}
     dimensions = {"lon": "glamf", "lat": "gphif"}
-    chunksize = {"lat": ("y", 256), "lon": ("x", 512)}
-    fieldset = parcels.FieldSet.from_nemo(
-        filenames, variables, dimensions, chunksize=chunksize
-    )
-    assert fieldset.U.chunksize == chunksize
+    fieldset = parcels.FieldSet.from_nemo(filenames, variables, dimensions)
 
     # Now run particles as normal
     npart = 20

docs/examples/example_ofam.py

Lines changed: 0 additions & 1 deletion

@@ -33,7 +33,6 @@ def set_ofam_fieldset(deferred_load=True, use_xarray=False):
         dimensions,
         allow_time_extrapolation=True,
         deferred_load=deferred_load,
-        chunksize=False,
     )
 
 

docs/examples/tutorial_parcels_structure.ipynb

Lines changed: 0 additions & 20 deletions

@@ -418,26 +418,6 @@
     "- [Optimising the partitioning of the particles with a user-defined partition function](https://docs.oceanparcels.org/en/latest/examples/documentation_MPI.html#Optimising-the-partitioning-of-the-particles-with-a-user-defined-partition_function)\n",
     "- [Future developments: load balancing](https://docs.oceanparcels.org/en/latest/examples/documentation_MPI.html#Future-developments:-load-balancing)"
    ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Another good way to optimise Parcels and speed up execution is to chunk the `FieldSet` with `dask`, using the `chunksize` argument in the `FieldSet` creation. This will allow Parcels to load the `FieldSet` in chunks.\n",
-    "\n",
-    "Using chunking can be especially useful when working with large datasets _and_ when the particles only occupy a small region of the domain.\n",
-    "\n",
-    "Note that the **default** is `chunksize=None`, which means that the `FieldSet` is loaded in its entirety. This is generally the most efficient way to load the `FieldSet` when the particles are spread out over the entire domain.\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### For more tutorials on chunking and dask:\n",
-    "\n",
-    "- [Chunking the FieldSet with dask](https://docs.oceanparcels.org/en/latest/examples/documentation_MPI.html#Chunking-the-FieldSet-with-dask)"
-   ]
   }
  ],
  "metadata": {

parcels/_typing.py

Lines changed: 0 additions & 1 deletion

@@ -28,7 +28,6 @@
 PathLike = str | os.PathLike
 Mesh = Literal["spherical", "flat"]  # corresponds with `mesh`
 VectorType = Literal["3D", "3DSigma", "2D"] | None  # corresponds with `vector_type`
-ChunkMode = Literal["auto", "specific", "failsafe"]  # corresponds with `chunk_mode`
 GridIndexingType = Literal["pop", "mom5", "mitgcm", "nemo", "croco"]  # corresponds with `gridindexingtype`
 UpdateStatus = Literal["not_updated", "first_updated", "updated"]  # corresponds with `_update_status`
 NetcdfEngine = Literal["netcdf4", "xarray", "scipy"]
