Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
  • Loading branch information
jo-mueller committed Aug 30, 2023
2 parents c015a93 + c97ec1c commit 456ba85
Show file tree
Hide file tree
Showing 10 changed files with 2,646 additions and 1 deletion.
702 changes: 702 additions & 0 deletions docs/80_image_analysis_with_dask/1_dask_basics.ipynb

Large diffs are not rendered by default.

467 changes: 467 additions & 0 deletions docs/80_image_analysis_with_dask/2_dask_image.ipynb

Large diffs are not rendered by default.

363 changes: 363 additions & 0 deletions docs/80_image_analysis_with_dask/3_lazy_image_processing.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,363 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Practical 3: Virtual stack visualization and explorative analysis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When doing explorative analysis on large datasets, sometimes it's not an option to load a full dataset into memory. Still, one would want to browse images and potentially try out processing workflows in an interactive manner.\n",
"\n",
"In this notebook, **we'll build a simple lazy data viewer and interactively explore a large dataset**.\n",
"\n",
"We'll create the \"large\" example dataset synthetically, but another option would be to download e.g. the following dataset: http://data.celltrackingchallenge.net/training-datasets/Fluo-N3DL-TRIC.zip. In this case, "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import os, sys\n",
"import numpy as np\n",
"import tifffile\n",
"from scipy import ndimage\n",
"from tqdm import tqdm\n",
"\n",
"import dask\n",
"import dask.array as da\n",
"from dask_image import ndfilters\n",
"from dask import delayed\n",
"\n",
"%matplotlib notebook"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# For improving the usability of this notebook,\n",
"# let's create an example 3D+T dataset instead of downloading one.\n",
"\n",
"# create first timepoint\n",
"N_xy, N_z, N_t, N_rs, dx = 1000, 100, 10, [30, 5], 10\n",
"\n",
"np.random.seed(0)\n",
"img = np.product([\n",
" ndimage.zoom(np.random.random([N_r] * 3),\n",
" zoom=[N_z/N_r, N_xy/N_r, (N_xy + 2 * N_t * dx)/N_r], order=1)\n",
" for N_r in N_rs], axis=0,\n",
")\n",
"\n",
"# convert into uint16\n",
"img = (img * 10000).astype(np.uint16)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# save it as a timelapse\n",
"\n",
"file_pattern = 'data/large_3d_dataset_tp%03d.tif'\n",
"\n",
"os.makedirs('data', exist_ok=True)\n",
"\n",
"N_t = 20\n",
"for t in tqdm(range(N_t)):\n",
" curr_tp = img[:, :, dx * t: dx * t + N_xy]\n",
" tifffile.imwrite(file_pattern %t, curr_tp)\n",
"\n",
"print('Total dataset size: %.2f GB' %(N_t * N_xy ** 2 * N_z * 2 / 10**9))\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Loading the dataset into a dask array\n",
"\n",
"In order to lazily access images, we construct a dask array containing chunks that are computed by reading the corresponding image data from disk."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"from glob import glob\n",
"import zarr\n",
"\n",
"file_pattern = 'data/large_3d_dataset_tp*.tif'\n",
"files = sorted(glob(file_pattern))\n",
"\n",
"# determine the shape and dtype of the data\n",
"zarr_arr = zarr.open(tifffile.imread(file_pattern, aszarr=True))\n",
"\n",
"N_t, N_z, N_x, N_y = zarr_arr.shape\n",
"dtype = zarr_arr.dtype\n",
"\n",
"print('Total dataset size is %s GB'\n",
" %(np.product([N_t, N_z, N_x, N_y, dtype.itemsize]) / 1e9))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# define a custom reader function\n",
"# which loads a single 2D frame from a 3d tif file\n",
"\n",
"def load_2d(t, z):\n",
" return tifffile.TiffFile(files[t]).pages[z].asarray()\n",
"\n",
"# loading should be lazy\n",
"load_2d = delayed(load_2d)\n",
"\n",
"# manually compose a dask array from the individual lazily loaded frames\n",
"# `da.from_delayed` converts a delayed object into a dask array, given\n",
"# information about the shape and dtype of the delayed result\n",
"ims = da.stack([\n",
" da.stack([\n",
" da.from_delayed(load_2d(t, z),\n",
" shape=(N_x, N_y),\n",
" dtype=dtype)\n",
" for z in range(N_z)])\n",
" for t in range(N_t)])\n",
"\n",
"ims"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Visualize the dataset\n",
"\n",
"Since dask arrays essentially behave like numpy arrays, many viewers support the visualization of the previously constructed dask array."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Using tifffile"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# tifffile contains a multidimensional image viewer based on matplotlib\n",
"\n",
"tifffile.imshow(ims)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Using napari\n",
"Napari supports dask arrays."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import napari\n",
"\n",
"viewer = napari.Viewer()\n",
"\n",
"viewer.add_image(ims)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Using ipywidgets to interact with matplotlib plots"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib notebook\n",
"import matplotlib.pyplot as plt\n",
"from ipywidgets import interact\n",
"\n",
"# a simple multi-dimensional image viewer\n",
"def browse_images(ims, show_colorbar=False):\n",
"\n",
" plt.figure()\n",
" \n",
" # determine the shape of the non spatial dimensions\n",
" scroll_shape = ims.shape[:-2]\n",
"\n",
" def view_image(**kwargs):\n",
" pos = tuple(kwargs[dim] for dim in sorted(kwargs))\n",
" plt.imshow(ims[tuple(pos)].T, cmap=plt.cm.gray_r, interpolation='nearest')\n",
" if show_colorbar:\n",
" plt.colorbar()\n",
"\n",
" # interact with the viewer using the non spatial dimensions\n",
" interact(view_image,\n",
" **{'dim %s' %dim: (0, s-1) for dim, s in enumerate(scroll_shape)})\n",
"\n",
" plt.show()\n",
" \n",
"browse_images(ims)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Process the image on the fly\n",
"\n",
"In addition to viewing the raw images, we can perform operations on the array before viewing it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# max projection as a simple example\n",
"\n",
"browse_images(ims.max(-3))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# another example: local background subtraction\n",
"\n",
"ims_mod = ims.astype(np.int32) - ndfilters.minimum_filter(ims, size=(1, 1, 30, 30))\n",
"ims_mod = np.clip(ims_mod, 0, 2**16 - 1)\n",
"\n",
"browse_images(ims_mod, show_colorbar=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# another workflow\n",
"\n",
"ims_max = ims.max(1)\n",
"ims_max = ims_max.rechunk((1, 600, 600))\n",
"ims_proc = ndfilters.gaussian_filter(ims_max, (0, 2, 2))\n",
"ims_proc = ims_proc.astype(float) - ndfilters.minimum_filter(ims_proc, (1, 50, 50))\n",
"ims_proc = ims_proc / ndfilters.maximum_filter(ims_proc.rechunk((1, 600, 600)), (1, 50, 50))\n",
"ims_proc = ims_proc > 0.5\n",
"\n",
"browse_images(ims_proc)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# once we're happy with the workflow, we can compute the result\n",
"# and stream it into a file\n",
"\n",
"from dask import diagnostics\n",
"\n",
"with diagnostics.ProgressBar():\n",
" da.to_zarr(ims_proc, 'data/processed.zarr', overwrite=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Obtain properties from objects"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from scipy import ndimage\n",
"from skimage import measure\n",
"import pandas as pd\n",
"\n",
"def get_object_properties(im_binary, im_intensities, t):\n",
" labels, _ = ndimage.label(im_binary)\n",
" props = measure.regionprops_table(\n",
" labels,\n",
" intensity_image=im_intensities,\n",
" properties=['label', 'centroid', 'area', 'mean_intensity'])\n",
" props = pd.DataFrame(props)\n",
" props['t'] = t\n",
" return props\n",
"\n",
"dfs = []\n",
"for t, im in enumerate(ims_proc[:3]):\n",
" df = delayed(get_object_properties)(im, ims_max[t], t)\n",
" dfs.append(df)\n",
"\n",
"with dask.diagnostics.ProgressBar():\n",
" df = pd.concat(dask.compute(dfs)[0], ignore_index=True)\n",
"\n",
"df"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.18"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
28 changes: 28 additions & 0 deletions docs/80_image_analysis_with_dask/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
# Lazy and parallel bio-image processing using DASK

When images take too long to load or process, or don’t fit into memory, they can be split up and managed in small parts. This course is about easily gaining control over which parts of an image are processed when and where in the context of interactive python workflows, napari and cluster computing.

To get started, we need to install/activate a suitable conda environment:

Option 1:

Use the environment created in the [course preparation](
https://biapol.github.io/PoL-BioImage-Analysis-TS-GPU-Accelerated-Image-Analysis/00_course_preparation/Readme.html) and install some further packages to it:

```
mamba activate devbio-napari-env
mamba install -c conda-forge dask-image ipycytoscape
```

Option 2:

Create a new environment from scratch:

```
mamba create --name <dask_course> python=3.9 devbio-napari pyqt dask-image ipycytoscape -c conda-forge
mamba activate dask_course
```

## Lecture materials

Will follow.
Loading

0 comments on commit 456ba85

Please sign in to comment.