Merge branch 'main' of https://github.com/BiAPoL/PoL-BioImage-Analysi…

Showing 10 changed files with 2,646 additions and 1 deletion.
363 additions & 0 deletions: docs/80_image_analysis_with_dask/3_lazy_image_processing.ipynb
@@ -0,0 +1,363 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Practical 3: Virtual stack visualization and explorative analysis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When doing explorative analysis on large datasets, sometimes it's not an option to load a full dataset into memory. Still, one would want to browse images and potentially try out processing workflows in an interactive manner.\n", | ||
"\n", | ||
"In this notebook, **we'll build a simple lazy data viewer and interactively explore a large dataset**.\n", | ||
"\n", | ||
"We'll create the \"large\" example dataset synthetically, but another option would be to download e.g. the following dataset: http://data.celltrackingchallenge.net/training-datasets/Fluo-N3DL-TRIC.zip. In this case, " | ||
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"\n",
"import os, sys\n",
"import numpy as np\n",
"import tifffile\n",
"from scipy import ndimage\n",
"from tqdm import tqdm\n",
"\n",
"import dask\n",
"import dask.array as da\n",
"from dask_image import ndfilters\n",
"from dask import delayed\n",
"\n",
"%matplotlib notebook"
]
},
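{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before touching the dataset, here is a minimal sketch of what \"lazy\" means in dask (an added illustration, not part of the original workflow): `dask.delayed` records a function call as a task instead of running it, and the work only happens once `.compute()` is called."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# a delayed call builds a task instead of running the function\n",
"@delayed\n",
"def add(a, b):\n",
"    return a + b\n",
"\n",
"task = add(1, 2)       # nothing has been computed yet\n",
"print(task)            # a Delayed object describing the task\n",
"print(task.compute())  # only now does the addition actually run -> 3"
]
},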
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# To improve the usability of this notebook,\n",
"# let's create an example 3D+T dataset instead of downloading one.\n",
"\n",
"# create a volume that is wider than one field of view,\n",
"# so that shifted timepoints can be cropped from it below\n",
"N_xy, N_z, N_t, N_rs, dx = 1000, 100, 10, [30, 5], 10\n",
"\n",
"np.random.seed(0)\n",
"img = np.product([\n", | ||
" ndimage.zoom(np.random.random([N_r] * 3),\n", | ||
" zoom=[N_z/N_r, N_xy/N_r, (N_xy + 2 * N_t * dx)/N_r], order=1)\n", | ||
" for N_r in N_rs], axis=0,\n", | ||
")\n", | ||
"\n", | ||
"# convert into uint16\n", | ||
"img = (img * 10000).astype(np.uint16)\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# save it as a timelapse\n", | ||
"\n", | ||
"file_pattern = 'data/large_3d_dataset_tp%03d.tif'\n", | ||
"\n", | ||
"os.makedirs('data', exist_ok=True)\n", | ||
"\n", | ||
"N_t = 20\n", | ||
"for t in tqdm(range(N_t)):\n", | ||
" curr_tp = img[:, :, dx * t: dx * t + N_xy]\n", | ||
" tifffile.imwrite(file_pattern %t, curr_tp)\n", | ||
"\n", | ||
"print('Total dataset size: %.2f GB' %(N_t * N_xy ** 2 * N_z * 2 / 10**9))\n" | ||
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Loading the dataset into a dask array\n",
"\n",
"In order to lazily access images, we construct a dask array containing chunks that are computed by reading the corresponding image data from disk."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"from glob import glob\n",
"import zarr\n",
"\n",
"file_pattern = 'data/large_3d_dataset_tp*.tif'\n",
"files = sorted(glob(file_pattern))\n",
"\n",
"# determine the shape and dtype of the data\n",
"zarr_arr = zarr.open(tifffile.imread(file_pattern, aszarr=True))\n",
"\n",
"N_t, N_z, N_x, N_y = zarr_arr.shape\n",
"dtype = zarr_arr.dtype\n",
"\n",
"print('Total dataset size is %s GB'\n", | ||
" %(np.product([N_t, N_z, N_x, N_y, dtype.itemsize]) / 1e9))" | ||
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# define a custom reader function\n",
"# which loads a single 2D frame from a 3D tif file\n",
"\n",
"def load_2d(t, z):\n",
"    return tifffile.TiffFile(files[t]).pages[z].asarray()\n",
"\n",
"# loading should be lazy\n",
"load_2d = delayed(load_2d)\n",
"\n",
"# manually compose a dask array from the individual lazily loaded frames\n",
"# `da.from_delayed` converts a delayed object into a dask array, given\n",
"# information about the shape and dtype of the delayed result\n",
"ims = da.stack([\n",
"    da.stack([\n",
"        da.from_delayed(load_2d(t, z),\n",
"                        shape=(N_x, N_y),\n",
"                        dtype=dtype)\n",
"        for z in range(N_z)])\n",
"    for t in range(N_t)])\n",
"\n",
"ims"
]
},
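{
"cell_type": "markdown",
"metadata": {},
"source": [
"An alternative sketch: since `tifffile` already exposed the file series as a zarr store above, the same lazy array can also be obtained directly with `da.from_zarr`. Here the chunking follows the store's layout rather than our hand-built one-frame-per-chunk scheme."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# alternative construction: wrap the zarr store opened earlier as a dask array\n",
"ims_alt = da.from_zarr(zarr_arr)\n",
"\n",
"# no pixels are read yet; only slicing followed by compute() touches the disk\n",
"ims_alt"
]
},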
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Visualize the dataset\n",
"\n",
"Since dask arrays essentially behave like numpy arrays, many viewers support the visualization of the previously constructed dask array."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Using tifffile"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# tifffile contains a multidimensional image viewer based on matplotlib\n",
"\n",
"tifffile.imshow(ims)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Using napari\n",
"Napari supports dask arrays."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import napari\n",
"\n",
"viewer = napari.Viewer()\n",
"\n",
"viewer.add_image(ims)"
]
},
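{
"cell_type": "markdown",
"metadata": {},
"source": [
"A practical note (our assumption, not from the original notebook): passing explicit `contrast_limits` can avoid napari eagerly computing intensity ranges over the lazy array when the layer is created."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# assumption: explicit contrast limits spare napari a min/max pass over\n",
"# the lazy data; 10000 matches the intensity scaling used above\n",
"viewer.add_image(ims, name='raw (lazy)', contrast_limits=[0, 10000])"
]
},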
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Using ipywidgets to interact with matplotlib plots"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib notebook\n",
"import matplotlib.pyplot as plt\n",
"from ipywidgets import interact\n",
"\n",
"# a simple multi-dimensional image viewer\n", | ||
"def browse_images(ims, show_colorbar=False):\n", | ||
"\n", | ||
" plt.figure()\n", | ||
" \n", | ||
" # determine the shape of the non spatial dimensions\n", | ||
" scroll_shape = ims.shape[:-2]\n", | ||
"\n", | ||
" def view_image(**kwargs):\n", | ||
" pos = tuple(kwargs[dim] for dim in sorted(kwargs))\n", | ||
" plt.imshow(ims[tuple(pos)].T, cmap=plt.cm.gray_r, interpolation='nearest')\n", | ||
" if show_colorbar:\n", | ||
" plt.colorbar()\n", | ||
"\n", | ||
" # interact with the viewer using the non spatial dimensions\n", | ||
" interact(view_image,\n", | ||
" **{'dim %s' %dim: (0, s-1) for dim, s in enumerate(scroll_shape)})\n", | ||
"\n", | ||
" plt.show()\n", | ||
" \n", | ||
"browse_images(ims)" | ||
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Process the image on the fly\n",
"\n",
"In addition to viewing the raw images, we can perform operations on the array before viewing it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# max projection as a simple example\n",
"\n",
"browse_images(ims.max(-3))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# another example: local background subtraction\n",
"\n",
"# cast to int32 first so the subtraction cannot underflow in uint16\n",
"ims_mod = ims.astype(np.int32) - ndfilters.minimum_filter(ims, size=(1, 1, 30, 30))\n",
"ims_mod = np.clip(ims_mod, 0, 2**16 - 1)\n",
"\n",
"browse_images(ims_mod, show_colorbar=False)"
]
},
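{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that `ims_mod` is still lazy. As a quick check (a sketch, relying on the filter footprint being 1 along t and z, so planes stay independent), computing a single plane should only read the frames it actually needs from disk."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# computing one processed plane only pulls the chunks it depends on;\n",
"# with size=(1, 1, 30, 30) that is a single 2D frame\n",
"single_plane = ims_mod[0, 50].compute()\n",
"single_plane.shape"
]
},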
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# another workflow: max projection, smoothing, background removal,\n",
"# local normalization and thresholding\n",
"\n",
"ims_max = ims.max(1)\n",
"ims_max = ims_max.rechunk((1, 600, 600))\n",
"ims_proc = ndfilters.gaussian_filter(ims_max, (0, 2, 2))\n",
"ims_proc = ims_proc.astype(float) - ndfilters.minimum_filter(ims_proc, (1, 50, 50))\n",
"ims_proc = ims_proc / ndfilters.maximum_filter(ims_proc.rechunk((1, 600, 600)), (1, 50, 50))\n",
"ims_proc = ims_proc > 0.5\n",
"\n",
"browse_images(ims_proc)"
]
},
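{
"cell_type": "markdown",
"metadata": {},
"source": [
"The workflow above is still just a task graph; nothing has been computed yet. Inspecting the chunk structure shows what `rechunk` did, which matters because the `dask_image` filters exchange overlap regions between neighbouring chunks."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# no computation happens here, we only inspect the lazy arrays\n",
"print('ims_max chunks: ', ims_max.chunksize)\n",
"print('ims_proc chunks:', ims_proc.chunksize)\n",
"ims_proc"
]
},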
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# once we're happy with the workflow, we can compute the result\n",
"# and stream it into a file\n",
"\n",
"from dask import diagnostics\n",
"\n",
"with diagnostics.ProgressBar():\n",
"    da.to_zarr(ims_proc, 'data/processed.zarr', overwrite=True)"
]
},
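{
"cell_type": "markdown",
"metadata": {},
"source": [
"To verify the streamed result, the zarr store can be re-opened lazily:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# re-open the written store as a lazy dask array; again, nothing is\n",
"# loaded until we slice and compute\n",
"ims_loaded = da.from_zarr('data/processed.zarr')\n",
"ims_loaded"
]
},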
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Obtain properties from objects"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from scipy import ndimage\n",
"from skimage import measure\n",
"import pandas as pd\n",
"\n",
"def get_object_properties(im_binary, im_intensities, t):\n",
"    labels, _ = ndimage.label(im_binary)\n",
"    props = measure.regionprops_table(\n",
"        labels,\n",
"        intensity_image=im_intensities,\n",
"        properties=['label', 'centroid', 'area', 'mean_intensity'])\n",
"    props = pd.DataFrame(props)\n",
"    props['t'] = t\n",
"    return props\n",
"\n",
"# lazily collect the per-timepoint tables (first 3 timepoints only,\n",
"# to keep the demo fast)\n",
"dfs = []\n",
"for t, im in enumerate(ims_proc[:3]):\n",
"    df = delayed(get_object_properties)(im, ims_max[t], t)\n",
"    dfs.append(df)\n",
"\n",
"with dask.diagnostics.ProgressBar():\n",
"    df = pd.concat(dask.compute(dfs)[0], ignore_index=True)\n",
"\n",
"df"
]
},
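{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an alternative sketch (not in the original notebook), the delayed per-timepoint tables can also be wrapped into a lazy dask DataFrame with `dask.dataframe.from_delayed`, so that filtering and aggregation stay lazy until the final compute."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import dask.dataframe as dd\n",
"\n",
"# build a lazy dataframe from the delayed pandas tables; the schema is\n",
"# inferred from the first partition\n",
"ddf = dd.from_delayed(dfs)\n",
"\n",
"# example: a lazy aggregation, executed only by the final compute()\n",
"ddf.groupby('t')['area'].mean().compute()"
]
}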
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.18"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
@@ -0,0 +1,28 @@
# Lazy and parallel bio-image processing using Dask

When images take too long to load or process, or don't fit into memory, they can be split up and handled in small parts. This course is about easily gaining control over which parts of an image are processed when and where, in the context of interactive Python workflows, napari and cluster computing.

To get started, we need to install/activate a suitable conda environment:

Option 1:

Use the environment created in the [course preparation](https://biapol.github.io/PoL-BioImage-Analysis-TS-GPU-Accelerated-Image-Analysis/00_course_preparation/Readme.html) and install some further packages into it:

```
mamba activate devbio-napari-env
mamba install -c conda-forge dask-image ipycytoscape
```

Option 2:

Create a new environment from scratch:

```
mamba create --name dask_course python=3.9 devbio-napari pyqt dask-image ipycytoscape -c conda-forge
mamba activate dask_course
```

## Lecture materials

Will follow.