Author: Elisa Warner
Last Updated: April 30, 2020
This code provides an easy-to-use method for image preprocessing. It is intended for use with Python 3.6. It was written specifically for Aperio .SVS files, but it should also work with file types that behave similarly.
- 04/30/2020 v2: Code was updated to support folders with large volumes of histopathology images. This fixes a bug that caused the kernel to run out of memory.
- 04/30/2020 v2: Enabled error catching for files that cannot be processed. Users can view the failed files in a separate cell.
- 04/30/2020 v2: Plotting of the patches at the end now avoids an error that occurred when fewer than 100 patches were extracted.
- patch_extractor.ipynb: Extracts patches from histology slides
- patch_functions.ipynb: A notebook of functions for patch_extractor.ipynb
An easy method for patch sampling that requires changing only five hyperparameters to run the code.
- Input: a folder of histology slide images
- Output: a dictionary of patches by image and an output folder of subfolders, where each subfolder represents an image and every file within is a patch.
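The output layout described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the helper names `patch_path` and `make_output_layout`, the patch file naming scheme, and the `.png` extension are all assumptions made for the example.

```python
from pathlib import Path


def patch_path(out_dir, image_name, index, ext=".png"):
    """Return a (hypothetical) file path for one patch of one image.

    Layout: <out_dir>/<image stem>/<image stem>_<index><ext>
    The naming scheme and extension are assumptions for illustration.
    """
    stem = Path(image_name).stem
    return Path(out_dir) / stem / f"{stem}_{index}{ext}"


def make_output_layout(out_dir, patches_by_image):
    """Create one subfolder per image and return the planned patch paths.

    patches_by_image maps an image file name to a list of patches.
    Only the number of patches is used here; nothing is written to the files.
    """
    planned = {}
    for image_name, patches in patches_by_image.items():
        # Create the per-image subfolder, even if the image yielded no patches.
        (Path(out_dir) / Path(image_name).stem).mkdir(parents=True, exist_ok=True)
        planned[image_name] = [
            patch_path(out_dir, image_name, i) for i in range(len(patches))
        ]
    return planned
```

Calling `make_output_layout("patches", {"slide_a.svs": [p0, p1]})` would create `patches/slide_a/` and return the two paths where the patches of `slide_a.svs` could be saved.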
- openslide: to manipulate histology images
- import_ipynb: to access patch_functions.ipynb
- matplotlib.pyplot: to plot the results
- tqdm: to view progress bar
This code has been tested on a Mac with Python 3.6 with Anaconda. To install the required packages, simply type in the command line: pip install -r requirements.txt
More info on Anaconda: https://www.anaconda.com/distribution/
More info on pip: https://pip.pypa.io/en/stable/reference/pip_download/
Patch Extractor assumes you are extracting NxN tiles from a histology image that can be opened with openslide. There are five hyperparameters for this code:
- FILETYPE: [str] The file extension of your images. The code will only look for files with this extension.
- FILE_DIR: [str] The directory of images. It is assumed that the images sit directly in this folder and not in subfolders.
- OUT_DIR: [str] The name of the directory for the output. It does not have to exist yet (the program will create the directory). The output directory is a folder of subfolders, where each subfolder represents an original histology image and contains all the patches for that image.
- TILE_SIZE: [int] The length N of one side of the tile. The program assumes each tile is of size NxN.
- WHITESPACE_CUTOFF: [0,1] The fraction of whitespace that you will allow for any given tile. If you want to accept all tiles, input 0. Otherwise input a value between 0 and 1 (default: 0.35). Note that whitespace can also exist within your sample.
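To make the five hyperparameters concrete, the sketch below sets them and shows how they could drive file discovery, tile placement, and whitespace filtering. It is an illustration under stated assumptions, not the notebook's implementation: the helper names, the example folder names, the brightness threshold of 220 used to call a pixel "white", the choice to skip partial tiles at the slide edge, and the leading dot in FILETYPE are all hypothetical. The only openslide calls used are `openslide.OpenSlide`, its `dimensions` attribute, `read_region`, and `close`; openslide is imported lazily so the pure helpers run without it.

```python
from pathlib import Path

# The five hyperparameters (values here are placeholders).
FILETYPE = ".svs"          # assumed to include the leading dot
FILE_DIR = "slides"        # hypothetical input folder
OUT_DIR = "patches"        # hypothetical output folder
TILE_SIZE = 256
WHITESPACE_CUTOFF = 0.35


def find_slides(file_dir, filetype):
    """List files directly inside file_dir with the given extension (no subfolders)."""
    return sorted(
        p for p in Path(file_dir).iterdir()
        if p.is_file() and p.suffix.lower() == filetype.lower()
    )


def tile_origins(width, height, tile_size):
    """Top-left (x, y) of every full NxN tile; partial edge tiles are skipped."""
    return [
        (x, y)
        for y in range(0, height - tile_size + 1, tile_size)
        for x in range(0, width - tile_size + 1, tile_size)
    ]


def whitespace_fraction(pixels, white_threshold=220):
    """Fraction of RGB pixels whose three channels all exceed white_threshold."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    white = sum(1 for r, g, b in pixels if min(r, g, b) > white_threshold)
    return white / len(pixels)


def keep_tile(fraction, cutoff):
    """Accept every tile when cutoff is 0 (as the text states); otherwise accept
    tiles whose whitespace fraction does not exceed the cutoff (assumed rule)."""
    return cutoff == 0 or fraction <= cutoff


def extract_tiles(slide_path, tile_size, cutoff):
    """Yield (x, y, tile) for accepted tiles at full resolution (level 0)."""
    import openslide  # imported lazily so the helpers above work without it

    slide = openslide.OpenSlide(str(slide_path))
    try:
        width, height = slide.dimensions
        for x, y in tile_origins(width, height, tile_size):
            # read_region returns an RGBA PIL image; drop alpha before measuring.
            tile = slide.read_region((x, y), 0, (tile_size, tile_size)).convert("RGB")
            if keep_tile(whitespace_fraction(tile.getdata()), cutoff):
                yield x, y, tile
    finally:
        slide.close()
```

With openslide installed, a loop such as `for x, y, tile in extract_tiles(path, TILE_SIZE, WHITESPACE_CUTOFF)` over each path returned by `find_slides(FILE_DIR, FILETYPE)` would visit the accepted tiles one slide at a time, which keeps only one slide open in memory.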