Cellpose 2.0 extension for training with sparse annotations.\n",
+ "\n",
+ "This notebook shows how to train cellpose models with sparse annotations."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "IvyuR08OZfw4"
+ },
+ "source": [
+ "# Setup\n",
+ "\n",
+ "We expect that you have already set up a conda environment.\\\n",
+ "The following command will install the extra modules required in this notebook."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Requirement already satisfied: stardist==0.8.3 in /mnt/md0/applications/miniconda3/envs/cellpose/lib/python3.8/site-packages (0.8.3)\r\n",
+ "Requirement already satisfied: csbdeep==0.7.2 in /mnt/md0/applications/miniconda3/envs/cellpose/lib/python3.8/site-packages (0.7.2)\r\n"
+ ]
+ }
+ ],
+ "source": [
+ "!pip install --no-deps stardist==0.8.3 csbdeep==0.7.2"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "e2cBEO1PLuO7"
+ },
+ "source": [
+ "Check the CUDA version, confirm that the GPU is working in cellpose, and import the other libraries."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "Tt8hgC7rniP8",
+ "outputId": "677fa3d0-952f-4490-f5bb-4ef1ad0b0469"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "/bin/bash: nvcc: command not found\n",
+ "Sun Sep 25 23:34:36 2022 \n",
+ "+-----------------------------------------------------------------------------+\n",
+ "| NVIDIA-SMI 470.141.03 Driver Version: 470.141.03 CUDA Version: 11.4 |\n",
+ "|-------------------------------+----------------------+----------------------+\n",
+ "| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
+ "| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n",
+ "| | | MIG M. |\n",
+ "|===============================+======================+======================|\n",
+ "| 0 NVIDIA RTX A6000 Off | 00000000:17:00.0 Off | Off |\n",
+ "| 30% 40C P8 30W / 300W | 5MiB / 48685MiB | 0% Default |\n",
+ "| | | N/A |\n",
+ "+-------------------------------+----------------------+----------------------+\n",
+ "| 1 NVIDIA RTX A6000 Off | 00000000:73:00.0 Off | Off |\n",
+ "| 30% 41C P8 26W / 300W | 2483MiB / 48677MiB | 0% Default |\n",
+ "| | | N/A |\n",
+ "+-------------------------------+----------------------+----------------------+\n",
+ " \n",
+ "+-----------------------------------------------------------------------------+\n",
+ "| Processes: |\n",
+ "| GPU GI CI PID Type Process name GPU Memory |\n",
+ "| ID ID Usage |\n",
+ "|=============================================================================|\n",
+ "| 0 N/A N/A 2430 G /usr/lib/xorg/Xorg 4MiB |\n",
+ "| 1 N/A N/A 2430 G /usr/lib/xorg/Xorg 38MiB |\n",
+ "| 1 N/A N/A 2700 G /usr/bin/gnome-shell 7MiB |\n",
+ "| 1 N/A N/A 3829328 C /opt/conda/bin/python 2433MiB |\n",
+ "+-----------------------------------------------------------------------------+\n",
+ ">>> GPU activated? YES\n"
+ ]
+ }
+ ],
+ "source": [
+ "import os, shutil\n",
+ "import numpy as np\n",
+ "import matplotlib.pyplot as plt\n",
+ "from cellpose import core, utils, io, models, metrics, transforms\n",
+ "from csbdeep.utils import Path, download_and_extract_zip_file, normalize\n",
+ "from glob import glob\n",
+ "from natsort import natsorted\n",
+ "from stardist import fill_label_holes, random_label_cmap\n",
+ "from stardist.matching import matching, matching_dataset\n",
+ "import skimage.measure\n",
+ "from tifffile import imread\n",
+ "from tqdm import tqdm\n",
+ "\n",
+ "\n",
+ "from utils import plot_img_label, to_sparse, get_data, plot_stats, run_analysis, remove_small_labels\n",
+ "\n",
+ "use_GPU = core.use_gpu()\n",
+ "yn = ['NO', 'YES']\n",
+ "print(f'>>> GPU activated? {yn[use_GPU]}')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Data\n",
+ "\n",
+ "\n",
+ "Training data (for input `X` with associated label masks `Y`) can be provided via lists of numpy arrays, where each image can have a different size. Alternatively, a single numpy array can also be used if all images have the same size. \n",
+ "Input images can either be two-dimensional (single-channel) or three-dimensional (multi-channel) arrays, where the channel axis comes last. Label images need to be integer-valued.\n",
+ "\n"
+ ]
+ },
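The input conventions described above can be checked before training. The following is a minimal sketch, not part of cellpose: the helper name `check_training_data` is hypothetical, and the requirement that each label mask matches the spatial shape of its image is an assumption added for illustration.

```python
import numpy as np


def check_training_data(X, Y):
    """Validate images X and label masks Y against the conventions above.

    Hypothetical helper for illustration; it is not part of cellpose.
    """
    if len(X) != len(Y):
        raise ValueError("X and Y must contain the same number of images")
    for i, (x, y) in enumerate(zip(X, Y)):
        x, y = np.asarray(x), np.asarray(y)
        # Images are 2D (single-channel) or 3D with the channel axis last.
        if x.ndim not in (2, 3):
            raise ValueError(f"image {i}: expected 2 or 3 dimensions, got {x.ndim}")
        # Label masks must be integer-valued.
        if not np.issubdtype(y.dtype, np.integer):
            raise ValueError(f"label {i}: expected an integer dtype, got {y.dtype}")
        # Assumption: the label mask covers the same pixels as the image.
        if x.shape[:2] != y.shape:
            raise ValueError(
                f"pair {i}: image shape {x.shape} does not match label shape {y.shape}"
            )
    return True


# Images of different sizes are fine when passed as a list.
X_demo = [np.zeros((64, 64), dtype=np.float32), np.zeros((32, 48, 2), dtype=np.float32)]
Y_demo = [np.zeros((64, 64), dtype=np.uint16), np.zeros((32, 48), dtype=np.int32)]
check_training_data(X_demo, Y_demo)
```

Running such a check on the training and validation sets before training can surface shape or dtype problems early, before any flows are computed.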
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Files found, nothing to download.\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 416/416 [00:01<00:00, 302.05it/s]\n",
+ "100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 416/416 [00:02<00:00, 187.74it/s]\n",
+ "100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 48/48 [00:00<00:00, 249.61it/s]\n",
+ "100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 48/48 [00:00<00:00, 123.91it/s]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "number of images for training: 416\n",
+ "number of images for validation: 48\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "(X_trn, Y_trn), (X_val, Y_val) = get_data()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "VfE75htF0l84"
+ },
+ "source": [
+ "# Train model on manual annotations\n",
+ "\n",
+ "Skip this step if you already have a pretrained model.\n",
+ "\n",
+ "Fill out the form below with the paths to your data and the parameters to start training."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "lLdKNWQ4jxy5"
+ },
+ "source": [
+ "## Training parameters\n",
+ "\n",
+ " **Paths for training, predictions and results**\n",
+ "\n",
+ "\n",
+ "**`train_dir`, `test_dir`:** Paths to the folders containing the images and masks used for training (`train_dir`) and testing (`test_dir`). You can leave `test_dir` blank, but having some test images is recommended so that you can check the model's performance. To find a folder's path, open Files on the left of the notebook, navigate to the folder, right-click it, choose **Copy path**, and paste the path into the corresponding box below.\n",
+ "\n",
+ "**`initial_model`:** Choose a model from the cellpose [model zoo](https://cellpose.readthedocs.io/en/latest/models.html#model-zoo) to start from.\n",
+ "\n",
+ "**`model_name`:** Enter the path where your model will be saved once trained (for instance, your result folder).\n",
+ "\n",
+ "**Training parameters**\n",
+ "\n",
+ "**`number_of_epochs`:** Enter the number of epochs (rounds) for which the network will be trained. At least 100 epochs are recommended, but up to 250 epochs are sometimes necessary, particularly when training from scratch. **Default value: 100**\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "cellView": "form",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "XQI4aUxCjz3n",
+ "outputId": "804d0459-b120-4298-9b4c-87a9ca26401c"
+ },
+ "outputs": [],
+ "source": [
+ "#@markdown ###Path to saved models:\n",
+ "\n",
+ "save_path = \".\" #@param {type:\"string\"}\n",
+ "\n",
+ "# model name and path\n",
+ "#@markdown ###Name of the pretrained model to start from and new model name:\n",
+ "initial_model = \"scratch\" #@param ['cyto','nuclei','tissuenet','livecell','cyto2','CP','CPx','TN1','TN2','TN3','LC1','LC2','LC3','LC4','scratch']\n",
+ "\n",
+ "# other parameters for training.\n",
+ "#@markdown ###Training Parameters:\n",
+ "#@markdown Number of epochs:\n",
+ "n_epochs = 100 #@param {type:\"number\"}\n",
+ "\n",
+ "Channel_to_use_for_training = \"Grayscale\" #@param [\"Grayscale\", \"Blue\", \"Green\", \"Red\"]\n",
+ "\n",
+ "# @markdown ###If you have a secondary channel that can be used for training, for instance nuclei, choose it here:\n",
+ "\n",
+ "Second_training_channel= \"None\" #@param [\"None\", \"Blue\", \"Green\", \"Red\"]\n",
+ "\n",
+ "\n",
+ "#@markdown ###Advanced Parameters\n",
+ "\n",
+ "Use_Default_Advanced_Parameters = False #@param {type:\"boolean\"}\n",
+ "#@markdown ###If not, please input:\n",
+ "learning_rate = 0.001 #@param {type:\"number\"}\n",
+ "weight_decay = 0.0001 #@param {type:\"number\"}\n",
+ "\n",
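+ "# Map each channel name to its cellpose channel index\n",
+ "# (0 = grayscale or none, 1 = red, 2 = green, 3 = blue)\n",
+ "chan = [\"Grayscale\", \"Red\", \"Green\", \"Blue\"].index(Channel_to_use_for_training)\n",
+ "chan2 = [\"None\", \"Red\", \"Green\", \"Blue\"].index(Second_training_channel)\n",
+ "\n",
+ "# 'scratch' is represented as None (no pretrained weights are loaded)\n",
+ "if initial_model == 'scratch':\n",
+ "    initial_model = None"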
+ ]
+ },
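The form converts channel names into the integer indices passed through the `channels` argument: 0 for grayscale (or for no second channel), 1 for red, 2 for green, and 3 for blue. A minimal sketch of that mapping follows; the function name `channel_indices` and the constants `PRIMARY` and `SECONDARY` are hypothetical and exist only for illustration.

```python
# Channel names in index order, as used by the form above.
PRIMARY = ["Grayscale", "Red", "Green", "Blue"]
SECONDARY = ["None", "Red", "Green", "Blue"]


def channel_indices(primary="Grayscale", secondary="None"):
    """Return [chan, chan2] for the given channel names (hypothetical helper)."""
    return [PRIMARY.index(primary), SECONDARY.index(secondary)]


print(channel_indices())                 # [0, 0]
print(channel_indices("Green", "Blue"))  # [2, 3]
```

The resulting pair has the same form as the `channels` keyword accepted by the training and evaluation helpers in this notebook, whose default is `[0, 0]`.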
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "3JRxBPmatrK7"
+ },
+ "source": [
+ "## Train new model\n",
+ "\n",
+ "Train the model in the notebook using the settings from the form above."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def train_model(X_trn, Y_trn, X_val, Y_val, description, **kwargs):\n",
+ "    # Train a cellpose model and save it under save_path/models/description.\n",
+ "    # Optional settings come from kwargs, falling back to the defaults below.\n",
+ "    use_GPU = kwargs.get('use_GPU', core.use_gpu())\n",
+ "    initial_model = kwargs.get('initial_model', None)\n",
+ "    channels = kwargs.get('channels', [0, 0])\n",
+ "    save_path = kwargs.get('save_path', '.')\n",
+ "    n_epochs = kwargs.get('n_epochs', 100)\n",
+ "    learning_rate = kwargs.get('learning_rate', 0.001)\n",
+ "    weight_decay = kwargs.get('weight_decay', 0.0001)\n",
+ "    nimg_per_epoch = kwargs.get('nimg_per_epoch', 8)\n",
+ "\n",
+ "    model = models.CellposeModel(gpu=use_GPU, model_type=initial_model)\n",
+ "    model.train(X_trn.copy(),\n",
+ "                Y_trn,\n",
+ "                test_data=X_val.copy(),\n",
+ "                test_labels=Y_val.copy(),\n",
+ "                channels=channels,\n",
+ "                normalize=False,  # already normalized\n",
+ "                save_path=save_path,\n",
+ "                n_epochs=n_epochs,\n",
+ "                learning_rate=learning_rate,\n",
+ "                weight_decay=weight_decay,\n",
+ "                nimg_per_epoch=nimg_per_epoch,\n",
+ "                model_name=description,\n",
+ "                min_train_masks=1)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def eval_model(X_val, description, **kwargs):\n",
+ "    # Load the model saved by train_model and return one predicted mask per image.\n",
+ "    use_GPU = kwargs.get('use_GPU', core.use_gpu())\n",
+ "    save_path = kwargs.get('save_path', '.')\n",
+ "    channels = kwargs.get('channels', [0, 0])\n",
+ "    model_path = os.path.join(save_path, 'models', description)\n",
+ "    model = models.CellposeModel(gpu=use_GPU, pretrained_model=model_path)\n",
+ "    # Use the mean label diameter stored with the model during training\n",
+ "    diam_labels = model.diam_labels.copy()\n",
+ "    return [model.eval(x,\n",
+ "                       channels=channels,\n",
+ "                       normalize=False,  # already normalized\n",
+ "                       diameter=diam_labels)[0]\n",
+ "            for x in tqdm(X_val)]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "2022-09-25 23:43:10,207 [INFO] WRITING LOG OUTPUT TO /home/useradmin/.cellpose/run.log\n"
+ ]
+ }
+ ],
+ "source": [
+ "# start logger (to see training across epochs)\n",
+ "logger = io.logger_setup()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Train models? (existing models can be overwritten) (y/n): y\n"
+ ]
+ }
+ ],
+ "source": [
+ "is_train = input('Train models? (existing models can be overwritten) (y/n): ').lower().strip() == 'y'"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "416it [00:05, 81.18it/s] \n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "2022-09-29 12:06:49,125 [INFO] ** TORCH CUDA version installed and working. **\n",
+ "2022-09-29 12:06:49,127 [INFO] ** TORCH CUDA version installed and working. **\n",
+ "2022-09-29 12:06:49,128 [INFO] >>>> using GPU\n",
+ "2022-09-29 12:06:49,373 [INFO] computing flows for labels\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 416/416 [00:04<00:00, 93.52it/s]\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "2022-09-29 12:06:54,898 [INFO] computing flows for labels\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 48/48 [00:01<00:00, 34.96it/s]\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "2022-09-29 12:06:57,126 [INFO] >>>> median diameter set to = 30\n",
+ "2022-09-29 12:06:57,128 [INFO] >>>> mean of training label mask diameters (saved to model) 14.897\n",
+ "2022-09-29 12:06:57,129 [INFO] >>>> training network with 2 channel input <<<<\n",
+ "2022-09-29 12:06:57,130 [INFO] >>>> LR: 0.00100, batch_size: 8, weight_decay: 0.00010\n",
+ "2022-09-29 12:06:57,131 [INFO] >>>> ntrain = 416, ntest = 48\n",
+ "2022-09-29 12:06:57,133 [INFO] >>>> nimg_per_epoch = 416\n",
+ "2022-09-29 12:07:05,237 [INFO] Epoch 0, Time 8.1s, Loss 2.1501, Loss Test 1.5788, LR 0.0000\n",
+ "2022-09-29 12:07:13,027 [INFO] saving network parameters to paper03/models/min_001\n",
+ "2022-09-29 12:07:46,773 [INFO] Epoch 5, Time 49.6s, Loss 1.1161, Loss Test 1.1538, LR 0.0006\n",
+ "2022-09-29 12:08:28,938 [INFO] Epoch 10, Time 91.8s, Loss 0.7251, Loss Test 0.9274, LR 0.0010\n",
+ "2022-09-29 12:09:54,560 [INFO] Epoch 20, Time 177.4s, Loss 0.5123, Loss Test 1.2449, LR 0.0010\n",
+ "2022-09-29 12:11:19,926 [INFO] Epoch 30, Time 262.8s, Loss 0.3384, Loss Test 1.4229, LR 0.0010\n",
+ "2022-09-29 12:12:44,859 [INFO] Epoch 40, Time 347.7s, Loss 0.3928, Loss Test 1.4110, LR 0.0010\n",
+ "2022-09-29 12:14:08,739 [INFO] Epoch 50, Time 431.6s, Loss 0.3070, Loss Test 1.4498, LR 0.0010\n",
+ "2022-09-29 12:15:32,386 [INFO] Epoch 60, Time 515.3s, Loss 0.3390, Loss Test 1.6376, LR 0.0010\n",
+ "2022-09-29 12:16:57,084 [INFO] Epoch 70, Time 600.0s, Loss 0.3613, Loss Test 1.6754, LR 0.0010\n",
+ "2022-09-29 12:18:21,669 [INFO] Epoch 80, Time 684.5s, Loss 0.3684, Loss Test 1.7127, LR 0.0010\n",
+ "2022-09-29 12:19:46,101 [INFO] Epoch 90, Time 769.0s, Loss 0.3224, Loss Test 1.4578, LR 0.0010\n",
+ "2022-09-29 12:21:01,553 [INFO] saving network parameters to paper03/models/min_001\n",
+ "2022-09-29 12:21:01,822 [INFO] ** TORCH CUDA version installed and working. **\n",
+ "2022-09-29 12:21:01,825 [INFO] >>>> loading model paper03/models/min_001\n",
+ "2022-09-29 12:21:01,827 [INFO] ** TORCH CUDA version installed and working. **\n",
+ "2022-09-29 12:21:01,828 [INFO] >>>> using GPU\n",
+ "2022-09-29 12:21:02,053 [INFO] >>>> model diam_mean = 30.000 (ROIs rescaled to this size during training)\n",
+ "2022-09-29 12:21:02,054 [INFO] >>>> model diam_labels = 14.897 (mean diameter of training ROIs)\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 48/48 [00:17<00:00, 2.82it/s]\n"
+ ]
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ "