diff --git a/.github/workflows/package-test.yml b/.github/workflows/package-test.yml index 15b1404..bf02013 100644 --- a/.github/workflows/package-test.yml +++ b/.github/workflows/package-test.yml @@ -13,7 +13,6 @@ jobs: fail-fast: false matrix: include: - - python-version: "3.9" - python-version: "3.10" - python-version: "3.11" diff --git a/.gitignore b/.gitignore index 785f746..2b13975 100644 --- a/.gitignore +++ b/.gitignore @@ -163,3 +163,4 @@ cython_debug/ sandbox/ .azure +data/ \ No newline at end of file diff --git a/README.md b/README.md index b546bb5..39da8bc 100644 --- a/README.md +++ b/README.md @@ -48,6 +48,12 @@ These examples take a closer look at certain solutions and patterns of usage for * **[Image Search Series Pt 1: Searching for similar XRay images](./azureml/advanced_demos/image_search/2d_image_search.ipynb)** [MI2] - an opener in the series on image-based search. How do you use foundation models to build an efficient system to look up similar Xrays? Read [our blog](https://techcommunity.microsoft.com/blog/healthcareandlifesciencesblog/image-search-series-part-1-chest-x-ray-lookup-with-medimageinsight/4372736) for more details. * **[Image Search Series Pt 2: 3D Image Search with MedImageInsight](./azureml/advanced_demos/image_search/3d_image_search.ipynb)** [MI2] - expanding on the image-based search topics we look at 3D images. How do you use foundation models to build a system to search the archive of CT scans for those with similar lesions in the pancreas? Read [our blog](https://aka.ms/3DImageSearch) for more details. +### 🤖 Agentic AI Examples + +These examples demonstrate how to build intelligent conversational agents that integrate healthcare AI models with natural language understanding: + +* **[Medical Image Classification Agent](./azureml/medimageinsight/agent-classification-example.ipynb)** [MI2, GPT] - build a conversational AI agent that classifies medical images through natural language interactions. Learn practical patterns for coordinating image data with LLM function calls, managing conversation state, and routing image analysis tasks to MedImageInsight embeddings. + ## Getting Started To get started with using our healthcare AI models and examples, follow the instructions below to set up your environment and run the sample applications. @@ -67,8 +73,8 @@ To get started with using our healthcare AI models and examples, follow the inst - **Optional**: Azure OpenAI access for GPT models (limited use in examples). - **Tools**: - **For running examples**: - - [AzCopy](https://learn.microsoft.com/en-us/azure/storage/common/storage-ref-azcopy) for downloading sample data - - Python `>=3.9.0,<3.12` and pip `>=21.3` (for running locally) + - Python `>=3.10.0,<3.12` and pip `>=21.3` (for running locally) + - [Git LFS](https://git-lfs.github.com/) for cloning the data repository - **For deploying models**: - [Azure CLI](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli) - [Azure Developer CLI](https://learn.microsoft.com/en-us/azure/developer/azure-developer-cli/install-azd?tabs=winget-windows%2Cbrew-mac%2Cscript-linux&pivots=os-linux) @@ -221,33 +227,38 @@ Now that you have deployed the models, you need to configure your local environm After deployment, verify that your root level `.env` file contains the necessary environment variables for connecting to your deployed models. Each automatic deployment method will configure this file with the appropriate settings for your chosen approach. 
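As an optional sanity check, you can confirm that the toolkit is picking up your configuration before running any notebooks. This minimal sketch assumes the `healthcareai_toolkit` settings attributes used elsewhere in these examples (`DATA_ROOT` and, if you deployed GPT models, the Azure OpenAI values); adjust it to whichever variables apply to your deployment.

```python
# Minimal configuration sanity check (illustrative; adjust to your deployment).
from healthcareai_toolkit import settings

print("DATA_ROOT:", settings.DATA_ROOT)
# Only relevant if you deployed GPT models:
print("Azure OpenAI endpoint:", settings.AZURE_OPENAI_ENDPOINT)
print("Azure OpenAI deployment:", settings.AZURE_OPENAI_DEPLOYMENT_NAME)
```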
> [!IMPORTANT] -> Check the value of `DATA_ROOT` in your `.env` file to ensure it's appropriate for your setup. The default value is `/home/azureuser/data/`, but you may need to modify it based on your environment. If you change the `DATA_ROOT` value, you'll also need to update the destination path in the azcopy command in the following step. +> Check the value of `DATA_ROOT` in your `.env` file to ensure it's appropriate for your setup. The default value is `/home/azureuser/data/healthcare-ai/`, but you may need to modify it based on your environment. **Use an absolute path** (not a relative path like `./data/`) to ensure consistent access across different working directories. If you change the `DATA_ROOT` value, you'll also need to update the destination path in the git clone command in the following step. +> +> **Azure OpenAI Configuration**: If you deployed GPT models, your `.env` file will contain `AZURE_OPENAI_ENDPOINT` and `AZURE_OPENAI_API_KEY`. The endpoint supports two formats: +> 1. **Full inference URI** (deployed automatically): `https://{your-service}.cognitiveservices.azure.com/openai/deployments/{deployment}/chat/completions?api-version={version}`. +> 2. **Base endpoint** (for manual configuration): `https://{your-service}.cognitiveservices.azure.com/` with separate `AZURE_OPENAI_DEPLOYMENT_NAME` variable. +> +> See `env.example`. > [!NOTE] > If you used a manual deployment method you will have to configure this file yourself, see [Manual Deployment](docs/manual-deployment.md) for more information. #### Download Sample Data -The sample data used by the examples is located in our Blob Storage account. Use [azcopy tool](https://learn.microsoft.com/en-us/azure/storage/common/storage-ref-azcopy) to download: +The sample data used by the examples is available in the [healthcareai-examples-data](https://github.com/microsoft/healthcareai-examples-data) GitHub repository. + +> [!IMPORTANT] +> The data repository uses Git LFS (Large File Storage) for medical image files. Make sure you have [Git LFS](https://git-lfs.github.com/) installed before cloning. Without it, you'll only download placeholder files instead of the actual data. + +Clone the repository to download the data: ```sh -azcopy copy --recursive https://azuremlexampledata.blob.core.windows.net/data/healthcare-ai/ /home/azureuser/data/ +git clone https://github.com/microsoft/healthcareai-examples-data.git /home/azureuser/data/healthcare-ai ``` > [!TIP] -> This downloads the entire dataset. For specific examples, you can download subsets by appending the subfolder name to the source URL. +> This downloads the entire dataset. If you prefer a different location, adjust the target path and update the `DATA_ROOT` value in your `.env` file accordingly. For more information about the data, see the [data repository README](https://github.com/microsoft/healthcareai-examples-data/blob/main/README.md). #### Install Healthcare AI Toolkit Install the helper toolkit that facilitates working with endpoints, DICOM files, and medical imaging: ```sh -# Standard installation -pip install ./package/ -``` -_or_ -```sh -# Editable installation for development pip install -e ./package/ ``` @@ -271,6 +282,8 @@ Now you're ready to explore the notebooks! Start with one of these paths: **📋 Report Generation**: See example usage in **[CXRReportGen deployment](./azureml/cxrreportgen/cxr-deploy.ipynb)**. 
+**🤖 Agentic AI**: Learn how to use models within an agentic framework with the **[medical image classification agent](./azureml/medimageinsight/agent-classification-example.ipynb)**. + **🚀 Advanced**: Explore **[image search](./azureml/advanced_demos/image_search/2d_image_search.ipynb)**, **[outlier detection](./azureml/medimageinsight/outlier-detection-demo.ipynb)**, or **[multimodal analysis](./azureml/advanced_demos/radpath/rad_path_survival_demo.ipynb)**. ## Project Structure diff --git a/azureml/advanced_demos/image_search/2d_image_search.ipynb b/azureml/advanced_demos/image_search/2d_image_search.ipynb index fd8d4b8..7b9d022 100644 --- a/azureml/advanced_demos/image_search/2d_image_search.ipynb +++ b/azureml/advanced_demos/image_search/2d_image_search.ipynb @@ -12,36 +12,27 @@ "## Image Search Series Part 1: Chest X-ray Search with MedImageInsight (MI2)\n", "In this tutorial, we show you how to build and optimize a 2D image search system for chest X-rays using **MedImageInsight embeddings**.\n", "\n", - "### Dataset \n", - "We provide a sample dataset of 100 2D chest X-ray DICOM images, categorized into the following pathology classes: No Findings, Support Devices, Pleural Effusion, Cardiomegaly, and Atelectasis. Each image contains a single pathology class, but the methods demonstrated can be adapted for multi-label scenarios as well.\n", + "## Prerequisites\n", "\n", - "Please download the data using the following command:\n", + "This notebook requires the following setup. If you haven't completed these steps, please refer to the Getting Started section in the main README, which includes:\n", "\n", - "```sh\n", - "azcopy copy --recursive https://azuremlexampledata.blob.core.windows.net/data/healthcare-ai/ \n", - "/home/azureuser/data/\n", - "```\n", + "1. Deploying required models\n", + "2. Installing the Healthcare AI Toolkit\n", + "3. Downloading sample data\n", + "4. Configuring your `.env` file\n", "\n", - "### Online Endpoint Deployment \n", - "The **MedImageInsight (MI2) Model** can be accessed and deployed via the [Azure AI Model Catalog](https://azure.microsoft.com/en-us/products/ai-model-catalog). Alternatively, you can deploy the model programmatically, as detailed in the [deployment notebook](https://aka.ms/healthcare-ai-examples-mi2-deploy).\n", + "### Required for This Notebook\n", "\n", - "### Environment \n", - "1. Install the **healthcareai_toolkit** package from the root of the repository: \n", + "- **Model Endpoint(s)**: `MI2_MODEL_ENDPOINT` \n", + "- **Additional Dependencies**: \n", + " ```bash\n", + " conda install -c pytorch faiss-cpu\n", + " ```\n", "\n", - " ```sh\n", - " pip install ./package\n", - " ```\n", - "2. Set up your `.env` file with the `DATA_ROOT` and `MI2_MODEL_ENDPOINT` parameters.\n", - "3. Install the **FAISS** library using: \n", - " ```sh\n", - " conda install -c pytorch faiss-cpu\n", - " ```\n", - "\n", - "### FAISS (Facebook AI Similarity Search) \n", - "[FAISS](https://github.com/facebookresearch/faiss) provides efficient algorithms for searching large sets of vectors, even those too large to fit in memory. It supports adding or removing individual vectors and computes exact distances between them. 
FAISS is perfect for building scalable image search systems like the one in this tutorial.\n", + "> **Note**: [FAISS](https://github.com/facebookresearch/faiss) provides efficient algorithms for searching large sets of vectors, making it perfect for building scalable image search systems like the one in this tutorial.\n", "\n", "### 2D Image Search \n", - "This tutorial walks you through the use of an embedding model to create a vector index and then build a system that woulod look up similar images based on image provided. We will first use out-of-the-box capabilities of MedImageInsight model to build a basic system, and then will enhance performance by applying some of the concepts introduced in other notebooks from this repository. In a prior [adapter training notebook](https://aka.ms/healthcare-ai-examples-mi2-adapter), we demonstrated how to train an adapter for classification. Here, we will also train a simple adapter to refine the MI2 model’s embeddings to improve representation and then see how it improves performance. \n", + "This tutorial walks you through the use of an embedding model to create a vector index and then build a system that would look up similar images based on a provided image. We will first use the out-of-the-box capabilities of the MedImageInsight model to build a basic system, and then enhance performance by applying some of the concepts introduced in other notebooks from this repository. In a prior [adapter training notebook](https://aka.ms/healthcare-ai-examples-mi2-adapter), we demonstrated how to train an adapter for classification. Here, we will also train a simple adapter to refine the MI2 model's embeddings to improve representation and then see how it improves performance. \n", "\n", "In either approach we will be building an index using FAISS library. Note that the index will need to be rebuilt if we are using different representations (like with the adapter approach). Once the FAISS index is built, we query it with a new embedding (query vector) to retrieve the most similar images. FAISS supports both exact and approximate nearest neighbor searches, allowing for a balance between speed and precision. 
In this tutorial, we use nearest neighbor search to find the most relevant images based on the query.\n", "\n", @@ -80,7 +71,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -106,7 +97,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -135,7 +126,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -164,7 +155,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -208,7 +199,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -236,7 +227,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -265,7 +256,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -287,7 +278,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -466,7 +457,7 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -504,7 +495,7 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -528,7 +519,7 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -588,7 +579,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -620,7 +611,7 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -674,7 +665,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -847,7 +838,7 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": null, "metadata": {}, "outputs": [ { diff --git a/azureml/advanced_demos/image_search/3d_image_search.ipynb b/azureml/advanced_demos/image_search/3d_image_search.ipynb index 7b8c3e2..c34c7d3 100644 --- a/azureml/advanced_demos/image_search/3d_image_search.ipynb +++ b/azureml/advanced_demos/image_search/3d_image_search.ipynb @@ -19,6 +19,25 @@ "\n", "This tutorial demonstrates how to build a **Zero-Shot 3D Image Search System** using **MedImageInsight (MI2) embeddings**.\n", "\n", + "## Prerequisites\n", + "\n", + "This notebook requires the following setup. If you haven't completed these steps, please refer to the Getting Started section in the main README, which includes:\n", + "\n", + "1. Deploying required models\n", + "2. Installing the Healthcare AI Toolkit\n", + "3. Downloading sample data\n", + "4. 
Configuring your `.env` file\n", + "\n", + "### Required for This Notebook\n", + "\n", + "- **Model Endpoint(s)**: `MI2_MODEL_ENDPOINT` \n", + "- **Additional Dependencies**: \n", + " ```bash\n", + " conda install -c pytorch faiss-cpu\n", + " ```\n", + "\n", + "> **Note**: [FAISS](https://github.com/facebookresearch/faiss) provides efficient algorithms for searching large sets of vectors, making it perfect for building scalable image search systems like the one in this tutorial.\n", + "\n", "### Data\n", "This tutorial uses the [**3D-MIR dataset**](https://github.com/abachaa/3D-MIR), designed for **organ lesion detection** and **severity assessment**, with a particular focus on evaluating **pancreatic lesions**. The dataset is organized into distinct **training** and **testing** splits to facilitate the development and evaluation of the 3D image search system.\n", "\n", @@ -47,28 +66,8 @@ "\n", "#### 3. Pancreas Training and Testing Splits\n", "The dataset is divided into two files for training and testing purposes:\n", - "- **Training Split** — Located at `/home/azureuser/data/healthcare-ai/advanced-3D-image-search/pancreas_full_train_split.csv` (used as the retrieval database).\n", - "- **Testing Split** — Located at `/home/azureuser/data/healthcare-ai/advanced-3D-image-search/pancreas_full_test_split.csv` (used for image queries).\n", - "\n", - "### Online Endpoint Deployment\n", - "The **MedImageInsight (MI2) Model** is available for deployment through the **[Azure AI Model Catalog](https://azure.microsoft.com/en-us/products/ai-model-catalog)**, providing a straightforward way to access and use the model.\n", - "\n", - "For those preferring a programmatic approach, detailed instructions can be found in the dedicated **[deployment notebook](https://aka.ms/healthcare-ai-examples-mi2-deploy)**. This guide outlines the steps required to deploy the MI2 model efficiently via code.\n", - "\n", - "\n", - "### Environment\n", - "1. Install the **healthcareai_toolkit** package from the root of the repository:\n", - " ```sh\n", - " pip install ./package\n", - " ```\n", - "2. Set up your `.env` file with the `DATA_ROOT` and `MI2_MODEL_ENDPOINT` parameters.\n", - "3. Install the **FAISS** library using:\n", - " ```sh\n", - " conda install -c pytorch faiss-cpu\n", - " ```\n", - "\n", - "### FAISS (Facebook AI Similarity Search)\n", - "[FAISS](https://github.com/facebookresearch/faiss) provides efficient algorithms for searching large sets of vectors, even those too large to fit in memory. It supports adding or removing individual vectors and computes exact distances between them. FAISS is perfect for building scalable image search systems like the one in this tutorial.\n", + "- **Training Split** — Located at `/advanced-3D-image-search/pancreas_full_train_split.csv` (used as the retrieval database).\n", + "- **Testing Split** — Located at `/advanced-3D-image-search/pancreas_full_test_split.csv` (used for image queries).\n", "\n", "### 3D Image Search\n", "\n", @@ -81,7 +80,7 @@ "- Building a FAISS index to store these embeddings.\n", "- Querying the FAISS index with a new embedding (query vector) to retrieve the most similar images.\n", "\n", - "FAISS supports both **exact** and **approximate** nearest neighbor searches, allowing you to balance between speed and precision. 
In this tutorial, we’ll use nearest neighbor search to efficiently identify the most relevant images.\n", + "FAISS supports both **exact** and **approximate** nearest neighbor searches, allowing you to balance between speed and precision. In this tutorial, we'll use nearest neighbor search to efficiently identify the most relevant images.\n", "\n", "![3D_search.png](attachment:3D_search.png)\n", "\n", @@ -100,9 +99,7 @@ " - Create a FAISS index using the generated MI2 embeddings.\n", " - **Evaluate and Display Search Results:**\n", " - Measure classification accuracy and calculate **precision @1**, **@3**, **@5** and **@10**.\n", - " - Visualize query images alongside their retrieved neighbors to confirm accuracy.\n", - "\n", - "\n" + " - Visualize query images alongside their retrieved neighbors to confirm accuracy." ] }, { @@ -115,7 +112,7 @@ } }, "source": [ - "## **1. Set up and data preparation**" + "## 1. Setup and Imports" ] }, { @@ -156,6 +153,7 @@ " normalize_image_to_uint8,\n", " convert_volume_to_slices,\n", ")\n", + "from healthcareai_toolkit import settings\n", "from search_utils import (\n", " check_pkl_files,\n", " create_faiss_index,\n", @@ -174,9 +172,9 @@ } }, "source": [ - "### Data Paths\n", + "## 2. Data Paths\n", "\n", - "Download the data files as described above and if you are not using the default directories update the data paths below to match your environment. " + "Download the data files as described above and if you are not using the default directories update the data paths below to match your environment." ] }, { @@ -196,7 +194,7 @@ "# Available methods: median/maxpooling/avgpooling/std\n", "##############################################################################\n", "\n", - "data_path = \"/home/azureuser/data/healthcare-ai/advanced-3D-image-search\"\n", + "data_path = os.path.join(settings.DATA_ROOT, \"advanced-3D-image-search\")\n", "\n", "## Path to MSD images if you plan to generate embeddings\n", "image_input_folder = \"\"\n", @@ -239,7 +237,7 @@ } }, "source": [ - "### Ground-Truth Labels\n", + "## 3. Ground-Truth Labels\n", "\n", "Here we read our ground truth labels into in-memory data frames." ] @@ -281,7 +279,8 @@ } }, "source": [ - "## 2. 3D Embedding Generation\n", + "## 4. 3D Embedding Generation\n", + "\n", "Embedding generation is implemented by extracting 2D slices from the 3D volume and generating embeddings for each slice. The embeddings are then aggregated to create a single embedding for the entire volume. This process is done using the MI2 model, which is a pre-trained model for generating embeddings for medical images." ] }, diff --git a/azureml/advanced_demos/image_search/rag_infection_detection.ipynb b/azureml/advanced_demos/image_search/rag_infection_detection.ipynb index 39a8f22..a389e05 100644 --- a/azureml/advanced_demos/image_search/rag_infection_detection.ipynb +++ b/azureml/advanced_demos/image_search/rag_infection_detection.ipynb @@ -19,6 +19,22 @@ "\n", "This tutorial demonstrates how to build an **Image Search System** using **MedImageInsight (MI2)** with embedding-based retrieval and **vision-language model (VLM)**–based context-aware reasoning. \n", "\n", + "## Prerequisites\n", + "\n", + "This notebook requires the following setup. If you haven't completed these steps, please refer to the Getting Started section in the main README, which includes:\n", + "\n", + "1. Deploying required models\n", + "2. Installing the Healthcare AI Toolkit\n", + "3. Downloading sample data\n", + "4. 
Configuring your `.env` file\n", + "\n", + "### Required for This Notebook\n", + "\n", + "- **Model Endpoint(s)**: `MI2_MODEL_ENDPOINT`, `AZURE_OPENAI_ENDPOINT`\n", + "- **Additional Dependencies**: The toolkit depends on FAISS and will install it automatically\n", + "\n", + "> **Note**: [FAISS](https://github.com/facebookresearch/faiss) provides efficient algorithms for searching large sets of vectors, even those too large to fit in memory. It supports adding or removing individual vectors and computes exact distances between them.\n", + "\n", "#### Data \n", "This tutorial uses the **WoundcareVQA dataset** for Wound Care Visual Question Answering. \n", "\n", @@ -31,31 +47,8 @@ "- **Annotation File** — `dataset-full-original/woundcarevqa.json` (labels). \n", "- **Embeddings** — `MedImageInsight-embeddings` (train/test embeddings and FAISS index). \n", "\n", - "\n", "We focus on the **Infection Detection** task with three labels: infected, not infected, and unclear. \n", "\n", - "#### Online Endpoint Deployment\n", - "The **MedImageInsight (MI2) Model** is available for deployment through the **[Azure AI Model Catalog](https://azure.microsoft.com/en-us/products/ai-model-catalog)**, providing a straightforward way to access and use the model.\n", - "\n", - "For those preferring a programmatic approach, detailed instructions can be found in the dedicated **[deployment notebook](https://aka.ms/healthcare-ai-examples-mi2-deploy)**. This guide outlines the steps required to deploy the MI2 model efficiently via code.\n", - "\n", - "\n", - "#### Environment\n", - "1. Install the **healthcareai_toolkit** package from the root of the repository:\n", - " ```sh\n", - " pip install ./package\n", - " ```\n", - "2. Set up your `.env` file with the `DATA_ROOT` and `MI2_MODEL_ENDPOINT` parameters.\n", - "3. The toolkit depends on **FAISS**. It will be installed automatically. \n", - " If needed, you can also install it manually with:\n", - " ```sh\n", - " pip install faiss-cpu\n", - " ```\n", - "\n", - "##### FAISS (Facebook AI Similarity Search)\n", - "[FAISS](https://github.com/facebookresearch/faiss) provides efficient algorithms for searching large sets of vectors, even those too large to fit in memory. It supports adding or removing individual vectors and computes exact distances between them.\n", - "\n", - "\n", "#### Embedding-based Retrieval & VLM–based Context-aware Reasoning \n", "\n", "This tutorial demonstrates how to retrieve similar wound images using an embedding model and the FAISS library, and then, following a **Retrieval-Augmented Generation (RAG)** approach, leverage the retrieved examples to ground and support vision-language model (VLM) reasoning for classification. \n", @@ -68,7 +61,6 @@ "\n", "![graph1.png](attachment:graph1.png)\n", "\n", - "\n", "## 🚀 Steps in this Tutorial\n", "\n", "1. **Setup and Data Preparation**\n", @@ -93,8 +85,7 @@ " - Compare predicted infection labels with ground truth annotations to compute accuracy. \n", "\n", "8. **Organize, Save, and Visualize Model Predictions vs. Ground Truth**\n", - " - Save predictions and visualize selected examples with their predicted and true labels. \n", - " \n" + " - Save predictions and visualize selected examples with their predicted and true labels." 
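To make the embedding-based retrieval step above concrete, the following is a minimal sketch of the FAISS flow these search notebooks rely on. It assumes you already have MedImageInsight embeddings available as float32 NumPy arrays; the array shapes and variable names are illustrative, not taken from the dataset.

```python
import faiss
import numpy as np

# Illustrative stand-ins for precomputed MedImageInsight embeddings (adjust shapes to your data).
reference_embeddings = np.random.rand(200, 1024).astype("float32")  # retrieval database
query_embedding = np.random.rand(1, 1024).astype("float32")  # embedding of the query image

# L2-normalize so the inner product behaves like cosine similarity.
faiss.normalize_L2(reference_embeddings)
faiss.normalize_L2(query_embedding)

# Exact (brute-force) inner-product index; FAISS also offers approximate indexes for larger archives.
index = faiss.IndexFlatIP(reference_embeddings.shape[1])
index.add(reference_embeddings)

# Retrieve the 5 most similar reference images; the returned ids are then used to look up labels/metadata.
scores, neighbor_ids = index.search(query_embedding, 5)
print(neighbor_ids[0], scores[0])
```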
] }, { diff --git a/azureml/advanced_demos/radpath/rad_path_survival_demo.ipynb b/azureml/advanced_demos/radpath/rad_path_survival_demo.ipynb index 0ea3285..df0814a 100644 --- a/azureml/advanced_demos/radpath/rad_path_survival_demo.ipynb +++ b/azureml/advanced_demos/radpath/rad_path_survival_demo.ipynb @@ -15,21 +15,20 @@ "\n", "## Prerequisites\n", "\n", - "### Online Endpoint Deployment\n", - "Both the radiology (MedImageInsight) and pathology (Prov-GigaPath) foundation models are accessed and deployed through Azure AI Model Catalog. You can find the models in the catalog for MedImageInsight and Prov-GigaPath. After the deployment, put the endpoint-corresponding url into [.env file](../../../env.example).\n", + "This notebook requires the following setup. If you haven't completed these steps, please refer to the Getting Started section in the main README, which includes:\n", "\n", - "### Environment\n", - "- Please install the healthcareai_toolkit package by using the package folder from the root of the the repository: `pip install -e package`\n", - "- Please install additional package Lifelines for survival plot: `pip install lifelines~=0.27.8`\n", - "- Setup your .env file with DATA_ROOT, MI2_MODEL_ENDPOINT and GIGAPATH_ENDPOINT parameters.\n", + "1. Deploying required models\n", + "2. Installing the Healthcare AI Toolkit\n", + "3. Downloading sample data\n", + "4. Configuring your `.env` file\n", "\n", - "### Dataset\n", - "For this tutorial, we provide a sample set of features that computed from [TCGA-GBMLGG](https://github.com/mahmoodlab/PathomicFusion/tree/master/data/TCGA_GBMLGG) dataset containing 170 subjects with 2D Brain MRI slices (i.e. T1, T1 Post Contrast, T2, T2 FLAIR) and Pathology image pairs. Please download the image features using the following command:\n", - "\n", - "`azcopy copy --recursive https://azuremlexampledata.blob.core.windows.net/data/healthcare-ai/ /home/azureuser/data/`\n", - "\n", - "Additionally, we provide survival labels for each subject. This setup will allow us to evaluate the capabilities of performing survival prediction effectively.\n", + "### Required for This Notebook\n", "\n", + "- **Model Endpoint(s)**: `MI2_MODEL_ENDPOINT`, `GIGAPATH_ENDPOINT`\n", + "- **Additional Dependencies**: \n", + " ```bash\n", + " pip install lifelines~=0.27.8\n", + " ```\n", "\n", "## Adapter Training for Survival Predictions Overview\n", "![rad-path_notebook_pipeline_latest.jpg](attachment:rad-path_notebook_pipeline_latest.jpg)\n", @@ -46,8 +45,7 @@ " - Fuse radiology and pathology embeddings with simple neural network architecture (i.e. MLP)\n", " - Output **hazard value** that refers to the instantaneous risk of the event of interest (such as death, disease recurrence, or failure) occurring at a specific time point, given that the individual has survived up to that time.\n", "4. **Perform Inference**\n", - " - Use the output hazard value from the testing dataset to generate survival curve (i.e. Kaplan-Meier Survival Curve)\n", - "\n" + " - Use the output hazard value from the testing dataset to generate survival curve (i.e. Kaplan-Meier Survival Curve)" ] }, { @@ -63,7 +61,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -124,13 +122,13 @@ " - **Survival Dates** label, indicating the recorded survival duration for each subject.\n", "\n", "3. 
Survival Demo Label\n", - " - Location: `/home/azureuser/data/healthcare-ai/advanced-radpath-demo/survival_demo_label.csv`\n", + " - Location: `/advanced-radpath-demo/survival_demo_label.csv`\n", " - A summary file providing all subject-wise labels, including both **tumor staging** and **survival month** information.\n", "\n", "4. Survival Data Split for Training & Testing:\n", " - Location: \n", - " - Training: `/home/azureuser/data/healthcare-ai/advanced-radpath-demo/survival_adapter_train_split.csv` \n", - " - Testing: `/home/azureuser/data/healthcare-ai/advanced-radpath-demo/survival_adapter_test_split.csv`\n", + " - Training: `/advanced-radpath-demo/survival_adapter_train_split.csv` \n", + " - Testing: `/advanced-radpath-demo/survival_adapter_test_split.csv`\n", " - Csv files containing training and testing split to train the adapter model for survival prediction\n", "\n", "### 1.2 Data Analysis\n", @@ -145,16 +143,16 @@ "- To explore the relationship between tumor grading and survival outcomes, we performed a correlation analysis, uncovering several trends:\n", " - **Survival Variability in Early-Stage Tumors**: Subjects with **Grade 0 (Early Stage)** tumors display the widest range in survival months, potentially reflecting variability in early-stage treatment responses or patient-specific biological factors that influence prognosis.\n", "\n", - " - **Narrowing Survival Range in Higher Grades**: As tumor grades increase, the range of survival months becomes progressively narrower. This trend suggests that advanced-stage tumors lead to a more predictable, albeit shorter, survival period, highlighting the severity’s impact on prognosis.\n", + " - **Narrowing Survival Range in Higher Grades**: As tumor grades increase, the range of survival months becomes progressively narrower. This trend suggests that advanced-stage tumors lead to a more predictable, albeit shorter, survival period, highlighting the severity's impact on prognosis.\n", "\n", " - **Impact of Tumor Severity on Survival Outlook**: The data reveal a negative correlation between tumor grading and survival duration, emphasizing the importance of early diagnosis. Higher grades (e.g., Grade 2) demonstrate uniformly shorter survival periods, reinforcing the clinical need for early intervention.\n", " \n", - " - **Validation through Visualization**: Visual mapping of survival years across each grade corroborates and quantifies this decreasing survival trend with advancing tumor severity. This finding further supports the value of tumor staging in clinical decision-making and patient management strategies.\n" + " - **Validation through Visualization**: Visual mapping of survival years across each grade corroborates and quantifies this decreasing survival trend with advancing tumor severity. This finding further supports the value of tumor staging in clinical decision-making and patient management strategies." ] }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -189,7 +187,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -267,7 +265,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -308,8 +306,8 @@ "### 2.1. Generate Image Embeddings with Foundation Models (Optional)\n", "\n", "Before we dive into the embedding generation process, we first split the dataset into training and testing sets. 
Both training and testing splits are specified in the corresponding CSV files:\n", - "- Training set: `/home/azureuser/data/healthcare-ai/advanced-radpath-demo/survival_adapter_train_split.csv`\n", - "- Testing set: `/home/azureuser/data/healthcare-ai/advanced-radpath-demo/survival_adapter_test_split.csv`\n", + "- Training set: `/advanced-radpath-demo/survival_adapter_train_split.csv`\n", + "- Testing set: `/advanced-radpath-demo/survival_adapter_test_split.csv`\n", "\n", "After performing the data split manually, we generate image embeddings with the foundation models in [Azure AI model catalog](https://ai.azure.com/explore/models?tid=72f988bf-86f1-41af-91ab-2d7cd011db47) with respect to imaging modality. Here is the summary of the foundation models that we leverage in this tutorial:\n", "\n", @@ -347,9 +345,7 @@ " - **Intuition behind the loss function:**\n", " - **Ranking Patients:** The loss function focuses on the relative ordering of predicted risk scores, encouraging the model to correctly rank patients according to their risk.\n", " - **Pairwise Comparisons:** It can be viewed as performing pairwise comparisons between patients who experienced the event and those still at risk.\n", - " - **Survival Analysis Suitability:** The loss function aligns with the goals of survival analysis, handling censored data effectively and not requiring assumptions about the baseline hazard function.\n", - "\n", - "\n" + " - **Survival Analysis Suitability:** The loss function aligns with the goals of survival analysis, handling censored data effectively and not requiring assumptions about the baseline hazard function." ] }, { @@ -508,7 +504,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -606,7 +602,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -677,7 +673,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -736,7 +732,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -787,7 +783,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -814,7 +810,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": null, "metadata": {}, "outputs": [ { diff --git a/azureml/cxrreportgen/cxr-deploy-batch-endpoint.ipynb b/azureml/cxrreportgen/cxr-deploy-batch-endpoint.ipynb index 1d26565..72aec78 100644 --- a/azureml/cxrreportgen/cxr-deploy-batch-endpoint.ipynb +++ b/azureml/cxrreportgen/cxr-deploy-batch-endpoint.ipynb @@ -273,8 +273,9 @@ } }, "source": [ - "### Load test dataset\n", - "Download the test dataset using command `azcopy copy --recursive https://azuremlexampledata.blob.core.windows.net/data/healthcare-ai/ /home/azureuser/data/`" + "### Load Test Dataset\n", + "\n", + "Please follow the data download instructions in the main [README](../../README.md) to download the sample data for this notebook." 
] }, { @@ -318,14 +319,16 @@ "import base64\n", "import os\n", "\n", + "data_root = \"/home/azureuser/data/healthcare-ai\" # Change to the location you downloaded the data\n", + "\n", "\n", "def read_base64_image(image_path):\n", " with open(image_path, \"rb\") as f:\n", " return base64.b64encode(f.read()).decode(\"utf-8\")\n", "\n", "\n", - "frontal = \"/home/azureuser/data/healthcare-ai/cxrreportgen-images/cxr_frontal.jpg\"\n", - "lateral = \"/home/azureuser/data/healthcare-ai/cxrreportgen-images/cxr_lateral.jpg\"\n", + "frontal = os.path.join(data_root, \"cxrreportgen-images\", \"cxr_frontal.jpg\")\n", + "lateral = os.path.join(data_root, \"cxrreportgen-images\", \"cxr_lateral.jpg\")\n", "\n", "data = [\n", " {\n", @@ -644,7 +647,7 @@ }, { "cell_type": "code", - "execution_count": 29, + "execution_count": null, "metadata": { "gather": { "logged": 1740945427353 @@ -671,7 +674,7 @@ }, { "cell_type": "code", - "execution_count": 30, + "execution_count": null, "metadata": { "gather": { "logged": 1740945429953 diff --git a/azureml/cxrreportgen/cxr-deploy.ipynb b/azureml/cxrreportgen/cxr-deploy.ipynb index b70a3fa..45187cb 100644 --- a/azureml/cxrreportgen/cxr-deploy.ipynb +++ b/azureml/cxrreportgen/cxr-deploy.ipynb @@ -324,11 +324,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Download data\n", + "## Download Data\n", "\n", - "Use the following command to download the dataset with samples into your data folder located at `/home/azureuser/data/healthcare-ai/cxrreportgen-images`:\n", - "\n", - "`azcopy copy --recursive https://azuremlexampledata.blob.core.windows.net/data/healthcare-ai/cxrreportgen-images /home/azureuser/data/healthcare-ai/`" + "Please follow the data download instructions in the main [README](../../README.md) to download the sample data for this notebook." ] }, { @@ -346,9 +344,12 @@ ], "source": [ "# Now let's pass frontal and lateral images to the model and visualize the results\n", + "import os\n", + "\n", + "data_root = \"/home/azureuser/data/healthcare-ai\" # Change to the location you downloaded the data\n", "\n", - "frontal = \"/home/azureuser/data/healthcare-ai/cxrreportgen-images/cxr_frontal.jpg\"\n", - "lateral = \"/home/azureuser/data/healthcare-ai/cxrreportgen-images/cxr_lateral.jpg\"\n", + "frontal = os.path.join(data_root, \"cxrreportgen-images\", \"cxr_frontal.jpg\")\n", + "lateral = os.path.join(data_root, \"cxrreportgen-images\", \"cxr_lateral.jpg\")\n", "\n", "indication = \"\"\n", "technique = \"\"\n", diff --git a/azureml/medimageinsight/adapter-training.ipynb b/azureml/medimageinsight/adapter-training.ipynb index 7544678..2ae76ef 100644 --- a/azureml/medimageinsight/adapter-training.ipynb +++ b/azureml/medimageinsight/adapter-training.ipynb @@ -6,8 +6,7 @@ "source": [ "# A Tutorial on Using MedImageInsight to Train an Adaptor for Chest Pathology Classification\n", "\n", - "\n", - "MedImageInsight is a foundational model suited for a wide variety of medical image analysis tasks. In this tutorial, we will explore how to build a simple classifier for lung pathology classification by training an adapter on top of MedImageInsight embeddings. While MedImageInsight's out-of-the-box or zero-shot capabilities (similar to those explored in the [previous notebook](./zero_shot_classification.ipynb)) are powerful, they may not always be sufficient—especially for unseen pathology classes. 
By training an adapter, we can achieve superior performance compared to zero-shot classification, at the cost of training a new, but much simpler, classification model.\n", + "MedImageInsight is a foundational model suited for a wide variety of medical image analysis tasks. In this tutorial, we will explore how to build a simple classifier for lung pathology classification by training an adapter on top of MedImageInsight embeddings. While MedImageInsight's out-of-the-box or zero-shot capabilities (similar to those explored in the [previous notebook](./zero-shot-classification.ipynb)) are powerful, they may not always be sufficient—especially for unseen pathology classes. By training an adapter, we can achieve superior performance compared to zero-shot classification, at the cost of training a new, but much simpler, classification model.\n", "\n", "This approach leverages the embeddings generated by MedImageInsight and adds a simple classification layer to more effectively align pathology findings in chest X-rays. The new classification model is orders of magnitude simpler than training a model from scratch and can often be trained on a CPU.\n", "\n", @@ -15,22 +14,16 @@ "\n", "## Prerequisites\n", "\n", - "Before proceeding with the tutorial, you need to perform some initial setup.\n", - "\n", - "### Online Endpoint Deployment\n", - "The MedImageInsight Model is accessed and deployed through Azure AI Model Catalog. Alternatively, you can deploy the model programmatically, as described in the deployment notebook.\n", - "\n", - "### Dataset\n", - "For this tutorial, we provide a sample dataset containing 100 2D X-Ray dicom images. Please download the data using the following command:\n", - "\n", - "`azcopy copy --recursive https://azuremlexampledata.blob.core.windows.net/data/healthcare-ai/ /home/azureuser/data/`\n", + "This notebook requires the following setup. If you haven't completed these steps, please refer to the Getting Started section in the main README, which includes:\n", "\n", - "Additionally, we provide categorical labels for different lung pathologies for each image. This setup will allow us to evaluate the zero-shot classification performance effectively.\n", + "1. Deploying required models\n", + "2. Installing the Healthcare AI Toolkit\n", + "3. Downloading sample data\n", + "4. Configuring your `.env` file\n", "\n", - "### Environment\n", + "### Required for This Notebook\n", "\n", - "1. Please install the healthcareai_toolkit package by using the from the root of the the repository: `pip install -e package`\n", - "2. Setup your .env file with `DATA_ROOT` and `MI2_MODEL_ENDPOINT` parameters.\n", + "- **Model Endpoint(s)**: `MI2_MODEL_ENDPOINT`\n", "\n", "## Adapter Training Overview\n", "In this tutorial, we will guide you through the process of using the embeddings generated by MedImageInsight to classify unseen chest pathologies in chest X-rays. We will demonstrate how to train an adapter model to improve classification performance. The steps we'll cover are:\n", @@ -56,15 +49,14 @@ "\n", "3. 
**Evaluate Findings Classification Accuracy with Fine-Tuned Embeddings**\n", " - We perform inference to generate predictions using the adapter model we have just trained.\n", - " - We directly predict categorical labels using MedImageInsight embeddings and the adapter MLP and compare to the ground-truth findings of each subject for evaluating accuracy\n", - "\n" + " - We directly predict categorical labels using MedImageInsight embeddings and the adapter MLP and compare to the ground-truth findings of each subject for evaluating accuracy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## 1. Set up and data preparation" + "## 1. Setup and Imports" ] }, { diff --git a/azureml/medimageinsight/advanced-call-example.ipynb b/azureml/medimageinsight/advanced-call-example.ipynb index b99a28b..43f28d2 100644 --- a/azureml/medimageinsight/advanced-call-example.ipynb +++ b/azureml/medimageinsight/advanced-call-example.ipynb @@ -34,7 +34,7 @@ "\n", "### Download data\n", "\n", - "`azcopy copy --recursive https://azuremlexampledata.blob.core.windows.net/data/healthcare-ai/ /home/azureuser/data/`\n", + "Please follow the data download instructions in the main [README](../../README.md) to download the sample data for this notebook.\n", "\n", "### Install Required Packages\n", "\n", @@ -398,9 +398,13 @@ "metadata": {}, "outputs": [], "source": [ + "import os\n", + "\n", + "data_root = \"/home/azureuser/data/healthcare-ai\" # Change to the location you downloaded the data\n", + "\n", "filelist = list(\n", " glob.glob(\n", - " \"/home/azureuser/data/healthcare-ai/medimageinsight-zeroshot/**/*.dcm\",\n", + " os.path.join(data_root, \"medimageinsight-zeroshot\", \"**\", \"*.dcm\"),\n", " recursive=True,\n", " )\n", ")\n", diff --git a/azureml/medimageinsight/agent-classification-example.ipynb b/azureml/medimageinsight/agent-classification-example.ipynb new file mode 100644 index 0000000..593a388 --- /dev/null +++ b/azureml/medimageinsight/agent-classification-example.ipynb @@ -0,0 +1,628 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "6d3f4f57", + "metadata": {}, + "source": [ + "# Building an AI Agent for Medical Image Classification with MedImageInsight\n", + "\n", + "This tutorial demonstrates how to build an intelligent conversational agent that can classify medical images using MedImageInsight embeddings. The agent uses Semantic Kernel to coordinate natural language interactions with image classification capabilities, allowing users to ask questions about medical images in plain language.\n", + "\n", + "## Prerequisites\n", + "\n", + "This notebook requires the following setup. If you haven't completed these steps, please refer to the Getting Started section in the main README, which includes:\n", + "\n", + "1. Deploying required models\n", + "2. Installing the Healthcare AI Toolkit\n", + "3. Downloading sample data\n", + "4. Configuring your `.env` file\n", + "\n", + "### Required for This Notebook\n", + "\n", + "- **Model Endpoint(s)**: `MI2_MODEL_ENDPOINT`, `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_API_KEY`\n", + "\n", + "- **Additional Dependencies**: \n", + " ```bash\n", + " pip install semantic-kernel\n", + " ```\n", + "\n", + "## Architecture\n", + "\n", + "The agent architecture consists of several key components:\n", + "\n", + "1. **DataAccess**: Manages storage and retrieval of images by unique IDs\n", + "2. **ChatContext**: Coordinates conversation flow and data management\n", + "3. **ImageClassificationPlugin**: Integrates MedImageInsight for classification\n", + "4. 
**ChatCompletionAgent**: Orchestrates LLM-based natural language understanding\n", + "\n", + "The workflow is as follows:\n", + "- User sends a message with an attached image\n", + "- Image is stored and assigned a unique ID\n", + "- Agent processes the natural language query\n", + "- Agent calls the classification plugin with extracted labels\n", + "- Plugin computes embeddings and similarity scores\n", + "- Results are returned to the user in natural language" + ] + }, + { + "cell_type": "markdown", + "id": "f6de260c", + "metadata": {}, + "source": [ + "## 1. Setup and Imports" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a9a99bff", + "metadata": {}, + "outputs": [], + "source": [ + "from semantic_kernel import Kernel\n", + "from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion\n", + "from semantic_kernel.functions import kernel_function\n", + "from semantic_kernel.agents import ChatCompletionAgent\n", + "from typing import Annotated\n", + "import uuid" + ] + }, + { + "cell_type": "markdown", + "id": "0d06dacf", + "metadata": {}, + "source": [ + "## 2. Chat Context Management and Data Access Layer\n", + "\n", + "The `ChatContext` and `DataAccess` classes form the architectural foundation of our agent system. Together, they solve a fundamental challenge: **how to coordinate stateful conversations with binary image data when agents can only pass text between function calls**.\n", + "\n", + "### The Architecture Challenge\n", + "\n", + "When building conversational agents with Semantic Kernel, all function parameters must be serializable to JSON for the language model to process. This creates a problem for medical imaging applications:\n", + "- Images are binary data (bytes) that cannot be directly passed through text-based interfaces\n", + "- Multiple function calls need access to the same image data\n", + "- Expensive resources (like the MI2 client) should be initialized once and shared across all interactions\n", + "\n", + "**Important:** This is **not** a typical multi-modal approach where images are sent directly to the LLM for vision-based understanding. Instead, images are routed to the MedImageInsight classification tool. The LLM only processes text—it understands the user's intent and extracts classification labels, then calls the appropriate function with an image ID reference.\n", + "\n", + "### The Solution: Context + Data Access Pattern\n", + "\n", + "We separate concerns into two complementary classes:\n", + "\n", + "**`DataAccess`** - Handles the storage/retrieval problem:\n", + "1. Stores image bytes in memory (or database/blob storage in production)\n", + "2. Generates unique text identifiers (UUIDs) for each image\n", + "3. Allows functions to retrieve images using these text IDs\n", + "4. Provides state management (list, store, retrieve operations)\n", + "\n", + "**`ChatContext`** - Handles the orchestration problem:\n", + "1. Coordinates the three critical components: DataAccess, MI2 Client, and Agent\n", + "2. Maintains shared state across the entire conversation\n", + "3. Provides a high-level interface (`send_message`) that abstracts complexity\n", + "4. Handles image format conversion (file paths, PIL Images, bytes) automatically\n", + "\n", + "### How They Work Together\n", + "\n", + "The workflow follows these steps:\n", + "\n", + "1. **User sends image** → `ChatContext.send_message()` receives the image and message\n", + "2. 
**Image conversion** → ChatContext converts the image to bytes (from file path, PIL Image, etc.)\n", + "3. **Image storage** → `DataAccess.store_image()` stores the bytes and generates a unique ID (e.g., \"abc123\")\n", + "4. **Message preparation** → ChatContext appends the image ID to the user's message: `\"Is this pneumonia?\\nImage ID: abc123\"`\n", + "5. **Agent processing** → The Agent (LLM) receives only text—no binary data. It understands the question and extracts labels\n", + "6. **Function call** → Agent invokes `classify_image(labels=[\"pneumonia\", \"normal\"], image_id=\"abc123\")`\n", + "7. **Image retrieval** → Plugin calls `DataAccess.get_image(\"abc123\")` to retrieve the actual image bytes\n", + "8. **Classification** → Plugin sends bytes to MedImageInsight for embedding and similarity calculation\n", + "9. **Results** → Probabilities are returned to the agent, which formats them for the user\n", + "\n", + "**Key point:** The image travels through a separate path (ChatContext → DataAccess → Plugin → MI2) while the LLM only sees text references (image IDs) and orchestrates the workflow.\n", + "\n", + "### Production Considerations\n", + "\n", + "In production systems, replace the in-memory `DataAccess._images` dictionary with:\n", + "- Azure Blob Storage for large-scale deployments\n", + "- Redis or another caching layer for distributed systems\n", + "- Database storage with proper indexing for audit trails\n", + "\n", + "The architectural pattern remains the same—only the storage backend changes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f64c7fd1", + "metadata": {}, + "outputs": [], + "source": [ + "class DataAccess:\n", + " \"\"\"Stores and retrieves images by ID.\"\"\"\n", + "\n", + " def __init__(self):\n", + " self._images = {}\n", + "\n", + " def store_image(self, image: bytes) -> str:\n", + " \"\"\"Store an image and return its ID.\"\"\"\n", + " image_id = str(uuid.uuid4())[:6]\n", + " self._images[image_id] = image\n", + " return image_id\n", + "\n", + " def get_image(self, image_id: str) -> bytes:\n", + " \"\"\"Retrieve an image by ID.\"\"\"\n", + " if image_id in self._images:\n", + " return self._images.get(image_id)\n", + " print(f\"Image ID {image_id} not found!\")\n", + "\n", + " def list_images(self):\n", + " \"\"\"List all stored image IDs.\"\"\"\n", + " return list(self._images.keys())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "07e59edf", + "metadata": {}, + "outputs": [], + "source": [ + "from healthcareai_toolkit.clients import MedImageInsightClient\n", + "from typing import Union\n", + "from pathlib import Path\n", + "from PIL import Image\n", + "from IPython.display import display\n", + "import io\n", + "\n", + "\n", + "def print_message(header, message):\n", + " agent_header = f\" {header.strip()} \"\n", + " print(f\"{agent_header:-^50}\")\n", + " print(message)\n", + " print(\"-\" * 50)\n", + " print()\n", + "\n", + "\n", + "class ChatContext:\n", + " \"\"\"Manages conversation context and data access.\"\"\"\n", + "\n", + " def __init__(self, data_access=None, mi2_client=None):\n", + " self.agent = None\n", + " self.data_access = data_access or DataAccess()\n", + " self.mi2_client = mi2_client or MedImageInsightClient()\n", + "\n", + " def set_agent(self, agent: ChatCompletionAgent):\n", + " \"\"\"Set the agent after initialization.\"\"\"\n", + " self.agent = agent\n", + "\n", + " async def send_message(\n", + " self, message: str, image: Union[bytes, str, Path, Image.Image] = None\n", + " 
):\n", + " \"\"\"Send a message with optional image to the agent.\n", + "\n", + " Args:\n", + " message: The text message to send\n", + " image: Can be bytes, file path (str/Path), or PIL Image\n", + " \"\"\"\n", + "\n", + " if self.agent is None:\n", + " raise ValueError(\"Agent not set. Call set_agent() first.\")\n", + "\n", + " if image is not None:\n", + " # Convert image to bytes\n", + " if isinstance(image, (str, Path)):\n", + " with open(image, \"rb\") as f:\n", + " image_bytes = f.read()\n", + " elif isinstance(image, Image.Image):\n", + " buffer = io.BytesIO()\n", + " image.save(buffer, format=image.format or \"PNG\")\n", + " image_bytes = buffer.getvalue()\n", + " elif isinstance(image, bytes):\n", + " image_bytes = image\n", + " else:\n", + " raise TypeError(f\"Unsupported image type: {type(image)}\")\n", + "\n", + " display_image = Image.open(io.BytesIO(image_bytes))\n", + " display(display_image)\n", + "\n", + " # Store image and get ID\n", + " image_id = self.data_access.store_image(image_bytes)\n", + " full_message = f\"{message}\\nImage ID: {image_id}\"\n", + " else:\n", + " full_message = message\n", + "\n", + " # Print user message\n", + " print_message(\"User\", full_message)\n", + "\n", + " # Send to agent\n", + " agent_header = f\" Agent ({self.agent.name}) \"\n", + " async for response in self.agent.invoke(full_message):\n", + " print_message(agent_header, response)" + ] + }, + { + "cell_type": "markdown", + "id": "d22a9f38", + "metadata": {}, + "source": [ + "## 4. Initialize Semantic Kernel\n", + "\n", + "Here we initialize the Semantic Kernel and configure it with Azure OpenAI services. The kernel acts as the central orchestrator for our agent, managing:\n", + "- AI service connections\n", + "- Plugin registration\n", + "- Function execution\n", + "\n", + "We load configuration from the environment settings managed by the healthcare AI toolkit." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "adfa4f38", + "metadata": {}, + "outputs": [], + "source": [ + "from healthcareai_toolkit import settings\n", + "\n", + "# Initialize kernel\n", + "kernel = Kernel()\n", + "\n", + "# Add Azure OpenAI service\n", + "kernel.add_service(\n", + " AzureChatCompletion(\n", + " endpoint=settings.AZURE_OPENAI_ENDPOINT,\n", + " api_key=settings.AZURE_OPENAI_API_KEY,\n", + " deployment_name=settings.AZURE_OPENAI_DEPLOYMENT_NAME,\n", + " api_version=settings.AZURE_OPENAI_API_VERSION,\n", + " )\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "f14c98ea", + "metadata": {}, + "source": [ + "## 5. Image Classification Plugin\n", + "\n", + "The `ImageClassificationPlugin` is the core component that integrates MedImageInsight with the agent framework. It performs zero-shot classification using the same embedding-based approach described in the [zero-shot classification tutorial](./zero-shot-classification.ipynb).\n", + "\n", + "### How It Works\n", + "\n", + "1. **Embedding Generation**:\n", + " - `_get_image_embeddings()`: Converts image bytes to feature vectors using MedImageInsight\n", + " - `_get_text_embeddings()`: Converts text labels to feature vectors\n", + "\n", + "2. **Similarity Calculation**:\n", + " - Computes dot product between image and text embeddings\n", + " - Applies learned temperature scaling factor\n", + " - Uses softmax to convert logits to probabilities\n", + "\n", + "3. 
**Agent Integration**:\n", + " - The `@kernel_function` decorator exposes `classify_image()` to the agent\n", + " - Type annotations provide clear parameter documentation for the LLM\n", + " - Returns a dictionary mapping categories to probability scores" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e88a6ecb", + "metadata": {}, + "outputs": [], + "source": [ + "from typing import List\n", + "import torch\n", + "import numpy as np\n", + "\n", + "\n", + "class ImageClassificationPlugin:\n", + " \"\"\"Plugin that classifies images.\"\"\"\n", + "\n", + " def __init__(self, chat_context: ChatContext):\n", + " self.chat_context = chat_context\n", + " self.data_access = chat_context.data_access\n", + " self.mi2_client = chat_context.mi2_client\n", + " self.scaling_factor = np.atleast_1d(\n", + " self.mi2_client.submit(text_list=[\"placeholder\"])[0][\"scaling_factor\"]\n", + " )\n", + "\n", + " def _get_image_embeddings(self, image_data: bytes):\n", + " \"\"\"Get image embeddings using MedImageInsightClient.\"\"\"\n", + " response = self.mi2_client.submit(image_list=[image_data])\n", + " return np.array(response[0][\"image_features\"][0])\n", + "\n", + " def _get_text_embeddings(self, texts: List[str]):\n", + " \"\"\"Get text embeddings using MedImageInsightClient.\"\"\"\n", + " response = self.mi2_client.submit(text_list=texts)\n", + " return np.array([item[\"text_features\"] for item in response])\n", + "\n", + " def _calculate_probability(\n", + " self, image_features: np.ndarray, text_features: np.ndarray, texts: List[str]\n", + " ):\n", + " \"\"\"Calculate probability scores between image and text embeddings.\"\"\"\n", + "\n", + " logits_per_image = (\n", + " torch.from_numpy(self.scaling_factor).exp()\n", + " * torch.from_numpy(image_features)\n", + " @ torch.from_numpy(text_features).t()\n", + " )\n", + " probs = logits_per_image.softmax(dim=-1).cpu().numpy()\n", + " return {text: float(prob) for text, prob in zip(texts, probs)}\n", + "\n", + " @kernel_function()\n", + " def classify_image(\n", + " self,\n", + " labels: Annotated[list[str], \"List of category labels\"],\n", + " image_id: Annotated[str, \"Image ID to classify\"],\n", + " ) -> Annotated[dict, \"Categories mapped to probability scores\"]:\n", + " \"\"\"Classifies an image against a list of categories and returns probabilities as a dictionary with the input labels as keys.\"\"\"\n", + "\n", + " # Get image from data access\n", + " image_data = self.data_access.get_image(image_id)\n", + "\n", + " if image_data is None:\n", + " return {\"error\": f\"Image {image_id} not found\"}\n", + "\n", + " print_message(\n", + " \"Tool (classify_image)\",\n", + " f\"Classifying image {image_id} ({len(image_data)} bytes).\\nLabels: {labels}\",\n", + " )\n", + "\n", + " image_features = self._get_image_embeddings(image_data)\n", + " text_features = self._get_text_embeddings(labels)\n", + "\n", + " # Calculate probabilities\n", + " probabilities = self._calculate_probability(\n", + " image_features, text_features, labels\n", + " )\n", + "\n", + " return probabilities" + ] + }, + { + "cell_type": "markdown", + "id": "527719ba", + "metadata": {}, + "source": [ + "## 6. 
Create and Configure the Agent\n", + "\n", + "Now we bring all the components together to create our medical image classification agent.\n", + "\n", + "### Agent Instructions\n", + "\n", + "The agent is configured with detailed instructions that define its behavior:\n", + "- How to extract and format classification labels from user queries\n", + "- When and how to call the classification plugin\n", + "- How to present results to users\n", + "- Example interactions to guide the agent's responses\n", + "\n", + "#### Label Format\n", + "\n", + "For optimal results, text labels should follow this hierarchical structure:\n", + "```\n", + " \n", + "```\n", + "\n", + "Examples:\n", + "- \"x-ray chest anteroposterior pneumonia\"\n", + "- \"computed tomography chest axial mass\"\n", + "- \"magnetic resonance imaging brain sagittal tumor\"\n", + "\n", + "### Initialization Steps\n", + "\n", + "1. Create the chat context\n", + "2. Instantiate the classification plugin with the context\n", + "3. Register the plugin with the kernel\n", + "4. Create the agent with the kernel and instructions\n", + "5. Link the agent back to the context\n", + "\n", + "This design allows for clean separation of concerns while maintaining necessary connections between components." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a8113bc0", + "metadata": {}, + "outputs": [], + "source": [ + "context = ChatContext()\n", + "\n", + "plugin = ImageClassificationPlugin(context)\n", + "# Add plugin to kernel with context's data access\n", + "kernel.add_plugin(plugin, plugin_name=\"ImageClassifier\")\n", + "\n", + "# Create agent\n", + "agent = ChatCompletionAgent(\n", + " kernel=kernel,\n", + " name=\"ImageClassificationAgent\",\n", + " instructions=\"\"\"\n", + "You are a medical image classification assistant that helps classify medical images.\n", + "\n", + "When a user provides a message with an Image ID:\n", + "1. Extract the categories from their message as a list of strings\n", + "2. Each category should follow increasing specificity when possible: \n", + "3. If any component is not available or not specified, leave it out\n", + "4. Call classify_image with the labels and the image_id\n", + "5. Report the probabilities to the user in a clear format\n", + "\n", + "Label Format Examples:\n", + "- \"x-ray chest anteroposterior atelectasis\"\n", + "- \"x-ray chest anteroposterior pneumonia\"\n", + "- \"computed tomography chest axial mass\"\n", + "- \"magnetic resonance imaging knee sagittal torn meniscus\"\n", + "- \"histopathology H&E stain sentinel lymph node malignant\"\n", + "- \"retinal fundus pathological myopia\"\n", + "- \"dermatology clinical photography angular cheilitis\"\n", + "- \"x-ray chest pneumonia\" (when view information not specified)\n", + "- \"chest x-ray\" (when only modality and body part known)\n", + "\n", + "If the user provides abbreviated or informal labels (e.g., \"CT chest with mass\", \"MRI brain tumor\"), expand them.\n", + "If the user asks general questions, help them formulate appropriate medical imaging labels.\n", + "\n", + "Example:\n", + "User: \"is this atelectasis or pneumonia?\\nImage ID: abc123\"\n", + "You: Call classify_image(labels=[\"x-ray chest anteroposterior atelectasis\", \"x-ray chest anteroposterior pneumonia\"], image_id=\"abc123\")\n", + "\"\"\".strip(),\n", + ")\n", + "\n", + "# Set the agent in the context\n", + "context.set_agent(agent)" + ] + }, + { + "cell_type": "markdown", + "id": "6b64767c", + "metadata": {}, + "source": [ + "## 7. 
Example: Classify a Medical Image\n", + "\n", + "Now let's test our agent with a real medical image. In this example, we'll:\n", + "1. Load a chest X-ray image\n", + "2. Display it for visual reference\n", + "3. Ask the agent to classify it among several possible conditions\n", + "\n", + "The agent will:\n", + "- Understand the natural language query\n", + "- Extract the relevant classification labels\n", + "- Call the classification plugin\n", + "- Return probability scores for each condition\n", + "\n", + "### Try Your Own Images\n", + "\n", + "You can easily adapt this example to classify your own medical images by:\n", + "- Changing the `input_image` path\n", + "- Modifying the question to include relevant conditions\n", + "- Using different medical imaging modalities (CT, MRI, etc.)\n", + "\n", + "### Understanding the Results\n", + "\n", + "The agent will return probability scores for each label. Higher probabilities indicate stronger similarity between the image and that particular condition description in the MedImageInsight embedding space." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "618e7f94", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "input_image = os.path.join(settings.DATA_ROOT, \"segmentation-examples/covid_1585.png\")\n", + "\n", + "# Send message with image\n", + "await context.send_message(\n", + " message=\"Is this a chest x-ray showing COVID-19, pneumonia, or atelectasis?\",\n", + " image=input_image,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "19b9b93f", + "metadata": {}, + "outputs": [], + "source": [ + "input_image = os.path.join(\n", + " settings.DATA_ROOT,\n", + " \"medimageinsight-outlier-detection/samples/test/outlier/CT/1.3.6.1.4.1.55648.010293352392778028677215985701318018213/1.3.6.1.4.1.55648.010293352392778028677215985701318018213.3.png\",\n", + ")\n", + "message = (\n", + " \"Could you tell me what modality this image is (CT, MRI, XRAY, mammography)\\n\"\n", + " + \"and what body part (extremity, chest, abdomen, brain, spine, pelvis, breast)?\"\n", + ")\n", + "\n", + "# Send message with image\n", + "await context.send_message(message=message, image=input_image)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b396eec1", + "metadata": {}, + "outputs": [], + "source": [ + "input_image = os.path.join(\n", + " settings.DATA_ROOT,\n", + " \"medimageinsight-outlier-detection/samples/test/outlier/MR/1.3.6.1.4.1.55648.002676776301544845833524448635393145729/1.3.6.1.4.1.55648.002676776301544845833524448635393145729.502.png\",\n", + ")\n", + "\n", + "# Send message with image\n", + "await context.send_message(message=message, image=input_image)" + ] + }, + { + "cell_type": "markdown", + "id": "36ce5a3a", + "metadata": {}, + "source": [ + "## Conclusion\n", + "\n", + "This tutorial demonstrated how to build an AI agent that combines natural language understanding with medical image classification using MedImageInsight embeddings. 
By leveraging Semantic Kernel, we created an intuitive conversational interface that allows users to ask questions about medical images in plain language without needing to understand the underlying technical implementation.\n", + "\n", + "The key advantage of this approach is flexibility—the agent can adapt to different classification tasks through natural conversation, making it valuable for:\n", + "- **Rapid prototyping** of classification workflows without writing new code\n", + "- **Clinical decision support** where physicians can ask specific diagnostic questions\n", + "- **Research exploration** to quickly test hypotheses across different pathology categories\n", + "- **Educational applications** where students can learn through interactive questioning\n", + "\n", + "## Next Steps\n", + "\n", + "### Creating Different Types of Agents\n", + "\n", + "Now that you understand the fundamentals, you can create different types of specialized agents by combining concepts from other notebooks:\n", + "\n", + "**Adapter-Enhanced Classification Agents** - Use the [adapter training approach](./adapter-training.ipynb) to build agents with fine-tuned classification layers on top of embeddings. This can significantly boost accuracy for specific tasks like pneumonia detection or tumor classification while maintaining the conversational interface.\n", + "\n", + "**Segmentation + Classification Agents** - Integrate [MedImageParse](../medimageparse/medimageparse_segmentation_demo.ipynb) to create agents that provide both classification and visual localization. These agents could answer questions like \"Where is the lesion?\" by highlighting affected regions alongside classification probabilities.\n", + "\n", + "**Quality Control Agents** - Incorporate the [outlier detection methods](./outlier-detection-demo.ipynb) to create agents that flag unusual cases, detect acquisition problems, or identify studies that deviate from expected protocols—essential for maintaining data quality in clinical and research settings.\n", + "\n", + "**Multi-Modal Diagnostic Agents** - Combine multiple data types using patterns from the [advanced demos](../advanced_demos/). For example, build agents that analyze radiology images, pathology slides, and clinical notes together for comprehensive diagnostic support in complex cases like cancer staging.\n", + "\n", + "### Multi-Agent Orchestration with Healthcare Agent Orchestrator\n", + "\n", + "For production scenarios requiring coordination between multiple specialized agents, explore the [**Healthcare Agent Orchestrator**](https://github.com/Azure-Samples/healthcare-agent-orchestrator/) (HAO) project. 
HAO is a multi-agent framework specifically designed for complex healthcare workflows where different agents need to collaborate and share context.\n", + "\n", + "HAO demonstrates how to:\n", + "- **Coordinate multiple specialized agents** working together on complex tasks like cancer care coordination\n", + "- **Integrate with Microsoft Teams** for real-time collaboration between AI agents and care teams\n", + "- **Work across diverse data types** including imaging, pathology, clinical notes, and structured data\n", + "- **Build modular, scalable solutions** where agents can be added or modified without disrupting the overall system\n", + "- **Connect with enterprise systems** like Copilot Studio through Microsoft Cloud for Healthcare\n", + "\n", + "This is particularly valuable for multi-disciplinary scenarios where a classification agent might work alongside report generation agents, scheduling agents, and care coordination agents—all collaborating to support the clinical team.\n", + "\n", + "### Related Notebooks\n", + "\n", + "- [Zero-Shot Classification](./zero-shot-classification.ipynb) - Deep dive into the embedding-based classification approach\n", + "- [Adapter Training](./adapter-training.ipynb) - Fine-tune for improved accuracy on specific tasks\n", + "- [Outlier Detection](./outlier-detection-demo.ipynb) - Identify unusual cases automatically\n", + "- [Advanced Demos](../advanced_demos/) - Multi-modal and complex healthcare applications" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "haitk-py310", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.18" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/azureml/medimageinsight/exam-parameter-demo/exam-parameter-detection.ipynb b/azureml/medimageinsight/exam-parameter-demo/exam-parameter-detection.ipynb index c25e40a..03f7108 100644 --- a/azureml/medimageinsight/exam-parameter-demo/exam-parameter-detection.ipynb +++ b/azureml/medimageinsight/exam-parameter-demo/exam-parameter-detection.ipynb @@ -33,44 +33,37 @@ "id": "c8d1a846", "metadata": {}, "source": [ - "# 1. Instructions to Reproduce the Notebook\n", + "## 1. Setup and Imports\n", "\n", "## Prerequisites\n", "\n", - "Before proceeding with the tutorial, you need to perform some initial setup.\n", - "### Online Endpoint Deployment\n", - "The MedImageInsight Model is accessed and deployed through Azure AI Model Catalog or Azure Machine Learning Model Catalog. Alternatively, you can deploy the model programmatically, as described in the deployment notebook.\n", - "Links:\n", - "- [Documentation](https://aka.ms/healthcare-ai-docs-deploy-mi2)\n", - "- [Programmatic Deployment](https://aka.ms/healthcare-ai-examples-mi2-deploy)\n", + "This notebook requires the following setup. If you haven't completed these steps, please refer to the Getting Started section in the main README, which includes:\n", "\n", - "### Dataset\n", - "For this tutorial, we provide a sample dataset containing 100 2D X-Ray dicom images. Please download the data using the following command:\n", + "1. Deploying required models\n", + "2. Installing the Healthcare AI Toolkit\n", + "3. Downloading sample data\n", + "4. 
Configuring your `.env` file\n", "\n", - "`azcopy copy --recursive https://azuremlexampledata.blob.core.windows.net/data/healthcare-ai/ /home/azureuser/data/`\n", + "### Required for This Notebook\n", "\n", - "Additionally, we provide categorical labels for different lung pathologies for each image. This setup will allow us to evaluate the zero-shot classification performance effectively.\n", + "- **Model Endpoint(s)**: `MI2_MODEL_ENDPOINT`\n", "\n", - "### Environment\n", - "The easiest way to reproduce this notebook is to run it inside the Azure Machine Learning environment.\n", + "### GPT-Based Weak Labeling (Optional)\n", "\n", - " 1. Please install the healthcareai_toolkit package by using the from the root of the the repository: `pip install -e package`\n", - " 2. Setup your .env file with `DATA_ROOT` and `MI2_MODEL_ENDPOINT` parameters.\n", + "This notebook includes an optional section that uses GPT-4 to create ground truth labels for the dataset. This step takes about 30 minutes to complete. If you want to run it, you will need:\n", "\n", - "1. **Data Assets**: \n", - " We are not providing the original DICOM files, but rather are providing the following:\n", - " - DICOM tags extracted into a CSV file named `data/mri_sample_features-sm.csv`. Each row in this file represents a single MRI series.\n", - " - Embedding vectors and some slices used for visualization are provided in the above sample data. In this data you will find: \n", - " - Embedding vectors serialized as .pkl files in the: `data/feature_vectors` directory\n", - " - .png files of several slices from the original MRI dataset that we use for visualization in the `data/pngs` directory\n", - " \n", - "2. **GPT-Based Weak Labeling**: \n", - " During one of the steps in this notebook we will use GPT4 model to create ground truth labels for our dataset. This step requires a separate deployment of GPT4 and takes about 30 minutes to complete. If you want to run it, you should:\n", - " - Ensure that the GPT-4 endpoint is provisioned for GPT-based weak labeling.\n", - " - Specify the parameters (endpoint and API key) in your `.env` file.\n", - " \n", - " If you cannot run this step, you can use pre-computed labels, load them and continue with the notebook execution. The pre-computed labels are available in the following file in your data directory:\n", - " `gpt_labeled_mri.json`" + "- **Additional Model Endpoint(s)**: `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_API_KEY`\n", + "\n", + "You can also use the pre-computed labels provided in your data directory:\n", + " `gpt_labeled_mri.json`\n", + "\n", + "## Note on Data Assets\n", + "\n", + "We are not providing the original DICOM files, but rather are providing the following:\n", + "- DICOM tags extracted into a CSV file named `data/mri_sample_features-sm.csv`. Each row in this file represents a single MRI series.\n", + "- Embedding vectors and some slices used for visualization are provided in the above sample data. In this data you will find: \n", + " - Embedding vectors serialized as .pkl files in the: `data/feature_vectors` directory\n", + " - .png files of several slices from the original MRI dataset that we use for visualization in the `data/pngs` directory" ] }, { @@ -100,7 +93,7 @@ "id": "5d83818a", "metadata": {}, "source": [ - "# 2. Dataset setup and exploration" + "## 2. Dataset Setup and Exploration" ] }, { @@ -541,7 +534,7 @@ "id": "aa22b07a", "metadata": {}, "source": [ - "# 3. Ontology mapping\n" + "## 3. 
Ontology Mapping" ] }, { @@ -586,7 +579,7 @@ "source": [ "# We will call this function to submit system prompt and user prompt to our GPT4 deployment\n", "def get_gpt_label(client, obj):\n", - " deployment = settings.AZURE_OPENAI_MODEL_NAME\n", + " deployment = settings.AZURE_OPENAI_DEPLOYMENT_NAME\n", "\n", " response = client.chat.completions.create(\n", " model=deployment,\n", @@ -627,7 +620,9 @@ "# This code will try to load the endpoint and API key from environment.json.\n", "# If you would like to execute the GPT4-based labeling, please provide the values putting the files into your .env file or as environment variables.\n", "# Otherwise you may skip this cell and use the pre-labeled data provided in the repository.\n", - "oai_client = helpers.create_openai_client()" + "from healthcareai_toolkit.clients.openai import create_openai_client\n", + "\n", + "oai_client = create_openai_client()" ] }, { @@ -835,7 +830,7 @@ "id": "af5e4837", "metadata": {}, "source": [ - "# 4. Cluster analysis\n" + "## 4. Cluster Analysis" ] }, { @@ -1294,7 +1289,7 @@ "id": "5d90a4f8", "metadata": {}, "source": [ - "# 5. Test the model" + "## 5. Test the Model" ] }, { @@ -3048,7 +3043,7 @@ "id": "ebf1ac40", "metadata": {}, "source": [ - "# 6. Final Remarks" + "## 6. Final Remarks" ] }, { diff --git a/azureml/medimageinsight/exam-parameter-demo/exam_parameter_helpers.py b/azureml/medimageinsight/exam-parameter-demo/exam_parameter_helpers.py index c4819ef..a75e171 100644 --- a/azureml/medimageinsight/exam-parameter-demo/exam_parameter_helpers.py +++ b/azureml/medimageinsight/exam-parameter-demo/exam_parameter_helpers.py @@ -73,18 +73,6 @@ def create_exam_param_struct_from_dicom_tags(df_item): return json.dumps(exam_params) -def create_openai_client(): - endpoint = settings.AZURE_OPENAI_ENDPOINT - api_key = settings.AZURE_OPENAI_API_KEY - - client = AzureOpenAI( - azure_endpoint=endpoint, - api_key=api_key, - api_version="2024-02-01", - ) - return client - - def create_oai_assistant(client): """Creates assistant to keep track of prior responses""" # Assistant API example: https://github.com/openai/openai-python/blob/main/examples/assistant.py diff --git a/azureml/medimageinsight/finetuning/mi2-finetuning.ipynb b/azureml/medimageinsight/finetuning/mi2-finetuning.ipynb index 7017bce..16a46bb 100644 --- a/azureml/medimageinsight/finetuning/mi2-finetuning.ipynb +++ b/azureml/medimageinsight/finetuning/mi2-finetuning.ipynb @@ -29,9 +29,11 @@ "Go to [Gastrovision GitHub repository](https://github.com/debeshjha/gastrovision) and follow the links to download the data. 
You can download directly to an AzureML compute with:\n", "\n", "```sh\n", - " wget -O /home/azureuser/data/Gastrovision.zip ''\n", - " unzip /home/azureuser/data/Gastrovision.zip -d /home/azureuser/data\n", - " ls -l /home/azureuser/data/Gastrovision/\n", + "DATA_ROOT=\"/home/azureuser/data/healthcare-ai\" # Change to the location you downloaded the data\n", + "\n", + "wget -O \"$DATA_ROOT/Gastrovision.zip\" ''\n", + "unzip \"$DATA_ROOT/Gastrovision.zip\" -d \"$DATA_ROOT\"\n", + "ls -l \"$DATA_ROOT/Gastrovision/\"\n", "```\n", "\n" ] @@ -250,7 +252,8 @@ " return f\"endoscopy gastrointestinal {view} {label}\"\n", "\n", "\n", - "gastrovision_root_directory = \"/home/azureuser/data/Gastrovision\"\n", + "data_root = \"/home/azureuser/data/healthcare-ai\" # Change to the location you downloaded the data\n", + "gastrovision_root_directory = os.path.join(data_root, \"Gastrovision\")\n", "text_to_label = {}\n", "folders = os.listdir(gastrovision_root_directory)\n", "\n", @@ -457,7 +460,7 @@ "Once the cluster is created, we can start the fine-tuning job. We'll retrieve the latest version of MedImageInsight model from the catalog and reference our fine-tuning pipeline. Azure ML pipelines is an invaluable feature of Azure Machine Learning product that allows organizing multiple tasks involved in a machine learning job into a structured workflow that can also be visualized inside the studio. See [AzureML documentation](https://learn.microsoft.com/en-us/azure/machine-learning/concept-ml-pipelines) for more information on pipelines. One advantage of using pipelines is access to all components. For example, if you only want to train the base model and not create a classification model, you can do that. Fine tuning of MedImageInsight is enabled by several pipeline components that we have developed and published in the Azure ML pipeline registry - you can find them if you navigate to Pipeline Designer inside the Azure ML Studio and look for them in the component selector. \n", "\n", "These components are:\n", - "1. **MedImageInsight Model Finetuning Core Component** [`medimgage_embedding_finetune`](https://ml.azure.com/registries/azureml/components/medimgage_embedding_finetune): \n", + "1. **MedImageInsight Model Finetuning Core Component** [`medimgageinsight_embedding_finetune`](https://ml.azure.com/registries/azureml/components/medimageinsight_embedding_finetune): \n", " This is the engine behind the fine-tuning process, responsible for training the MedImageInsight model.\n", " It also supports distributed training across a multi-GPU cluster. \n", " \n", @@ -467,7 +470,7 @@ " - A text file with a full list of labels.\n", " - A training configuration file.\n", "\n", - "2. **MedImageInsight Embedding Generation Component** [`medical_image_embedding_datapreprocessing`](https://ml.azure.com/registries/azureml/components/medical_image_embedding_datapreprocessing): \n", + "2. **MedImageInsight Embedding Generation Component** [`medimageinsight_embedding_generation`](https://ml.azure.com/registries/azureml/components/medimageinsight_embedding_generation): \n", " This component generates embeddings from images using the MedImageInsight model. 
It allows you to adjust image quality and dimensions, ultimately outputting a pickled NumPy array that contains embeddings for all processed images.\n", "\n", " Inputs:\n", @@ -476,7 +479,7 @@ " - (Optional) An integer value for JPEG compression ratio for image standardization (`default: 75`)\n", " - (Optional) And integer value to use for image size standardization (`default: 512`)\n", "\n", - "3. **MedImageInsight Adapter Finetune Component** [`medimgage_adapter_finetune`](https://ml.azure.com/registries/azureml/components/medimgage_adapter_finetune): \n", + "3. **MedImageInsight Adapter Finetune Component** [`medimageinsight_adapter_finetune`](https://ml.azure.com/registries/azureml/components/medimageinsight_adapter_finetune): \n", " Designed for classification tasks, this component uses NumPy arrays of training and validation data along with their corresponding text labels (from TSV files) to train a specialized 3-layer model. It is optimized for specific domains while still retaining the core strengths of MI2.\n", "\n", " Inputs:\n", @@ -488,7 +491,7 @@ " - Training Params: Dataloader batch sizes, training (default: 8) and validation (default: 1, min: 1); Dataloader workers, training and validation (default: 2, min: 0); Learning rate (default: 0.0003); Max epochs (default: 10, min: 1).\n", " - Metrics: Track metric (default: acc; supports \"acc\" or \"auc\").\n", "\n", - "4. **MedImageInsight Image Classifier Assembler Component** [`medimage_embedding_adapter_merge`](https://ml.azure.com/registries/azureml/components/medimage_embedding_adapter_merge): \n", + "4. **MedImageInsight Image Classifier Assembler Component** [`medimageinsight_classification_model`](https://ml.azure.com/registries/azureml/components/medimageinsight_classification_model): \n", " This component merges your fine-tuned embedding model with a label file to create a deployable image classifier. It accepts the fine-tuned MI2 embedding model, text labels, and an optional adapter model, packaging them into an MLFlow model that can operate in zero-shot mode or with a custom adapter model.\n", "\n", " Inputs:\n", @@ -504,8 +507,8 @@ " - An optional integer for hidden dimensions (default: 512, min: 1).\n", " - An optional integer for input channels (default: 1024, min: 1).\n", "\n", - "5. **MedImageInsight Pipeline Component** [`medimage_insight_ft_pipeline`](https://ml.azure.com/registries/azureml/components/medimage_insight_ft_pipeline): \n", - " This end-to-end pipeline simplifies the workflow by integrating all the above components. It orchestrates the training, evaluation, and final output of the embedding and classification models. The pipeline essentially combines the functionalities of `medimgage_embedding_finetune` and `medimage_embedding_adapter_merge` into a streamlined process.\n", + "5. **MedImageInsight Pipeline Component** [`medimageinsight_ft_pipeline`](https://ml.azure.com/registries/azureml/components/medimageinsight_ft_pipeline): \n", + " This end-to-end pipeline simplifies the workflow by integrating all the above components. It orchestrates the training, evaluation, and final output of the embedding and classification models. 
The pipeline essentially combines the functionalities of `medimageinsight_embedding_finetune` and `medimageinsight_classification_model` into a streamlined process.\n", " \n", " Inputs:\n", " - Two training TSVs: image TSV and a text TSV.\n", @@ -515,7 +518,7 @@ "\n", "You can also chain different components together and provide your own data preprocessing or adapter implementations. For more details, refer to the [AzureML documentation](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-pipeline-component?view=azureml-api-2&tabs=python).\n", "\n", - "Here we are using **MedImageInsight Pipeline Component** (`medimage_insight_ft_pipeline`) so you should see a job like this:\n", + "Here we are using **MedImageInsight Pipeline Component** (`medimageinsight_ft_pipeline`) so you should see a job like this:\n", "![image.png](attachment:image.png)\n" ] }, @@ -668,7 +671,7 @@ "\n", "# Get the pipeline component\n", "finetune_pipline_component = ml_registry.components.get(\n", - " name=\"medimage_insight_ft_pipeline\", label=\"latest\"\n", + " name=\"medimageinsight_ft_pipeline\", label=\"latest\"\n", ")\n", "print(\n", " \"Component loaded\",\n", @@ -680,7 +683,7 @@ "model = ml_registry.models.get(name=\"MedImageInsight\", label=\"latest\")\n", "\n", "\n", - "@pipeline(name=\"medimage_insight_ft_pipeline_job\" + str(random.randint(0, 100000)))\n", + "@pipeline(name=\"medimageinsight_ft_pipeline_job\" + str(random.randint(0, 100000)))\n", "def create_pipeline():\n", " mi2_pipeline = finetune_pipline_component(\n", " mlflow_embedding_model_path=model.id,\n", @@ -778,7 +781,8 @@ "credentials = DefaultAzureCredential()\n", "ml_client = MLClient.from_config(credentials)\n", "\n", - "gastrovision_root_directory = \"/home/azureuser/data/Gastrovision\"\n", + "data_root = \"/home/azureuser/data/healthcare-ai\" # Change to the location you downloaded the data\n", + "gastrovision_root_directory = os.path.join(data_root, \"Gastrovision\")\n", "name = \"gastrovision\"\n", "if \"pipeline_job_run_id\" not in locals():\n", " ## Retrieved by checking the json of the parent job in AzureML studio (under \"See all properties\") or in output of the cell where you started the job under \"Name\".\n", diff --git a/azureml/medimageinsight/mi2-deploy-batch-endpoint.ipynb b/azureml/medimageinsight/mi2-deploy-batch-endpoint.ipynb index 8a105ad..fbcd4d8 100644 --- a/azureml/medimageinsight/mi2-deploy-batch-endpoint.ipynb +++ b/azureml/medimageinsight/mi2-deploy-batch-endpoint.ipynb @@ -131,7 +131,7 @@ }, "outputs": [], "source": [ - "compute_name = \"mii-batch-cluster\"\n", + "compute_name = \"mi2-batch-cluster\"\n", "if not any(filter(lambda m: m.name == compute_name, ml_workspace.compute.list())):\n", " compute_cluster = AmlCompute(\n", " name=compute_name,\n", @@ -183,7 +183,7 @@ "import random\n", "import string\n", "\n", - "endpoint_prefix = \"mii-batch\"\n", + "endpoint_prefix = \"mi2-batch\"\n", "endpoint_list = list(\n", " filter(\n", " lambda m: m.name.startswith(endpoint_prefix),\n", @@ -235,7 +235,7 @@ "outputs": [], "source": [ "deployment = ModelBatchDeployment(\n", - " name=\"mii-dpl\",\n", + " name=\"mi2-dpl\",\n", " description=\"A deployment for model MedImageInsight\",\n", " endpoint_name=endpoint.name,\n", " model=model,\n", @@ -301,9 +301,9 @@ } }, "source": [ - "### Load sample dataset\n", + "### Load Sample Dataset\n", "\n", - "Download the sample dataset using command `azcopy copy --recursive https://azuremlexampledata.blob.core.windows.net/data/healthcare-ai/ /home/azureuser/data/`\n" 
+ "Please follow the data download instructions in the main [README](../../README.md) to download the sample data for this notebook." ] }, { @@ -317,8 +317,10 @@ "outputs": [], "source": [ "import glob\n", + "import os\n", "\n", - "root_dir = \"/home/azureuser/data/healthcare-ai/medimageinsight-examparameter/pngs\"\n", + "data_root = \"/home/azureuser/data/healthcare-ai\" # Change to the location you downloaded the data\n", + "root_dir = os.path.join(data_root, \"medimageinsight-examparameter\", \"pngs\")\n", "\n", "png_files = glob.glob(f\"{root_dir}/**/*.png\", recursive=True)\n", "print(f\"Found {len(png_files)} PNG files\")" diff --git a/azureml/medimageinsight/outlier-detection-demo.ipynb b/azureml/medimageinsight/outlier-detection-demo.ipynb index 876ea60..177ab12 100644 --- a/azureml/medimageinsight/outlier-detection-demo.ipynb +++ b/azureml/medimageinsight/outlier-detection-demo.ipynb @@ -49,33 +49,16 @@ "source": [ "## Prerequisites\n", "\n", - "Before proceeding with the tutorial, you need to perform some initial setup.\n", + "This notebook requires the following setup. If you haven't completed these steps, please refer to the Getting Started section in the main README, which includes:\n", "\n", - "### Azure ML Environment\n", - "To reproduce the notebook and run the Outlier Detection Demo, use the Azure Machine Learning environment. This provides a streamlined setup and execution experience.\n", + "1. Deploying required models\n", + "2. Installing the Healthcare AI Toolkit\n", + "3. Downloading sample data\n", + "4. Configuring your `.env` file\n", "\n", - "### Dataset\n", - "The sample dataset includes medical imaging data and embeddings required for the demo. Please download the dataset using the following command:\n", + "### Required for This Notebook\n", "\n", - "```bash\n", - "azcopy copy --recursive https://azuremlexampledata.blob.core.windows.net/data/healthcare-ai/ /home/azureuser/data/\n", - "```\n", - "\n", - "Organize the downloaded data as follows:\n", - "- **DICOM Files**: Located in the `dicoms` directory under the `data` folder.\n", - "- **Embeddings File**: A file named `embeddings.jsonl` where embeddings will be saved post-processing.\n", - "\n", - "Ensure the files are correctly placed and accessible in the `data` directory.\n", - "\n", - "### Environment\n", - "\n", - "1. Install the `healthcareai_toolkit` package using the following command:\n", - " ```bash\n", - " pip install -e package\n", - " ```\n", - "2. Configure a `.env` file with at least the following parameters:\n", - " - `DATA_ROOT`: Path to your dataset directory.\n", - " - `MI2_MODEL_ENDPOINT`: Endpoint URL for MedImageInsight.\n", + "- **Model Endpoint(s)**: `MI2_MODEL_ENDPOINT`\n", "\n", "## Outlier Detection Demo Overview\n", "This tutorial will guide you through the steps to perform outlier detection using the MedImageInsight embedding model. Here are the steps we will perform:\n", @@ -94,14 +77,14 @@ " - Use the generated embeddings to identify images with abnormal patterns or outliers in the dataset.\n", "\n", "5. **Visualize and Interpret Results**\n", - " - Visualize the identified outliers to understand the patterns and validate the effectiveness of the outlier detection process.\n" + " - Visualize the identified outliers to understand the patterns and validate the effectiveness of the outlier detection process." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## 2. Setup" + "## 1. 
Setup and Imports" ] }, { @@ -158,8 +141,7 @@ ], "source": [ "# Initialize the client\n", - "endpoint = settings.MI2_MODEL_ENDPOINT\n", - "client = MedImageInsightClient(endpoint)" + "client = MedImageInsightClient()" ] }, { @@ -181,7 +163,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -251,7 +233,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -425,7 +407,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -538,7 +520,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -581,7 +563,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -645,7 +627,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -691,7 +673,7 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -744,7 +726,7 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -842,7 +824,7 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -945,7 +927,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -961,7 +943,7 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -1028,7 +1010,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -1051,7 +1033,7 @@ }, { "cell_type": "code", - "execution_count": 17, + "execution_count": null, "metadata": {}, "outputs": [ { diff --git a/azureml/medimageinsight/zero-shot-classification.ipynb b/azureml/medimageinsight/zero-shot-classification.ipynb index 0796d7f..1a55a0f 100644 --- a/azureml/medimageinsight/zero-shot-classification.ipynb +++ b/azureml/medimageinsight/zero-shot-classification.ipynb @@ -12,25 +12,22 @@ "\n", "## Prerequisites\n", "\n", - "Before proceeding with the tutorial, you need to perform some initial setup.\n", + "This notebook requires the following setup. If you haven't completed these steps, please refer to the Getting Started section in the main README, which includes:\n", "\n", - "### Online Endpoint Deployment\n", - "The MedImageInsight Model is accessed and deployed through Azure AI Model Catalog or Azure Machine Learning Model Catalog. Alternatively, you can deploy the model programmatically, as described in the deployment notebook.\n", - "Links:\n", - "- [Documentation](https://aka.ms/healthcare-ai-docs-deploy-mi2)\n", - "- [Programmatic Deployment](https://aka.ms/healthcare-ai-examples-mi2-deploy)\n", + "1. Deploying required models\n", + "2. Installing the Healthcare AI Toolkit\n", + "3. Downloading sample data\n", + "4. Configuring your `.env` file\n", "\n", - "### Dataset\n", - "For this tutorial, we provide a sample dataset containing 100 2D X-Ray dicom images. 
Please download the data using the following command:\n", + "### Required for This Notebook\n", "\n", - "`azcopy copy --recursive https://azuremlexampledata.blob.core.windows.net/data/healthcare-ai/ /home/azureuser/data/`\n", + "- **Model Endpoint(s)**: `MI2_MODEL_ENDPOINT` \n", + "- **Additional Dependencies**: \n", + " ```bash\n", + " conda install -c pytorch faiss-cpu\n", + " ```\n", "\n", - "Additionally, we provide categorical labels for different lung pathologies for each image. This setup will allow us to evaluate the zero-shot classification performance effectively.\n", - "\n", - "### Environment\n", - "\n", - " 1. Please install the healthcareai_toolkit package by using the from the root of the the repository: `pip install -e package`\n", - " 2. Setup your .env file with `DATA_ROOT` and `MI2_MODEL_ENDPOINT` parameters.\n", + "> **Note**: [FAISS](https://github.com/facebookresearch/faiss) provides efficient algorithms for searching large sets of vectors, making it perfect for building scalable image search systems.\n", "\n", "## Zero-Shot Classification Overview\n", "This tutorial will walk you through the steps of using the MedImageInsight embedding model to compute embeddings of an image collection and then classify these images using a set of predefined classes. Here are the steps we will perform:\n", @@ -45,19 +42,19 @@ "\n", "3. **Visualize Images with the Corresponding Zero-Shot Prediction**\n", " - Qualitative representations provide an alternative way to assess the correctness of the correspondences with the zero-shot predictions.\n", - " - We will select four subjects (two with accurate predictions and two with an incorrect prediction) to visualize the image-text correspondence.\n" + " - We will select four subjects (two with accurate predictions and two with an incorrect prediction) to visualize the image-text correspondence." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## 1. Set up and data preparation" + "## 1. Setup and Imports" ] }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -112,7 +109,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -149,6 +146,7 @@ "metadata": {}, "source": [ "## 2. Zero-Shot Classification Inference with Image & Text Embedding Generation\n", + "\n", "This section demonstrates how MedImageInsight performs zero-shot classification by independently computing text and image embeddings. It then uses the dot product and softmax function to obtain probabilities of the image belonging to different classes based on the similarity with the text embeddings." 
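As a reading aid for the sentence above, here is a minimal NumPy sketch of the scaled dot-product-plus-softmax scoring it describes. It assumes you already have one image embedding and a matrix of text embeddings returned by the MedImageInsight endpoint; the names `image_emb`, `text_embs`, and `scaling_factor` are illustrative placeholders, not variables taken from the notebook.

```python
import numpy as np


def zero_shot_probabilities(image_emb, text_embs, labels, scaling_factor=1.0):
    """Scaled dot product between one image embedding and n text embeddings,
    followed by a softmax over the n candidate labels.

    image_emb:      shape (d,)   - embedding of the image to classify
    text_embs:      shape (n, d) - embeddings of the candidate label strings
    scaling_factor: log-temperature returned by the model (default here is illustrative)
    """
    logits = np.exp(scaling_factor) * (text_embs @ image_emb)  # shape (n,)
    logits = logits - logits.max()                             # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return dict(zip(labels, probs.round(4).tolist()))


# Hypothetical usage with random vectors standing in for real embeddings
rng = np.random.default_rng(0)
print(zero_shot_probabilities(rng.normal(size=512),
                              rng.normal(size=(3, 512)),
                              ["atelectasis", "pneumonia", "no finding"]))
```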
] }, @@ -167,7 +165,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -190,7 +188,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -215,7 +213,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -249,7 +247,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -280,7 +278,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -317,7 +315,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -439,7 +437,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": null, "metadata": {}, "outputs": [ { diff --git a/azureml/medimageparse/medimageparse_segmentation_demo.ipynb b/azureml/medimageparse/medimageparse_segmentation_demo.ipynb index 07af6b4..4a74d9a 100755 --- a/azureml/medimageparse/medimageparse_segmentation_demo.ipynb +++ b/azureml/medimageparse/medimageparse_segmentation_demo.ipynb @@ -6,8 +6,19 @@ "source": [ "## MedImageParse: A Unified Model for Biomedical Image Analysis\n", "\n", - "## Introduction\n", + "## Prerequisites\n", + "\n", + "This notebook requires the following setup. If you haven't completed these steps, please refer to the Getting Started section in the main README, which includes:\n", "\n", + "1. Deploying required models\n", + "2. Installing the Healthcare AI Toolkit\n", + "3. Downloading sample data\n", + "4. Configuring your `.env` file\n", + "\n", + "### Required for This Notebook\n", + "\n", + "- **Model Endpoint(s)**: `MIP_MODEL_ENDPOINT`\n", + "## Introduction\n", "\n", "Biomedical image analysis plays a critical role in advancing scientific discoveries across multiple fields, such as **cell biology, pathology, radiology**, and more. However, extracting meaningful insights from medical images presents challenges, especially in tasks such as:\n", "\n", @@ -17,43 +28,15 @@ "\n", "Traditionally, these tasks were treated separately. **MedImageParse** (formerly known as BiomedParse) changes this by unifying segmentation, detection, and recognition into a single model. This unlocks new opportunities for clinicians and researchers, enabling them to focus more on discovery and insights rather than technical complexities.\n", "\n", - "**MedImageParse** is a biomedical foundation model developed in collaboration with **Microsoft Research, Providence Genomics**, and the **Paul G. Allen School of Computer Science and Engineering** at the University of Washington. It is part of the **Microsoft healthcare AI models** initiative.\n" + "**MedImageParse** is a biomedical foundation model developed in collaboration with **Microsoft Research, Providence Genomics**, and the **Paul G. Allen School of Computer Science and Engineering** at the University of Washington. It is part of the **Microsoft healthcare AI models** initiative.\n", + "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Sections Overview\n", - "1. **How to Run from a Deployed Endpoint on Azure**\n", - "2. **Installation Requirements**\n", - "3. **Segmentation Examples**\n", - "4. **References**\n", - "\n", - "# 1. Prerequisites\n", - "\n", - "To run this notebook you will need a dataset and an endpoint. 
\n", - "\n", - "## Download data\n", - "\n", - "Use the following command to download the dataset with samples into your data folder located at `/home/azureuser/data/healthcare-ai/`:\n", - "\n", - "`azcopy copy --recursive https://azuremlexampledata.blob.core.windows.net/data/healthcare-ai/ /home/azureuser/data/`\n", - "\n", - "## Deploy and configure an endpoint\n", - "\n", - "To run **MedImageParse** from an Azure-deployed endpoint, you will need:\n", - "\n", - "- **Endpoint URI**\n", - "\n", - "Refer to the [MedImageParse deployment notebook](https://aka.ms/healthcare-ai-examples-mip-deploy) and the documentation pages for endpoint deployment instructions. \n", - "\n", - "Refer to the following code cells for examples of how to set up and perform inference using the **AzureML SDK v2**. Note that authentication method demonstrated here is using basic authentication via the API key and is different from the one we have shown in the \"deploy\" notebook. We include this method to demonstrate a variety of ways in which you can invoke a deployed endpoint.\n", - "\n", - "## **Environment**\n", - "\n", - " 1. Please install the healthcareai_toolkit package by using the from the root of the the repository: `pip install -e package`\n", - " 2. Setup your .env file with `DATA_ROOT` and `MIP_MODEL_ENDPOINT` parameters." + "## 1. Setup and Imports" ] }, { @@ -74,30 +57,7 @@ "from healthcareai_toolkit.clients import MedImageParseClient\n", "from healthcareai_toolkit import settings\n", "\n", - "endpoint = settings.MIP_MODEL_ENDPOINT\n", - "client = MedImageParseClient(endpoint)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 3. Segmentation Examples\n", - "Below are several examples illustrating MedImageParse's prompt-based approach to segmentation. These examples demonstrate its capability to accurately segment various biomedical structures across different imaging modalities." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 3.1 Lung Nodule Segmentation from Radiological Chest CT DICOM Files" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This example demonstrates a prompt to create a single segmentation mask from a single CT slice. Note that we configure our image pre-processing code to apply \"lung\" window to the image before sending it to the model, for better localization. Note that the model effectively performs detection task alongside a segmentation task." + "client = MedImageParseClient()" ] }, { @@ -183,6 +143,22 @@ " show_image_with_mask(image, mask, title=mask_name, ax=ax, colormap=colormap)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Segmentation Examples\n", + "Below are several examples illustrating MedImageParse's prompt-based approach to segmentation. These examples demonstrate its capability to accurately segment various biomedical structures across different imaging modalities.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2.1 Lung Nodule Segmentation from Radiological Chest CT DICOM \n", + "This example demonstrates a prompt to create a single segmentation mask from a single CT slice. Note that we configure our image pre-processing code to apply \"lung\" window to the image before sending it to the model, for better localization. Note that the model effectively performs detection task alongside a segmentation task." 
+ ] + }, { "cell_type": "code", "execution_count": null, @@ -222,7 +198,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## 3.2 Neuroradiological Analysis of Tumor Core, Enhancing, and Non-Enhancing Tumors using MRI T1-Gad" + "### 2.2 Neuroradiological Analysis of Tumor Core, Enhancing, and Non-Enhancing Tumors using MRI T1-Gad" ] }, { @@ -271,7 +247,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## 3.3 Neuroradiological Analysis of Whole Tumor, Tumor Core, and Edema using MRI-FLAIR" + "### 2.3 Neuroradiological Analysis of Whole Tumor, Tumor Core, and Edema using MRI-FLAIR" ] }, { @@ -320,7 +296,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## 3.4 Radiological Analysis of Kidney, Tumor, and Cyst using CT Imaging" + "### 2.4 Radiological Analysis of Kidney, Tumor, and Cyst using CT Imaging" ] }, { @@ -371,7 +347,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### 3.5 Echocardiographic Analysis of Left Ventricle and Left Atrium using Ultrasound Imaging" + "### 2.5 Echocardiographic Analysis of Left Ventricle and Left Atrium using Ultrasound Imaging" ] }, { @@ -421,7 +397,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## 3.6 Histopathological Analysis and Cell Phenotyping of Neoplastic, Inflammatory, and Connective Tissue Cells" + "### 2.6 Histopathological Analysis and Cell Phenotyping of Neoplastic, Inflammatory, and Connective Tissue Cells" ] }, { @@ -470,7 +446,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## 3.7 Perform Single Cell Analysis\n", + "### 2.7 Perform Single Cell Analysis\n", "The example below shows how to process MedImageParse output to visualize individual cells by appearance and classification using the output from the previous query." ] }, @@ -516,7 +492,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## 3.8 Radiological Analysis of Left Lung, Right Lung, and COVID-19 Infection" + "### 2.8 Radiological Analysis of Left Lung, Right Lung, and COVID-19 Infection" ] }, { @@ -565,7 +541,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## 3.9 Ophthalmological Analysis of Optic Disc and Optic Cup" + "### 2.9 Ophthalmological Analysis of Optic Disc and Optic Cup" ] }, { @@ -613,7 +589,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## 3.10 Endoscopic Analysis of Polyp" + "### 2.10 Endoscopic Analysis of Polyp" ] }, { @@ -661,7 +637,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## 3.11 Dermatological Analysis of Skin Lesion" + "### 2.11 Dermatological Analysis of Skin Lesion" ] }, { @@ -709,7 +685,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## 3.12 Ophthalmological Analysis of Edema using OCT" + "### 2.12 Ophthalmological Analysis of Edema using OCT" ] }, { diff --git a/azureml/medimageparse/mip-deploy-batch-endpoint.ipynb b/azureml/medimageparse/mip-deploy-batch-endpoint.ipynb index 6fc14d0..bc39384 100644 --- a/azureml/medimageparse/mip-deploy-batch-endpoint.ipynb +++ b/azureml/medimageparse/mip-deploy-batch-endpoint.ipynb @@ -303,7 +303,8 @@ }, "source": [ "### Load test dataset\n", - "Download the test dataset using command `azcopy copy --recursive https://azuremlexampledata.blob.core.windows.net/data/healthcare-ai/ /home/azureuser/data/`" + "\n", + "Please follow the data download instructions in the main [README](../../README.md) to download the sample data for this notebook." 
] }, { @@ -317,8 +318,10 @@ "outputs": [], "source": [ "import glob\n", + "import os\n", "\n", - "root_dir = \"/home/azureuser/data/healthcare-ai/medimageinsight-examparameter/pngs\"\n", + "data_root = \"/home/azureuser/data/healthcare-ai\" # Change to the location you downloaded the data\n", + "root_dir = os.path.join(data_root, \"medimageinsight-examparameter\", \"pngs\")\n", "\n", "png_files = glob.glob(f\"{root_dir}/**/*.png\", recursive=True)\n", "print(f\"Found {len(png_files)} PNG files\")" diff --git a/azureml/medimageparse/mip-deploy.ipynb b/azureml/medimageparse/mip-deploy.ipynb index 812f3d0..039f24e 100755 --- a/azureml/medimageparse/mip-deploy.ipynb +++ b/azureml/medimageparse/mip-deploy.ipynb @@ -37,7 +37,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -69,7 +69,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -103,7 +103,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -130,7 +130,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -140,7 +140,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -161,7 +161,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -189,17 +189,13 @@ "metadata": {}, "source": [ "## 4 Test the endpoint - base64 encoded image and text\n", - "We will use one digital pathology image to test the endpoint it will be located in the 'images' directory.\n", "\n", - "**Download the sample data**: \n", - " - Use the following command to download the dataset with samples into your working folder. Once you download, make sure the files are in the `./images` directory located in the same directory as this notebook so that all paths in this sample work out of the box. \n", - "\n", - " `azcopy copy --recursive https://azuremlexampledata.blob.core.windows.net/data/healthcare-ai/medimageparse-images/* ./images`" + "Please follow the data download instructions in the main [README](../../README.md) to download the sample data for this notebook." 
] }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -210,13 +206,14 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import json\n", "import base64\n", "import matplotlib.pyplot as plt\n", + "import os\n", "\n", "\n", "def read_image(image_path):\n", @@ -224,7 +221,9 @@ " return f.read()\n", "\n", "\n", - "sample_image = \"./images/pathology_breast.png\"\n", + "data_root = \"/home/azureuser/data/healthcare-ai\" # Change to the location you downloaded the data\n", + "sample_image = os.path.join(data_root, \"medimageparse-images\", \"pathology_breast.png\")\n", + "\n", "data = {\n", " \"input_data\": {\n", " \"columns\": [\"image\", \"text\"],\n", @@ -290,7 +289,7 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": null, "metadata": {}, "outputs": [ { diff --git a/deploy/existing/infra/main.bicep b/deploy/existing/infra/main.bicep index 81d9de6..9d6523c 100644 --- a/deploy/existing/infra/main.bicep +++ b/deploy/existing/infra/main.bicep @@ -118,6 +118,9 @@ output HLS_MODEL_ENDPOINTS array = modelDeploy.outputs.endpoints output UNIQUE_SUFFIX string = effectiveUniqueSuffix // GPT deployment outputs (conditional) -output AZURE_OPENAI_ENDPOINT string = !empty(gptModel) ? gptServices.outputs.gptEndpoint : '' -output AZURE_OPENAI_MODEL_NAME string = !empty(gptModel) ? gptServices.outputs.gptModelName : '' -output AZURE_AI_SERVICES_NAME string = !empty(gptModel) ? gptServices.outputs.aiServicesName : '' +output AZURE_OPENAI_ENDPOINT string = !empty(gptModel) ? gptServices.?outputs.gptEndpoint ?? '' : '' +output AZURE_OPENAI_INFERENCE_URI string = !empty(gptModel) ? gptServices.?outputs.gptInferenceUri ?? '' : '' +output AZURE_OPENAI_DEPLOYMENT_NAME string = !empty(gptModel) ? gptServices.?outputs.gptDeploymentName ?? '' : '' +output AZURE_OPENAI_MODEL_NAME string = !empty(gptModel) ? gptServices.?outputs.gptModelName ?? '' : '' +output AZURE_OPENAI_MODEL_VERSION string = !empty(gptModel) ? gptServices.?outputs.gptModelVersion ?? '' : '' +output AZURE_AI_SERVICES_NAME string = !empty(gptModel) ? gptServices.?outputs.aiServicesName ?? 
'' : '' diff --git a/deploy/fresh/azure.yaml b/deploy/fresh/azure.yaml index cae0324..5a8a61d 100644 --- a/deploy/fresh/azure.yaml +++ b/deploy/fresh/azure.yaml @@ -12,12 +12,24 @@ infra: hooks: preprovision: - shell: sh - run: | - python ../shared/scripts/preprovision.py --yes - interactive: true + posix: + shell: sh + run: | + python ../shared/scripts/preprovision.py --yes + interactive: true + windows: + shell: pwsh + run: | + python ../shared/scripts/preprovision.py --yes + interactive: true postprovision: - shell: sh - run: | - python ../shared/scripts/postprovision.py --yes - interactive: true + posix: + shell: sh + run: | + python ../shared/scripts/postprovision.py --yes + interactive: true + windows: + shell: pwsh + run: | + python ../shared/scripts/postprovision.py --yes + interactive: true \ No newline at end of file diff --git a/deploy/fresh/infra/main.bicep b/deploy/fresh/infra/main.bicep index f929d17..b9884fb 100644 --- a/deploy/fresh/infra/main.bicep +++ b/deploy/fresh/infra/main.bicep @@ -73,10 +73,10 @@ param allowSharedKeyAccess bool = false // ============================================================================ // VARIABLES - Configuration and Naming // ============================================================================ -var effectiveUniqueSuffix = empty(uniqueSuffix) ? substring(uniqueString(resourceGroup().id), 0, 6) : uniqueSuffix +var effectiveUniqueSuffix = empty(uniqueSuffix) ? substring(uniqueString(subscription().subscriptionId, resourceGroup().id), 0, 6) : uniqueSuffix var effectiveGptLocation = empty(gptDeploymentLocation) ? location : gptDeploymentLocation -var environmentNameTrunc = substring(((replace(replace(environmentName, '-', ''), '_', ''))),0,10) +var environmentNameTrunc = substring(((replace(replace(environmentName, '-', ''), '_', ''))),0,6) @@ -173,6 +173,9 @@ output UNIQUE_SUFFIX string = effectiveUniqueSuffix output HLS_MODEL_ENDPOINTS array = modelDeploy.outputs.endpoints // GPT deployment outputs (conditional) -output AZURE_OPENAI_ENDPOINT string = !empty(gptModel) ? gptServices.outputs.gptEndpoint : '' -output AZURE_OPENAI_MODEL_NAME string = !empty(gptModel) ? gptServices.outputs.gptModelName : '' -output AZURE_AI_SERVICES_NAME string = !empty(gptModel) ? gptServices.outputs.aiServicesName : '' +output AZURE_OPENAI_ENDPOINT string = !empty(gptModel) ? gptServices.?outputs.gptEndpoint ?? '' : '' +output AZURE_OPENAI_INFERENCE_URI string = !empty(gptModel) ? gptServices.?outputs.gptInferenceUri ?? '' : '' +output AZURE_OPENAI_DEPLOYMENT_NAME string = !empty(gptModel) ? gptServices.?outputs.gptDeploymentName ?? '' : '' +output AZURE_OPENAI_MODEL_NAME string = !empty(gptModel) ? gptServices.?outputs.gptModelName ?? '' : '' +output AZURE_OPENAI_MODEL_VERSION string = !empty(gptModel) ? gptServices.?outputs.gptModelVersion ?? '' : '' +output AZURE_AI_SERVICES_NAME string = !empty(gptModel) ? gptServices.?outputs.aiServicesName ?? 
'' : '' diff --git a/deploy/shared/aiServicesWithGpt.bicep b/deploy/shared/aiServicesWithGpt.bicep index bc39dde..ae0c60a 100644 --- a/deploy/shared/aiServicesWithGpt.bicep +++ b/deploy/shared/aiServicesWithGpt.bicep @@ -58,3 +58,4 @@ output gptEndpoint string = gptDeployment.outputs.endpoint output gptDeploymentName string = gptDeployment.outputs.deploymentName output gptModelName string = gptDeployment.outputs.modelName output gptModelVersion string = gptDeployment.outputs.modelVersion +output gptInferenceUri string = gptDeployment.outputs.inferenceUri diff --git a/deploy/shared/deployModel.bicep b/deploy/shared/deployModel.bicep index fb87375..a54ca90 100644 --- a/deploy/shared/deployModel.bicep +++ b/deploy/shared/deployModel.bicep @@ -25,8 +25,13 @@ param uniqueSuffix string = '' // Variables - Model loading, filtering and unique suffix calculation // ----------------------------------------------------------------------------- -// Load models from YAML -var models = loadYamlContent('models.yaml') +// Load models - from models.json if not empty, otherwise from modelsDefault.json +var modelsFromFileText = loadTextContent('models.json') +var modelsFromFileTextTrimmed = trim(modelsFromFileText) +var modelsFromFileTextTrimmedSafe = empty(modelsFromFileTextTrimmed) ? '[]': modelsFromFileTextTrimmed +var modelsFromFile = empty(modelsFromFileTextTrimmedSafe) ? [] : json(modelsFromFileTextTrimmedSafe) +var modelsDefault = loadJsonContent('modelsDefault.json') +var models = empty(modelsFromFile) ? modelsDefault : modelsFromFile // Calculate effective unique suffix var effectiveUniqueSuffix = empty(uniqueSuffix) ? substring(uniqueString(resourceGroup().id), 0, 6) : uniqueSuffix diff --git a/deploy/shared/gptDeployment.bicep b/deploy/shared/gptDeployment.bicep index 31d258e..130741d 100644 --- a/deploy/shared/gptDeployment.bicep +++ b/deploy/shared/gptDeployment.bicep @@ -52,3 +52,5 @@ output deploymentName string = !empty(gptModel) ? gptDeployment.name : '' output modelName string = !empty(gptModel) ? modelName : '' output modelVersion string = !empty(gptModel) ? modelVersion : '' output endpoint string = aiServices.properties.endpoint +var endpointWithSlash = endsWith(aiServices.properties.endpoint, '/') ? aiServices.properties.endpoint : '${aiServices.properties.endpoint}/' +output inferenceUri string = !empty(gptModel) ? 
'${endpointWithSlash}openai/deployments/${gptDeployment.name}/' : '' diff --git a/deploy/shared/models.json b/deploy/shared/models.json new file mode 100644 index 0000000..0637a08 --- /dev/null +++ b/deploy/shared/models.json @@ -0,0 +1 @@ +[] \ No newline at end of file diff --git a/deploy/shared/models.yaml b/deploy/shared/models.yaml deleted file mode 100644 index ff752cf..0000000 --- a/deploy/shared/models.yaml +++ /dev/null @@ -1,44 +0,0 @@ -- name: MedImageInsight - env_name: MI2_MODEL_ENDPOINT - deployment: - modelId: "azureml://registries/azureml/models/MedImageInsight/versions/10" - instanceType: Standard_NC4as_T4_v3 - instanceCount: 2 - requestSettings: - maxConcurrentRequestsPerInstance: 3 - requestTimeout: PT1M30S - livenessProbe: - initialDelay: PT10M -- name: MedImageParse - env_name: MIP_MODEL_ENDPOINT - deployment: - modelId: "azureml://registries/azureml/models/MedImageParse/versions/10" - instanceType: Standard_NC40ads_H100_v5 - instanceCount: 1 - requestSettings: - maxConcurrentRequestsPerInstance: 8 - requestTimeout: PT1M30S - livenessProbe: - initialDelay: PT10M -- name: CXRReportGen - env_name: CXRREPORTGEN_MODEL_ENDPOINT - deployment: - modelId: "azureml://registries/azureml/models/CxrReportGen/versions/6" - instanceType: Standard_NC40ads_H100_v5 - instanceCount: 1 - requestSettings: - maxConcurrentRequestsPerInstance: 1 - requestTimeout: PT1M30S - livenessProbe: - initialDelay: PT20M -- name: Prov-GigaPath - env_name: GIGAPATH_MODEL_ENDPOINT - deployment: - modelId: "azureml://registries/azureml/models/Prov-GigaPath/versions/2" - instanceType: Standard_NC6s_v3 - instanceCount: 1 - requestSettings: - maxConcurrentRequestsPerInstance: 1 - requestTimeout: PT1M30S - livenessProbe: - initialDelay: PT10M \ No newline at end of file diff --git a/deploy/shared/modelsDefault.json b/deploy/shared/modelsDefault.json new file mode 100644 index 0000000..30c96dd --- /dev/null +++ b/deploy/shared/modelsDefault.json @@ -0,0 +1,66 @@ +[ + { + "name": "MedImageInsight", + "env_name": "MI2_MODEL_ENDPOINT", + "deployment": { + "modelId": "azureml://registries/azureml/models/MedImageInsight/versions/10", + "instanceType": "Standard_NC4as_T4_v3", + "instanceCount": 2, + "requestSettings": { + "maxConcurrentRequestsPerInstance": 3, + "requestTimeout": "PT1M30S" + }, + "livenessProbe": { + "initialDelay": "PT10M" + } + } + }, + { + "name": "MedImageParse", + "env_name": "MIP_MODEL_ENDPOINT", + "deployment": { + "modelId": "azureml://registries/azureml/models/MedImageParse/versions/10", + "instanceType": "Standard_NC40ads_H100_v5", + "instanceCount": 1, + "requestSettings": { + "maxConcurrentRequestsPerInstance": 8, + "requestTimeout": "PT1M30S" + }, + "livenessProbe": { + "initialDelay": "PT10M" + } + } + }, + { + "name": "CXRReportGen", + "env_name": "CXRREPORTGEN_MODEL_ENDPOINT", + "deployment": { + "modelId": "azureml://registries/azureml/models/CxrReportGen/versions/6", + "instanceType": "Standard_NC40ads_H100_v5", + "instanceCount": 1, + "requestSettings": { + "maxConcurrentRequestsPerInstance": 1, + "requestTimeout": "PT1M30S" + }, + "livenessProbe": { + "initialDelay": "PT20M" + } + } + }, + { + "name": "Prov-GigaPath", + "env_name": "GIGAPATH_MODEL_ENDPOINT", + "deployment": { + "modelId": "azureml://registries/azureml/models/Prov-GigaPath/versions/2", + "instanceType": "Standard_NC6s_v3", + "instanceCount": 1, + "requestSettings": { + "maxConcurrentRequestsPerInstance": 1, + "requestTimeout": "PT1M30S" + }, + "livenessProbe": { + "initialDelay": "PT10M" + } + } + } +] \ No newline 
at end of file diff --git a/deploy/shared/scripts/postprovision.py b/deploy/shared/scripts/postprovision.py index a07e215..505b3da 100644 --- a/deploy/shared/scripts/postprovision.py +++ b/deploy/shared/scripts/postprovision.py @@ -1,9 +1,5 @@ #!/usr/bin/env python3 -import os import sys -import json -import re -import subprocess from pathlib import Path import traceback from utils import ( @@ -63,7 +59,10 @@ def gather_env_values(env_vars): openai_endpoint = env_vars.get("AZURE_OPENAI_ENDPOINT") if openai_endpoint: new_values["AZURE_OPENAI_ENDPOINT"] = openai_endpoint - new_values["AZURE_OPENAI_MODEL_NAME"] = env_vars.get("AZURE_OPENAI_MODEL_NAME") + deployment_name = env_vars.get("AZURE_OPENAI_DEPLOYMENT_NAME", "") + if deployment_name: + new_values["AZURE_OPENAI_DEPLOYMENT_NAME"] = deployment_name + print(f"Found OpenAI endpoint: {openai_endpoint}") # Get AI Services name directly from deployment outputs diff --git a/deploy/shared/scripts/select_models.py b/deploy/shared/scripts/select_models.py index 7cca107..39e4c15 100644 --- a/deploy/shared/scripts/select_models.py +++ b/deploy/shared/scripts/select_models.py @@ -1,7 +1,5 @@ #!/usr/bin/env python3 -import yaml import sys -import os import json import traceback from utils import ensure_azd_env, set_azd_env_value, load_models @@ -27,7 +25,7 @@ def main(): # Load model definitions models = load_models() if not models: - raise ValueError("No models found in models.yaml.") + raise ValueError("No models found in models configuration.") # Build and print available models in one loop available_models = [] @@ -42,7 +40,7 @@ def main(): available_models.append((name, instance_type, instance_count)) print(f" {len(available_models)}: {name}: {instance_type} x {instance_count}") if not available_models: - raise ValueError("No valid models found in models.yaml.") + raise ValueError("No valid models found in models configuration.") print() print( "Enter a comma-separated list of model numbers to deploy (e.g. 1,3,4), or '*' to deploy all:" diff --git a/deploy/shared/scripts/utils.py b/deploy/shared/scripts/utils.py index d714580..bd6d5f4 100644 --- a/deploy/shared/scripts/utils.py +++ b/deploy/shared/scripts/utils.py @@ -2,12 +2,9 @@ import json from pathlib import Path import re -import yaml -import os -from typing import Dict, List -from azureml.core import Workspace -MODELS_YAML = Path(__file__).parent.parent / "models.yaml" +MODELS_USER_JSON = Path(__file__).parent.parent / "models.json" +MODELS_DEFAULT_JSON = Path(__file__).parent.parent / "modelsDefault.json" REPO_ROOT = Path(__file__).parents[3] REPO_ENV_FILE = REPO_ROOT / ".env" @@ -24,6 +21,21 @@ BOLD = "\033[1m" END = "\033[0m" +CMD_ECHO_ENABLED = True + + +def run_shell(cmd, capture_output=True, text=True, check=False, echo=False, shell=True): + """ + Run a shell command using subprocess. 
+ """ + if echo or CMD_ECHO_ENABLED: + cmd_str = " ".join(cmd) if isinstance(cmd, list) else cmd + print(f"{CYAN}Running: {cmd_str}{END}") + + return subprocess.run( + cmd, capture_output=capture_output, text=text, check=check, shell=shell + ) + def get_model_filter(): val = get_azd_env_value(MODEL_FILTER_ENV_VAR) @@ -33,16 +45,14 @@ def get_model_filter(): def get_azd_env_value(key, default=None): - result = subprocess.run( - ["azd", "env", "get-value", key], capture_output=True, text=True - ) + result = run_shell(["azd", "env", "get-value", key]) if result.returncode != 0 or not result.stdout.strip(): return default return result.stdout.strip().strip('"') def set_azd_env_value(key, value): - result = subprocess.run(["azd", "env", "set", key, value]) + result = run_shell(["azd", "env", "set", key, value]) return result.returncode == 0 @@ -51,12 +61,7 @@ def load_azd_env_vars(): Load all AZD environment variables by invoking `azd env get-values`. """ # `azd env get-values` outputs JSON of all key/value pairs - result = subprocess.run( - ["azd", "env", "get-values", "--output", "json"], - capture_output=True, - text=True, - check=True, - ) + result = run_shell(["azd", "env", "get-values", "--output", "json"], check=True) return json.loads(result.stdout) @@ -85,41 +90,76 @@ def ensure_azd_env(): def load_models(): - """Load models from YAML, returning a list of model dicts.""" - path = Path(MODELS_YAML) - if not path.exists(): - raise FileNotFoundError(f"models.yaml not found at {path}") - data = yaml.safe_load(path.read_text()) + """Load models from JSON, returning a list of model dicts.""" + + # Try to load models.json (user override file) + models_path = Path(MODELS_USER_JSON) + if not models_path.exists(): + raise FileNotFoundError(f"models.json not found at {models_path}") + + models_text = models_path.read_text().strip() + + # If models.json is empty or just empty array, use defaults + if not models_text or models_text == "[]": + default_path = Path(MODELS_DEFAULT_JSON) + if not default_path.exists(): + raise FileNotFoundError(f"modelsDefault.json not found at {default_path}") + models_text = default_path.read_text().strip() + + data = json.loads(models_text) + + # Validate structure if isinstance(data, dict): for v in data.values(): if isinstance(v, list): return v - raise ValueError("No model list found in YAML file.") + raise ValueError("No model list found in JSON.") if isinstance(data, list): return data - raise ValueError("models.yaml is not a list or dict of lists.") + raise ValueError("models JSON improperly formatted.") def get_ml_workspace(name: str, resource_group: str, subscription: str) -> dict: """ - Returns the Azure ML workspace object using the Python SDK, or raises RuntimeError if not found. + Returns the Azure ML workspace object using Azure CLI, or raises RuntimeError if not found. 
""" try: - ws = Workspace.get( - name=name, resource_group=resource_group, subscription_id=subscription + cmd = [ + "az", + "ml", + "workspace", + "show", + "--name", + name, + "--resource-group", + resource_group, + "--subscription", + subscription, + "--output", + "json", + ] + + result = run_shell(cmd, check=True) + ws_data = json.loads(result.stdout) + + return { + "location": ws_data.get("location"), + "resourceGroup": ws_data.get("resource_group"), + "id": ws_data.get("id"), + "name": ws_data.get("name"), + } + + except subprocess.CalledProcessError as e: + error_msg = e.stderr.strip() if e.stderr else str(e) + raise RuntimeError( + f"Failed to retrieve workspace '{name}' in RG '{resource_group}': {error_msg}" ) + except json.JSONDecodeError as e: + raise RuntimeError(f"Failed to parse Azure CLI response: {e}") except Exception as e: raise RuntimeError( f"Failed to retrieve workspace '{name}' in RG '{resource_group}': {e}" ) - # Construct the ARM resource ID since Workspace object doesn't expose .id - arm_id = f"/subscriptions/{subscription}/resourceGroups/{resource_group}/providers/Microsoft.MachineLearningServices/workspaces/{name}" - return { - "location": ws.location, - "resourceGroup": ws.resource_group, - "id": arm_id, - "name": ws.name, - } def get_openai_api_key(ai_services_name: str, resource_group: str) -> str: @@ -153,7 +193,7 @@ def get_openai_api_key(ai_services_name: str, resource_group: str) -> str: "tsv", ] - result = subprocess.run(cmd, capture_output=True, text=True, check=True) + result = run_shell(cmd, check=True) api_key = result.stdout.strip() if not api_key: diff --git a/docs/deployment-guide.md b/docs/deployment-guide.md index a94d254..c93fa3d 100644 --- a/docs/deployment-guide.md +++ b/docs/deployment-guide.md @@ -45,6 +45,13 @@ AZURE_OPENAI_ENDPOINT= AZURE_OPENAI_API_KEY= ``` +> [!NOTE] +> **Azure OpenAI Endpoint Flexibility**: The `AZURE_OPENAI_ENDPOINT` supports two configuration formats: +> 1. **Full inference URI** (set automatically by deployment): Includes deployment name and API version in the URL +> 2. **Base endpoint URL**: Use with separate `AZURE_OPENAI_DEPLOYMENT_NAME` environment variable for manual configuration +> +> The toolkit automatically detects which format you're using. See `env.example` for detailed examples of both formats. + ## Deployment Configuration Choose the deployment method that best fits your environment and requirements: @@ -67,6 +74,75 @@ python ../shared/scripts/select_models.py azd env set HLS_MODEL_FILTER "medimageinsight,cxrreportgen" ``` +### Customizing Model Deployments + +The deployment system uses two configuration files in the `deploy/shared/` directory: + +- **`modelsDefault.json`** - Contains default configurations for all healthcare AI models (do not modify) +- **`models.json`** - User override file (empty by default) + +By default, `models.json` is empty (`[]`), which means all models from `modelsDefault.json` will be deployed (subject to any `HLS_MODEL_FILTER` settings). You can customize model deployments by editing `models.json`. 
+ +#### Manual Editing + +Edit `deploy/shared/models.json` directly and add your custom model configurations: + +```json +[ + { + "name": "MedImageInsight", + "env_name": "MI2_MODEL_ENDPOINT", + "deployment": { + "modelId": "azureml://registries/azureml/models/MedImageInsight/versions/10", + "instanceType": "Standard_NC4as_T4_v3", + "instanceCount": 3, + "requestSettings": { + "maxConcurrentRequestsPerInstance": 5, + "requestTimeout": "PT2M" + } + } + } +] +``` + +#### Automated Deployment Examples + +If you're automating deployments, you can programmatically populate `models.json`: + +**Bash:** +```bash +# Copy default configuration +cat deploy/shared/modelsDefault.json > deploy/shared/models.json + +# Or create custom configuration +echo '[{"name":"MedImageInsight","env_name":"MI2_MODEL_ENDPOINT","deployment":{"modelId":"azureml://registries/azureml/models/MedImageInsight/versions/10","instanceType":"Standard_NC4as_T4_v3","instanceCount":3}}]' > deploy/shared/models.json +``` + +**PowerShell:** +```powershell +# Copy default configuration +Get-Content deploy/shared/modelsDefault.json | Set-Content deploy/shared/models.json + +# Or create custom configuration +'[{"name":"MedImageInsight","env_name":"MI2_MODEL_ENDPOINT","deployment":{"modelId":"azureml://registries/azureml/models/MedImageInsight/versions/10","instanceType":"Standard_NC4as_T4_v3","instanceCount":3}}]' | Set-Content deploy/shared/models.json +``` + +#### Configuration Options + +Each model entry supports the following properties: + +- **`name`** - Display name for the model +- **`env_name`** - Environment variable name for the endpoint +- **`deployment.modelId`** - Azure ML model registry path and version +- **`deployment.instanceType`** - VM SKU for compute (e.g., `Standard_NC4as_T4_v3`) +- **`deployment.instanceCount`** - Number of instances (affects cost and availability) +- **`deployment.requestSettings.maxConcurrentRequestsPerInstance`** - Concurrent requests per instance +- **`deployment.requestSettings.requestTimeout`** - Timeout in ISO 8601 duration format (e.g., `PT1M30S`) +- **`deployment.livenessProbe.initialDelay`** - Startup delay in ISO 8601 duration format (e.g., `PT10M`) + +> [!TIP] +> To reset to defaults, simply empty the `models.json` file by setting its content to `[]`. + ### GPT Model Configuration #### GPT Model Options diff --git a/docs/manual-deployment.md b/docs/manual-deployment.md index 6f86aac..94d8e25 100644 --- a/docs/manual-deployment.md +++ b/docs/manual-deployment.md @@ -94,7 +94,37 @@ MIP_MODEL_ENDPOINT=/subscriptions/{your-sub-id}/resourceGroups/{your-rg}/provide CXRREPORTGEN_MODEL_ENDPOINT=/subscriptions/{your-sub-id}/resourceGroups/{your-rg}/providers/Microsoft.MachineLearningServices/workspaces/{your-workspace}/onlineEndpoints/{your-cxrreportgen-endpoint} ``` -**Note**: Use the full resource ID path (with the leading slash) as shown above. Replace the placeholder values in curly braces with your actual resource names and IDs. See `env.example` for more examples and detailed formatting instructions. +### Optional: Azure OpenAI Configuration + +If you're using GPT models with the examples, configure Azure OpenAI access. 
You have two options: + +**Option 1: Base Endpoint (Recommended for Manual Setup)** + +```bash +AZURE_OPENAI_ENDPOINT=https://{your-service}.cognitiveservices.azure.com/ +AZURE_OPENAI_DEPLOYMENT_NAME=gpt-4o +AZURE_OPENAI_API_KEY={your-api-key} +``` + +This format is flexible and allows you to easily switch between different model deployments by changing only the `AZURE_OPENAI_DEPLOYMENT_NAME` variable. + +**Option 2: Full Inference URI** + +```bash +AZURE_OPENAI_ENDPOINT=https://{your-service}.cognitiveservices.azure.com/openai/deployments/{deployment-name}/chat/completions?api-version=2025-01-01-preview +AZURE_OPENAI_API_KEY={your-api-key} +``` + +This format embeds the deployment name and API version directly in the URL. It's used by the automatic deployment but requires changing the entire URL to switch deployments. + +The Healthcare AI Toolkit automatically detects which format you're using. + +**Note**: +- Replace `{your-service}` with your Azure OpenAI service name +- Replace `{deployment-name}` with your GPT deployment name (e.g., `gpt-4o`) +- Replace `{your-api-key}` with your Azure OpenAI API key +- Use the full resource ID path (with the leading slash) for healthcare AI model endpoints as shown above +- See `env.example` for additional examples and detailed formatting instructions ## Next Steps diff --git a/env.example b/env.example index 5168b78..d1f7c79 100644 --- a/env.example +++ b/env.example @@ -2,33 +2,37 @@ # or if running this code from a different workspace. # Example format: "/subscriptions/{subscription-id}/resourceGroups/{resource-group}/providers/Microsoft.MachineLearningServices/workspaces/{workspace-name}/onlineEndpoints/{endpoint-name}" # If the endpoint is in this workspace, you can simply specify its name, e.g., "medimageinsight-xyz" -MI2_MODEL_ENDPOINT = "" +MI2_MODEL_ENDPOINT = "" # Specify an endpoint ID if not using an AzureML-enabled notebook # or if running this code from a different workspace. # Example format: "/subscriptions/{subscription-id}/resourceGroups/{resource-group}/providers/Microsoft.MachineLearningServices/workspaces/{workspace-name}/onlineEndpoints/{endpoint-name}" # If the endpoint is in this workspace, you can simply specify its name, e.g., "medimageparse-xyz" -MIP_MODEL_ENDPOINT = "" +MIP_MODEL_ENDPOINT = "" # Specify an endpoint ID if not using an AzureML-enabled notebook # or if running this code from a different workspace. # Example format: "/subscriptions/{subscription-id}/resourceGroups/{resource-group}/providers/Microsoft.MachineLearningServices/workspaces/{workspace-name}/onlineEndpoints/{endpoint-name}" # If the endpoint is in this workspace, you can simply specify its name, e.g., "gigapath-xyz" -GIGAPATH_MODEL_ENDPOINT = "" +GIGAPATH_MODEL_ENDPOINT = "" # Specify an endpoint ID if not using an AzureML-enabled notebook # or if running this code from a different workspace. 
# Example format: "/subscriptions/{subscription-id}/resourceGroups/{resource-group}/providers/Microsoft.MachineLearningServices/workspaces/{workspace-name}/onlineEndpoints/{endpoint-name}" # If the endpoint is in this workspace, you can simply specify its name, e.g., "cxrreportgen-xyz" -CXRREPORTGEN_MODEL_ENDPOINT = "" +CXRREPORTGEN_MODEL_ENDPOINT = "" -# Root directory for data storage +# Root directory for data storage (use absolute path) DATA_ROOT = /home/azureuser/data/healthcare-ai/ -## A directory with many files to test parallel processing -PARALLEL_TEST_DATA_ROOT = /home/azureuser/data/dicoms/ - -AZURE_OPENAI_ENDPOINT = "" -AZURE_OPENAI_API_KEY = "" \ No newline at end of file +# Azure OpenAI / GPT Configuration (Optional) +# You can specify either: +# 1. Full inference URI (includes deployment and API version): +# AZURE_OPENAI_ENDPOINT = "https://{your-service}.cognitiveservices.azure.com/openai/deployments/{deployment-name}/chat/completions?api-version={api_version}" +# 2. Base endpoint with separate model configuration: +# AZURE_OPENAI_ENDPOINT = "https://{your-service}.cognitiveservices.azure.com/" +# AZURE_OPENAI_DEPLOYMENT_NAME = "gpt-4o" +# Required: API key for authentication +AZURE_OPENAI_ENDPOINT = "" \ No newline at end of file diff --git a/package/healthcareai_toolkit/cli/test_endpoints.py b/package/healthcareai_toolkit/cli/test_endpoints.py index a4c80cd..2cbf5b8 100644 --- a/package/healthcareai_toolkit/cli/test_endpoints.py +++ b/package/healthcareai_toolkit/cli/test_endpoints.py @@ -106,21 +106,13 @@ def test_medimageinsight_endpoint(quiet: bool = False) -> Optional[bool]: ) if not os.path.exists(input_folder): - print(f"{Colors.YELLOW}⚠ Test data not found at {input_folder}{Colors.END}") - print( - f"{Colors.GREEN}✓ Skipping functional test (no test data){Colors.END}" - ) - return True + print(f"{Colors.RED}✗ Test data not found at {input_folder}{Colors.END}") + return False image_files = list(glob.glob(input_folder + "/*.dcm")) if not image_files: - print( - f"{Colors.YELLOW}⚠ No DICOM files found in {input_folder}{Colors.END}" - ) - print( - f"{Colors.GREEN}✓ Skipping functional test (no test data){Colors.END}" - ) - return True + print(f"{Colors.RED}✗ No DICOM files found in {input_folder}{Colors.END}") + return False test_image = image_files[0] print( @@ -172,11 +164,8 @@ def test_medimageparse_endpoint(quiet: bool = False) -> bool: test_image = os.path.join(input_folder, "covid_1585.png") if not os.path.exists(test_image): - print(f"{Colors.YELLOW}⚠ Test data not found at {test_image}{Colors.END}") - print( - f"{Colors.GREEN}✓ Skipping functional test (no test data){Colors.END}" - ) - return True + print(f"{Colors.RED}✗ Test data not found at {test_image}{Colors.END}") + return False print( f"{Colors.GREEN}✓ Found test image: {os.path.basename(test_image)}{Colors.END}" @@ -231,11 +220,8 @@ def test_cxrreportgen_endpoint(quiet: bool = False) -> Optional[bool]: lateral = os.path.join(input_folder, "cxr_lateral.jpg") if not (os.path.exists(frontal) and os.path.exists(lateral)): - print(f"{Colors.YELLOW}⚠ Test data not found at {input_folder}{Colors.END}") - print( - f"{Colors.GREEN}✓ Skipping functional test (no test data){Colors.END}" - ) - return True + print(f"{Colors.RED}✗ Test data not found at {input_folder}{Colors.END}") + return False print( f"{Colors.GREEN}✓ Found test images: {os.path.basename(frontal)}, {os.path.basename(lateral)}{Colors.END}" @@ -298,11 +284,8 @@ def test_gigapath_endpoint(quiet: bool = False) -> Optional[bool]: test_image = 
os.path.join(input_folder, "TCGA-19-2631.png") if not os.path.exists(test_image): - print(f"{Colors.YELLOW}⚠ Test data not found at {test_image}{Colors.END}") - print( - f"{Colors.GREEN}✓ Skipping functional test (no test data){Colors.END}" - ) - return True + print(f"{Colors.RED}✗ Test data not found at {test_image}{Colors.END}") + return False print( f"{Colors.GREEN}✓ Found test image: {os.path.basename(test_image)}{Colors.END}" @@ -342,8 +325,10 @@ def test_gpt_endpoint(quiet: bool = False) -> Optional[bool]: print(f"{Colors.RED}⚠ AZURE_OPENAI_API_KEY not configured!{Colors.END}") return False - if not settings.AZURE_OPENAI_MODEL_NAME: - print(f"{Colors.RED}⚠ AZURE_OPENAI_MODEL_NAME not configured!{Colors.END}") + if not settings.AZURE_OPENAI_DEPLOYMENT_NAME: + print( + f"{Colors.RED}⚠ AZURE_OPENAI_DEPLOYMENT_NAME not configured!{Colors.END}" + ) return False print(f"{Colors.GREEN}✓ Creating OpenAI client...{Colors.END}") @@ -354,7 +339,7 @@ def test_gpt_endpoint(quiet: bool = False) -> Optional[bool]: # Try a simple completion request response = client.chat.completions.create( - model=settings.AZURE_OPENAI_MODEL_NAME, # This should be the deployed model name + model=settings.AZURE_OPENAI_DEPLOYMENT_NAME, messages=[ { "role": "user", @@ -411,7 +396,7 @@ def print_configuration(): if settings.AZURE_OPENAI_ENDPOINT: print( - f" AZURE_OPENAI_MODEL_NAME:\n {settings.AZURE_OPENAI_MODEL_NAME or f'{Colors.YELLOW}(not set){Colors.END}'}" + f" AZURE_OPENAI_DEPLOYMENT_NAME:\n {settings.AZURE_OPENAI_DEPLOYMENT_NAME or f'{Colors.YELLOW}(not set){Colors.END}'}" ) if settings.AZURE_OPENAI_API_KEY: # Mask the API key for security diff --git a/package/healthcareai_toolkit/clients/openai.py b/package/healthcareai_toolkit/clients/openai.py index fe43c8f..d0cb79a 100644 --- a/package/healthcareai_toolkit/clients/openai.py +++ b/package/healthcareai_toolkit/clients/openai.py @@ -6,13 +6,10 @@ def create_openai_client(): - """Plumbing to create the OpenAI client""" - endpoint = settings.AZURE_OPENAI_ENDPOINT - api_key = settings.AZURE_OPENAI_API_KEY - + """Create Azure OpenAI client with configuration from settings.""" client = AzureOpenAI( - azure_endpoint=endpoint, - api_key=api_key, - api_version="2024-02-01", + azure_endpoint=settings.AZURE_OPENAI_ENDPOINT, + api_key=settings.AZURE_OPENAI_API_KEY, + api_version=settings.AZURE_OPENAI_API_VERSION, ) return client diff --git a/package/healthcareai_toolkit/settings.py b/package/healthcareai_toolkit/settings.py index a405905..9101331 100644 --- a/package/healthcareai_toolkit/settings.py +++ b/package/healthcareai_toolkit/settings.py @@ -1,5 +1,6 @@ import os import types +import re from dotenv import load_dotenv @@ -14,9 +15,57 @@ PARALLEL_TEST_DATA_ROOT = os.environ.get("PARALLEL_TEST_DATA_ROOT", None) -AZURE_OPENAI_ENDPOINT = os.environ.get("AZURE_OPENAI_ENDPOINT", None) + +def _get_azure_openai_config(): + """ + Get Azure OpenAI configuration from environment variables. 
+ """ + endpoint_url = os.environ.get("AZURE_OPENAI_ENDPOINT", None) + + if not endpoint_url: + return None, None, None + + # Validate that endpoint_url is a valid URL + if not endpoint_url.startswith(("http://", "https://")): + raise ValueError( + f"AZURE_OPENAI_ENDPOINT must be a valid URL starting with http:// or https://, " + f"got: {endpoint_url}" + ) + + # Try to parse as inference URI + # Format: https://{resource}.openai.azure.com/openai/deployments/{deployment}/chat/completions?api-version={version} + match = re.search( + r"(?Phttps://[^/]+)/openai/deployments/(?P[^/]+)/.*api-version=(?P[^&]+)", + endpoint_url, + ) + if match: + return ( + match.group("endpoint"), + match.group("deployment"), + match.group("api_version"), + ) + + # Base endpoint format - use separate environment variables + deployment_name = os.environ.get("AZURE_OPENAI_DEPLOYMENT_NAME", None) + api_version = os.environ.get("AZURE_OPENAI_API_VERSION", "2024-12-01-preview") + + # Raise error if base endpoint is set but missing deployment name + if not deployment_name: + raise ValueError( + "AZURE_OPENAI_ENDPOINT is set to a base endpoint, but AZURE_OPENAI_DEPLOYMENT_NAME " + "is required. Either provide both values or use a full inference URI format." + ) + + return endpoint_url, deployment_name, api_version + + +# Azure OpenAI Configuration AZURE_OPENAI_API_KEY = os.environ.get("AZURE_OPENAI_API_KEY", None) -AZURE_OPENAI_MODEL_NAME = os.environ.get("AZURE_OPENAI_MODEL_NAME", None) +( + AZURE_OPENAI_ENDPOINT, + AZURE_OPENAI_DEPLOYMENT_NAME, + AZURE_OPENAI_API_VERSION, +) = _get_azure_openai_config() _constants = { diff --git a/package/pyproject.toml b/package/pyproject.toml index 99ddf23..adc4e3f 100644 --- a/package/pyproject.toml +++ b/package/pyproject.toml @@ -20,13 +20,13 @@ scikit-learn = "~1.5.0" torch = "~2.4.0" torchvision = "~0.19.0" tqdm = "~4.66.5" -python-gdcm = "~3.0.24.0" +python-gdcm = "~3.0.26" gdown = "~5.2.0" SimpleITK = "~2.4.0" opencv-python = "~4.10.0.84" pydicom = "~2.4.0" azure-identity = "~1.19.0" -timm = "~1.0.10" +timm = "~1.0.19" transformers = "~4.16.2" setuptools = "~59.8.0" einops = "~0.8.0" @@ -39,13 +39,14 @@ pandas = "~2.0.3" jupyter = "~1.1.1" pillow = "~10.4.0" matplotlib = "~3.7.5" -numpy = "~1.24.4" -openai = "~1.89.0" +numpy = "~1.25.2" +openai = "~1.98.0" umap-learn = "~0.5.6" -scipy = "~1.10.1" +scipy = "~1.15.3" azureml-core = "~1.57.0.post3" ratelimit = "~2.2.1" -python-magic = "~0.4.27" +python-magic-bin = {version = "~0.4.0", markers = "sys_platform == 'win32'"} +python-magic = {version = "~0.4.0", markers = "sys_platform != 'win32'"} scikit-image = "~0.24.0" python-dotenv = "~1.0.1" nibabel = "~5.3.1"