1 change: 0 additions & 1 deletion .github/workflows/package-test.yml
@@ -13,7 +13,6 @@ jobs:
fail-fast: false
matrix:
include:
- python-version: "3.9"
- python-version: "3.10"
- python-version: "3.11"

1 change: 1 addition & 0 deletions .gitignore
@@ -163,3 +163,4 @@ cython_debug/

sandbox/
.azure
data/
37 changes: 25 additions & 12 deletions README.md
@@ -48,6 +48,12 @@ These examples take a closer look at certain solutions and patterns of usage for
* **[Image Search Series Pt 1: Searching for similar XRay images](./azureml/advanced_demos/image_search/2d_image_search.ipynb)** [MI2] - an opener in the series on image-based search. How do you use foundation models to build an efficient system to look up similar X-rays? Read [our blog](https://techcommunity.microsoft.com/blog/healthcareandlifesciencesblog/image-search-series-part-1-chest-x-ray-lookup-with-medimageinsight/4372736) for more details.
* **[Image Search Series Pt 2: 3D Image Search with MedImageInsight](./azureml/advanced_demos/image_search/3d_image_search.ipynb)** [MI2] - expanding on the image-based search topic, we look at 3D images. How do you use foundation models to build a system that searches an archive of CT scans for those with similar lesions in the pancreas? Read [our blog](https://aka.ms/3DImageSearch) for more details.

### 🤖 Agentic AI Examples

These examples demonstrate how to build intelligent conversational agents that integrate healthcare AI models with natural language understanding:

* **[Medical Image Classification Agent](./azureml/medimageinsight/agent-classification-example.ipynb)** [MI2, GPT] - build a conversational AI agent that classifies medical images through natural language interactions. Learn practical patterns for coordinating image data with LLM function calls, managing conversation state, and routing image analysis tasks to MedImageInsight embeddings. A minimal sketch of the function-calling pattern is shown below.
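
As a taste of this pattern, here is a minimal, hypothetical sketch; the tool schema, deployment name, and routing stub are illustrative, not the notebook's actual code:

```python
import json
from openai import AzureOpenAI

# Assumes AZURE_OPENAI_ENDPOINT / AZURE_OPENAI_API_KEY are set in the environment.
client = AzureOpenAI(api_version="2024-06-01")

# Advertise a hypothetical image-classification tool the model may call.
tools = [{
    "type": "function",
    "function": {
        "name": "classify_image",
        "description": "Classify a medical image using MedImageInsight embeddings.",
        "parameters": {
            "type": "object",
            "properties": {"image_path": {"type": "string"}},
            "required": ["image_path"],
        },
    },
}]

messages = [{"role": "user", "content": "What does xray_001.dcm show?"}]
response = client.chat.completions.create(
    model="gpt-4o",  # your GPT deployment name
    messages=messages,
    tools=tools,
)

# If the model chose to call the tool, route the request to MI2.
tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)
# ...embed args["image_path"] via the MI2 endpoint, match it against labeled
# reference embeddings, and return the label to the model as a tool message.
```

The key design choice is that the LLM never sees pixels: it only decides when to call the tool, while the actual classification is delegated to MedImageInsight embeddings.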

## Getting Started

To get started with using our healthcare AI models and examples, follow the instructions below to set up your environment and run the sample applications.
@@ -67,8 +73,8 @@ To get started with using our healthcare AI models and examples, follow the inst
- **Optional**: Azure OpenAI access for GPT models (limited use in examples).
- **Tools**:
- **For running examples**:
- [AzCopy](https://learn.microsoft.com/en-us/azure/storage/common/storage-ref-azcopy) for downloading sample data
- Python `>=3.9.0,<3.12` and pip `>=21.3` (for running locally)
- Python `>=3.10.0,<3.12` and pip `>=21.3` (for running locally)
- [Git LFS](https://git-lfs.github.com/) for cloning the data repository
- **For deploying models**:
- [Azure CLI](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli)
- [Azure Developer CLI](https://learn.microsoft.com/en-us/azure/developer/azure-developer-cli/install-azd?tabs=winget-windows%2Cbrew-mac%2Cscript-linux&pivots=os-linux)
@@ -221,33 +227,38 @@ Now that you have deployed the models, you need to configure your local environm
After deployment, verify that your root-level `.env` file contains the necessary environment variables for connecting to your deployed models. Each automatic deployment method will configure this file with the appropriate settings for your chosen approach.

> [!IMPORTANT]
> Check the value of `DATA_ROOT` in your `.env` file to ensure it's appropriate for your setup. The default value is `/home/azureuser/data/`, but you may need to modify it based on your environment. If you change the `DATA_ROOT` value, you'll also need to update the destination path in the azcopy command in the following step.
> Check the value of `DATA_ROOT` in your `.env` file to ensure it's appropriate for your setup. The default value is `/home/azureuser/data/healthcare-ai/`, but you may need to modify it based on your environment. **Use an absolute path** (not a relative path like `./data/`) to ensure consistent access across different working directories. If you change the `DATA_ROOT` value, you'll also need to update the destination path in the git clone command in the following step.
>
> **Azure OpenAI Configuration**: If you deployed GPT models, your `.env` file will contain `AZURE_OPENAI_ENDPOINT` and `AZURE_OPENAI_API_KEY`. The endpoint supports two formats:
> 1. **Full inference URI** (deployed automatically): `https://{your-service}.cognitiveservices.azure.com/openai/deployments/{deployment}/chat/completions?api-version={version}`.
> 2. **Base endpoint** (for manual configuration): `https://{your-service}.cognitiveservices.azure.com/` with separate `AZURE_OPENAI_DEPLOYMENT_NAME` variable.
>
> See `env.example`.
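
For reference, a minimal `.env` sketch illustrating both formats (every value below is a placeholder, not a real endpoint):

```sh
# Option 1: full inference URI (written automatically by deployment)
AZURE_OPENAI_ENDPOINT="https://my-service.cognitiveservices.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-06-01"
AZURE_OPENAI_API_KEY="<your-api-key>"

# Option 2: base endpoint plus an explicit deployment name (manual configuration)
# AZURE_OPENAI_ENDPOINT="https://my-service.cognitiveservices.azure.com/"
# AZURE_OPENAI_DEPLOYMENT_NAME="gpt-4o"

# Absolute path to the sample data (see the download step below)
DATA_ROOT="/home/azureuser/data/healthcare-ai/"
```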

> [!NOTE]
> If you used a manual deployment method you will have to configure this file yourself, see [Manual Deployment](docs/manual-deployment.md) for more information.

#### Download Sample Data

The sample data used by the examples is located in our Blob Storage account. Use [azcopy tool](https://learn.microsoft.com/en-us/azure/storage/common/storage-ref-azcopy) to download:
The sample data used by the examples is available in the [healthcareai-examples-data](https://github.com/microsoft/healthcareai-examples-data) GitHub repository.

> [!IMPORTANT]
> The data repository uses Git LFS (Large File Storage) for medical image files. Make sure you have [Git LFS](https://git-lfs.github.com/) installed before cloning. Without it, you'll only download placeholder files instead of the actual data.

Clone the repository to download the data:

```sh
azcopy copy --recursive https://azuremlexampledata.blob.core.windows.net/data/healthcare-ai/ /home/azureuser/data/
git clone https://github.com/microsoft/healthcareai-examples-data.git /home/azureuser/data/healthcare-ai
```
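
If you are unsure whether Git LFS was active during the clone, the standard Git LFS commands below can confirm that the actual files, not pointer stubs, were downloaded:

```sh
git lfs install    # one-time setup: registers the LFS filters with git
cd /home/azureuser/data/healthcare-ai
git lfs ls-files   # lists the files tracked by LFS
git lfs pull       # fetches any files that are still pointer stubs
```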

> [!TIP]
> This downloads the entire dataset. For specific examples, you can download subsets by appending the subfolder name to the source URL.
> This downloads the entire dataset. If you prefer a different location, adjust the target path and update the `DATA_ROOT` value in your `.env` file accordingly. For more information about the data, see the [data repository README](https://github.com/microsoft/healthcareai-examples-data/blob/main/README.md).

#### Install Healthcare AI Toolkit

Install the helper toolkit that facilitates working with endpoints, DICOM files, and medical imaging:

```sh
# Standard installation
pip install ./package/
```
_or_
```sh
# Editable installation for development
pip install -e ./package/
```
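
To verify the installation, a bare import is enough (`healthcareai_toolkit` is the module name used throughout the notebooks):

```sh
python -c "import healthcareai_toolkit"
```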

@@ -271,6 +282,8 @@ Now you're ready to explore the notebooks! Start with one of these paths:

**📋 Report Generation**: See example usage in **[CXRReportGen deployment](./azureml/cxrreportgen/cxr-deploy.ipynb)**.

**🤖 Agentic AI**: Learn how to use models within an agentic framework with the **[medical image classification agent](./azureml/medimageinsight/agent-classification-example.ipynb)**.

**🚀 Advanced**: Explore **[image search](./azureml/advanced_demos/image_search/2d_image_search.ipynb)**, **[outlier detection](./azureml/medimageinsight/outlier-detection-demo.ipynb)**, or **[multimodal analysis](./azureml/advanced_demos/radpath/rad_path_survival_demo.ipynb)**.

## Project Structure
67 changes: 29 additions & 38 deletions azureml/advanced_demos/image_search/2d_image_search.ipynb
@@ -12,36 +12,27 @@
"## Image Search Series Part 1: Chest X-ray Search with MedImageInsight (MI2)\n",
"In this tutorial, we show you how to build and optimize a 2D image search system for chest X-rays using **MedImageInsight embeddings**.\n",
"\n",
"### Dataset \n",
"We provide a sample dataset of 100 2D chest X-ray DICOM images, categorized into the following pathology classes: No Findings, Support Devices, Pleural Effusion, Cardiomegaly, and Atelectasis. Each image contains a single pathology class, but the methods demonstrated can be adapted for multi-label scenarios as well.\n",
"## Prerequisites\n",
"\n",
"Please download the data using the following command:\n",
"This notebook requires the following setup. If you haven't completed these steps, please refer to the Getting Started section in the main README, which includes:\n",
"\n",
"```sh\n",
"azcopy copy --recursive https://azuremlexampledata.blob.core.windows.net/data/healthcare-ai/ \n",
"/home/azureuser/data/\n",
"```\n",
"1. Deploying required models\n",
"2. Installing the Healthcare AI Toolkit\n",
"3. Downloading sample data\n",
"4. Configuring your `.env` file\n",
"\n",
"### Online Endpoint Deployment \n",
"The **MedImageInsight (MI2) Model** can be accessed and deployed via the [Azure AI Model Catalog](https://azure.microsoft.com/en-us/products/ai-model-catalog). Alternatively, you can deploy the model programmatically, as detailed in the [deployment notebook](https://aka.ms/healthcare-ai-examples-mi2-deploy).\n",
"### Required for This Notebook\n",
"\n",
"### Environment \n",
"1. Install the **healthcareai_toolkit** package from the root of the repository: \n",
"- **Model Endpoint(s)**: `MI2_MODEL_ENDPOINT` \n",
"- **Additional Dependencies**: \n",
" ```bash\n",
" conda install -c pytorch faiss-cpu\n",
" ```\n",
"\n",
" ```sh\n",
" pip install ./package\n",
" ```\n",
"2. Set up your `.env` file with the `DATA_ROOT` and `MI2_MODEL_ENDPOINT` parameters.\n",
"3. Install the **FAISS** library using: \n",
" ```sh\n",
" conda install -c pytorch faiss-cpu\n",
" ```\n",
"\n",
"### FAISS (Facebook AI Similarity Search) \n",
"[FAISS](https://github.com/facebookresearch/faiss) provides efficient algorithms for searching large sets of vectors, even those too large to fit in memory. It supports adding or removing individual vectors and computes exact distances between them. FAISS is perfect for building scalable image search systems like the one in this tutorial.\n",
"> **Note**: [FAISS](https://github.com/facebookresearch/faiss) provides efficient algorithms for searching large sets of vectors, making it perfect for building scalable image search systems like the one in this tutorial.\n",
"\n",
"### 2D Image Search \n",
"This tutorial walks you through the use of an embedding model to create a vector index and then build a system that woulod look up similar images based on image provided. We will first use out-of-the-box capabilities of MedImageInsight model to build a basic system, and then will enhance performance by applying some of the concepts introduced in other notebooks from this repository. In a prior [adapter training notebook](https://aka.ms/healthcare-ai-examples-mi2-adapter), we demonstrated how to train an adapter for classification. Here, we will also train a simple adapter to refine the MI2 models embeddings to improve representation and then see how it improves performance. \n",
"This tutorial walks you through the use of an embedding model to create a vector index and then build a system that woulod look up similar images based on image provided. We will first use out-of-the-box capabilities of MedImageInsight model to build a basic system, and then will enhance performance by applying some of the concepts introduced in other notebooks from this repository. In a prior [adapter training notebook](https://aka.ms/healthcare-ai-examples-mi2-adapter), we demonstrated how to train an adapter for classification. Here, we will also train a simple adapter to refine the MI2 model's embeddings to improve representation and then see how it improves performance. \n",
"\n",
"In either approach we will be building an index using FAISS library. Note that the index will need to be rebuilt if we are using different representations (like with the adapter approach). Once the FAISS index is built, we query it with a new embedding (query vector) to retrieve the most similar images. FAISS supports both exact and approximate nearest neighbor searches, allowing for a balance between speed and precision. In this tutorial, we use nearest neighbor search to find the most relevant images based on the query.\n",
"\n",
@@ -80,7 +71,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -106,7 +97,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -135,7 +126,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -164,7 +155,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -208,7 +199,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -236,7 +227,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -265,7 +256,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -287,7 +278,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -466,7 +457,7 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -504,7 +495,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -528,7 +519,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -588,7 +579,7 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -620,7 +611,7 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -674,7 +665,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -847,7 +838,7 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": null,
"metadata": {},
"outputs": [
{