CarDS-Yale/target-ai-shared

Repository for the project "TARGET-AI: a foundational approach for the targeted deployment of artificial intelligence electrocardiography in the electronic health record"

Authors: Evangelos K. Oikonomou, Bruno Batinica, Lovedeep S. Dhingra, Arya Aminorroaya, Andreas Coppi, Rohan Khera

Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA. Cardiovascular Data Science (CarDS) Lab, Yale School of Medicine, New Haven, CT, USA.

This is the public code repository for the project "TARGET-AI: a foundational approach for the targeted deployment of artificial intelligence electrocardiography in the electronic health record". Parts of this code have been de-identified to remove references to potentially identifiable or health-system-specific information.

Model weights for the pre-trained ECG image vision transformer (ViT) are available at https://huggingface.co/CarDSLab/ecg-clip-beit-base-384.


Environment Setup

# first clone the repo and navigate to the local folder
git clone https://github.com/CarDS-Yale/target-ai-shared.git
cd target-ai-shared
conda env create -f ./environment_files/ecg_ehr_clip_env.yml # heavy environment with all packages
conda env create -f ./environment_files/ecg_image_vit_light.yml # light environment with only packages for inference

For Quick Inference on New Images

You can run quick inference on a folder of ECG images using the model weights hosted on Hugging Face. In the command below, the following arguments can be adjusted:

  • hf_repo: the Hugging Face repository hosting the published model and weights
  • image_dir: a folder containing ECG images in any of the four standard layouts
  • centroid_csv: a CSV with reference embeddings/centroids for cases and controls; we provide examples derived from our training set and from EchoNext (Elias P, Finer J. EchoNext: A Dataset for Detecting Echocardiogram-Confirmed Structural Heart Disease from ECGs, version 1.1.0. PhysioNet, 2025. RRID:SCR_007345. https://doi.org/10.13026/3ykd-bf14)
  • output_csv: path where the output CSV will be written

conda activate ecg_image_vit_light
python ./demo/zero_shot_from_reference_embedding.py \
  --hf_repo "CarDSLab/ecg-clip-beit-base-384" \
  --image_dir "./demo/online_ecg_images_credit_to_liftl" \
  --centroid_csv "./demo/reference_embeddings/ynhhs_reference_centroids.csv" \
  --output_csv "./demo/output/ynhhs_output.csv" \
  --batch_size 32 \
  --device cuda
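
Under the hood, the zero-shot score for each image reflects its similarity to the case vs. control reference centroids in the shared embedding space. A minimal sketch of that logic (function and variable names are illustrative assumptions; the released script handles image preprocessing, batching, and the exact score definition):

import numpy as np

def zero_shot_scores(image_embeddings, case_centroid, control_centroid):
    # Cosine similarity of each image embedding to a reference centroid
    def cosine(a, b):
        a = a / np.linalg.norm(a, axis=-1, keepdims=True)
        return a @ (b / np.linalg.norm(b))

    # Higher score = embedding sits closer to the case centroid than to the control centroid
    return cosine(image_embeddings, case_centroid) - cosine(image_embeddings, control_centroid)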

Part I: Defining the Project Cohorts

The analysis includes pairs of ECG-TTE studies performed within 90 days of each other. The full inclusion and exclusion criteria are described in the accompanying manuscript. Briefly:

  • Development cohort: 2016 to 2021
  • Temporally distinct internal test set: 2022 to 2023

Part II: CLIP Pre-training of ECG Image ViT and TTE Text Encoder

conda activate ecg_ehr_clip_env
python ./ecg_echo_clip_training/main_run_experiments.py

Each training run produces:

Tokenizer and Configuration Files:

  • added_tokens.json, config.json, merges.txt, preprocessor_config.json, special_tokens_map.json, tokenizer_config.json, tokenizer.json, vocab.json

Model Checkpoints:

  • best_model.pth, final_model.pth, epoch_10.pth, ...

Processor and Metrics:

  • processor_initial/, best_processor/, metrics.txt

Data Files:

  • train_data.csv, val_data.csv, frequent_words.txt

Output files are saved under hyperparameter-specific subdirectories in output_dir.
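
For orientation, the pre-training objective is the standard CLIP-style symmetric contrastive loss over paired ECG images and TTE report text. A minimal PyTorch sketch of that objective (illustrative only; the actual training loop, schedules, and hyperparameters live in main_run_experiments.py):

import torch
import torch.nn.functional as F

def clip_loss(image_emb, text_emb, temperature=0.07):
    # L2-normalize both sets of embeddings
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # Pairwise similarity matrix; matched ECG-TTE pairs sit on the diagonal
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric cross-entropy: image-to-text and text-to-image
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2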


Part III: CLMBR-T Mapping of EHR

This section is based on a JDAT extraction from the Yale EHR.

# Stepwise preprocessing
python ./clmbrt-mapping/ynhhs/preprocess_ynhhs_population.py
python ./clmbrt-mapping/preprocess_ynhhs_labs.py
python ./clmbrt-mapping/preprocess_ynhhs_meds.py
python ./clmbrt-mapping/preprocess_ynhhs_pool_events.py
python ./clmbrt-mapping/preprocess_split_parquets.py

# Parallel representation generation
bash ./clmbrt-mapping/ynhhs/run_parallel.sh

# Merge outputs
python ./clmbrt-mapping/combine_representation_chunks.py
python ./clmbrt-mapping/combine_ecg_echo_clmbrt_embeddings.py

Final output: combined_ecg_echo_clmbr_embeddings_YYYYMMDD.parquet. For the UKB, see the analogous scripts under ./clmbrt-mapping/ukb.
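
The merge step is conceptually a concatenation of the per-chunk parquet files followed by a save under the dated filename. A minimal pandas sketch (the chunk file pattern and key column are assumptions; see combine_representation_chunks.py for the actual logic):

import glob
import pandas as pd

# Gather the per-chunk CLMBR-T representation files (hypothetical naming pattern)
chunks = sorted(glob.glob("representations/representation_chunk_*.parquet"))
combined = pd.concat((pd.read_parquet(p) for p in chunks), ignore_index=True)

# Drop duplicates in case a record landed in more than one chunk (assumed key column)
combined = combined.drop_duplicates(subset="person_id")
combined.to_parquet("combined_ecg_echo_clmbr_embeddings_YYYYMMDD.parquet")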


Part IV: Zero-shot Inference

This module performs zero-shot evaluation using image embeddings and centroids.

Step 1: Extract Centroids

python ./ecg_echo_clip_training/extract_embeddings.py

Inputs: train_image_embeddings.parquet, train_labels.csv
Output: centroid_results.csv
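
Conceptually, each reference centroid is the mean of the training-set image embeddings within a label group. A minimal sketch (the join key, label column, and embedding column prefix are assumptions about the file schemas):

import pandas as pd

emb = pd.read_parquet("train_image_embeddings.parquet")   # one embedding per ECG image
labels = pd.read_csv("train_labels.csv")                  # binary label per study
df = emb.merge(labels, on="study_id")                     # assumed join key

emb_cols = [c for c in df.columns if c.startswith("emb_")]  # assumed embedding columns
case_centroid = df.loc[df["label"] == 1, emb_cols].mean().to_numpy()
control_centroid = df.loc[df["label"] == 0, emb_cols].mean().to_numpy()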

Step 2: Evaluate on New Data

python ./ecg_echo_clip_training/zero_shot_from_reference_embeddings.py \
    --val_data_path /path/to/val.csv \
    --train_data_path /path/to/train.csv \
    --embeddings_path /path/to/clip_embeddings \
    --output_dir /path/to/output \
    --gpu 0

Outputs: overall_results.json, val_mrn.csv

Optional: Evaluate CLMBR-T Representations

python ./clmbrt-mapping/inference/run_zero_shot_clmbrt.py

Part V: Experiments of EHR-guided vs Untargeted AI-ECG

This module compares three deployment strategies:

  • Approach I (Image Only): ECG embeddings + classifier (F1-optimized)
  • Approach II (Concatenated): ECG + CLMBR-T embeddings + classifier (F1-optimized)
  • Approach III (Gated): CLMBR-T gating (90% sensitivity) --> ECG classifier (F1-optimized); see the sketch below

# Assumes a parquet (or similar) file containing the image and CLMBR-T embeddings; see the script
python ./targeted_vs_untargeted_experiments/targeted_vs_untargeted_aiecg_deployment.py

Output: a CSV file with label-level performance metrics for each strategy.
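
For intuition, the gating in Approach III fixes a CLMBR-T threshold that retains 90% of true cases on a labeled tuning set, and only patients above that gate are passed to the F1-optimized ECG classifier. A minimal sketch (all variable names are hypothetical; the script implements the full experiment):

import numpy as np

def gated_predictions(clmbrt_prob, ecg_prob, y_true, ecg_threshold):
    # Gate threshold chosen so that 90% of true cases pass (i.e., 90% sensitivity)
    gate = np.quantile(clmbrt_prob[y_true == 1], 0.10)
    passed = clmbrt_prob >= gate

    # Screened-out patients are labeled negative; the rest get the ECG classifier decision
    preds = np.zeros(len(y_true), dtype=int)
    preds[passed] = (ecg_prob[passed] >= ecg_threshold).astype(int)
    return preds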

