Repository for the project "TARGET-AI: a foundational approach for the targeted deployment of artificial intelligence electrocardiography in the electronic health record"
Authors: Evangelos K. Oikonomou, Bruno Batinica, Lovedeep S. Dhingra, Arya Aminorroaya, Andreas Coppi, Rohan Khera
Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA. Cardiovascular Data Science (CarDS) Lab, Yale School of Medicine, New Haven, CT, USA.
This is a public code repository for the project "TARGET-AI: a foundational approach for the targeted deployment of artificial intelligence electrocardiography in the electronic health record". Parts of this code have been de-identified to remove references to potentially identifiable or health system-specific information.
Model weights for the pre-trained ECG image vision transformer (ViT) can be accessed at: https://huggingface.co/CarDSLab/ecg-clip-beit-base-384.
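For reference, a minimal sketch (not the repository's official loading code) of fetching the published weights locally with `huggingface_hub`; the demo script below handles model loading itself:

```python
# Minimal sketch: download the published weights from the Hugging Face Hub.
# snapshot_download works for any HF repo regardless of model architecture.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="CarDSLab/ecg-clip-beit-base-384")
print(f"Model files downloaded to: {local_dir}")
```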
# first clone the repo and navigate to the local folder
cd target-ai-shared
conda env create -f ./environment_files/ecg_ehr_clip_env.yml # heavy environment with all packages
conda env create -f ./environment_files/ecg_image_vit_light.yml # light environment with only the packages needed for inference

You can run quick inference on a folder of ECG images by accessing the model's weights on Hugging Face. In the command below, the following arguments can be adjusted:
- hf_repo: points to the Hugging Face repository with the published model and weights
- image_dir: path to a folder containing ECG images in the four standard layouts, as shown below
- centroid_csv: path to the CSV containing reference embeddings/centroids for cases/controls; we provide examples from our training set and EchoNext (Elias, P., & Finer, J. (2025). EchoNext: A Dataset for Detecting Echocardiogram-Confirmed Structural Heart Disease from ECGs (version 1.1.0). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/3ykd-bf14)
- output_csv: path where the output will be stored
conda activate ecg_image_vit_light
python ./demo/zero_shot_from_reference_embedding.py \
--hf_repo "CarDSLab/ecg-clip-beit-base-384" \
--image_dir "./demo/online_ecg_images_credit_to_liftl" \
--centroid_csv "./demo/reference_embeddings/ynhhs_reference_centroids.csv" \
--output_csv "./demo/output/ynhhs_output.csv" \
--batch_size 32 \
--device cuda

The study cohort includes pairs of ECG and transthoracic echocardiography (TTE) studies performed within 90 days of each other. The inclusion and exclusion criteria are described in the accompanying manuscript; a minimal sketch of the 90-day pairing rule follows the cohort list below. Briefly:
- Development cohort: 2016 to 2021
- Temporally distinct internal test set: 2022 to 2023
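As an illustration of the 90-day pairing rule, a minimal pandas sketch; the column names (`patient_id`, `ecg_date`, `tte_date`) are hypothetical and not the study's actual schema:

```python
# Hypothetical illustration of the 90-day ECG-TTE pairing rule.
import pandas as pd

ecgs = pd.DataFrame({"patient_id": [1, 1],
                     "ecg_date": pd.to_datetime(["2019-01-05", "2020-06-01"])})
ttes = pd.DataFrame({"patient_id": [1],
                     "tte_date": pd.to_datetime(["2019-02-20"])})

pairs = ecgs.merge(ttes, on="patient_id")
# Keep only ECG-TTE pairs acquired within 90 days of each other
pairs = pairs[(pairs["ecg_date"] - pairs["tte_date"]).abs() <= pd.Timedelta(days=90)]
print(pairs)
```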
conda activate ecg_ehr_clip_env
python /ecg_echo_clip_training/main_run_experiments.py

Each training run produces:
- Tokenizer and configuration files: added_tokens.json, config.json, merges.txt, preprocessor_config.json, special_tokens_map.json, tokenizer_config.json, tokenizer.json, vocab.json
- Model checkpoints: best_model.pth, final_model.pth, epoch_10.pth, ...
- Processor and metrics: processor_initial/, best_processor/, metrics.txt
- Data files: train_data.csv, val_data.csv, frequent_words.txt
Output files are saved under hyperparameter-specific subdirectories in output_dir.
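As a convenience, a minimal sketch of walking these subdirectories and loading the best checkpoint; `output_dir` and the checkpoint format are assumptions based on the file listing above, not a documented interface:

```python
# Sketch: locate and load best_model.pth from each hyperparameter-specific subdirectory.
from pathlib import Path
import torch

output_dir = Path("/path/to/output_dir")  # hypothetical location
for run_dir in (p for p in output_dir.iterdir() if p.is_dir()):
    ckpt = run_dir / "best_model.pth"
    if ckpt.exists():
        state = torch.load(ckpt, map_location="cpu")
        print(f"{run_dir.name}: loaded checkpoint with {len(state)} entries")
```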
This section is based on a JDAT extraction from the Yale EHR.
# Stepwise preprocessing
python /clmbrt-mapping/preprocess_ynhhs_population.py
python /clmbrt-mapping/preprocess_ynhhs_labs.py
python /clmbrt-mapping/preprocess_ynhhs_meds.py
python /clmbrt-mapping/preprocess_ynhhs_pool_events.py
python /clmbrt-mapping/preprocess_split_parquets.py
# Parallel representation generation
/clmbrt-mapping/ynhhs/run_parallel.sh
# Merge outputs
python /clmbrt-mapping/combine_representation_chunks.py
python /clmbrt-mapping/combine_ecg_echo_clmbrt_embeddings.py

Final output: combined_ecg_echo_clmbr_embeddings_YYYYMMDD.parquet
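For orientation, an illustrative sketch of the final merge step; the input file and join-key names (`mrn`) are hypothetical and do not reflect the de-identified production schema:

```python
# Sketch: join image-derived embeddings with CLMBR-T EHR representations on a shared ID.
import pandas as pd

ecg_echo = pd.read_parquet("ecg_echo_clip_embeddings.parquet")      # hypothetical input
clmbr = pd.read_parquet("combined_clmbr_representations.parquet")   # hypothetical input

combined = ecg_echo.merge(clmbr, on="mrn", how="inner")
combined.to_parquet("combined_ecg_echo_clmbr_embeddings_YYYYMMDD.parquet")
```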
For the UK Biobank (UKB), see the analogous scripts under /clmbrt-mapping/ukb.
This module performs zero-shot evaluation using image embeddings and centroids.
python /ecg_echo_clip_training/extract_embeddings.py

Inputs: train_image_embeddings.parquet, train_labels.csv
Output: centroid_results.csv
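Conceptually, the centroid-based zero-shot score compares each image embedding against case and control centroids; a minimal sketch, assuming the score is the difference of cosine similarities (illustrative, not the repository's exact implementation):

```python
# Sketch: centroid-based zero-shot scoring with cosine similarity.
import numpy as np

def zero_shot_scores(embeddings, case_centroid, control_centroid):
    """Similarity to the case centroid minus similarity to the control centroid."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b) + 1e-8)
    return cos(embeddings, case_centroid) - cos(embeddings, control_centroid)

# Centroids are the mean embeddings of labeled cases/controls in the training set
rng = np.random.default_rng(0)
emb = rng.standard_normal((4, 768))                 # four hypothetical image embeddings
cases, controls = rng.standard_normal((10, 768)), rng.standard_normal((10, 768))
scores = zero_shot_scores(emb, cases.mean(axis=0), controls.mean(axis=0))
print(scores)  # higher score -> closer to the case centroid
```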
python /ecg_echo_clip_training/zero_shot_from_reference_embeddings.py \
--val_data_path /path/to/val.csv \
--train_data_path /path/to/train.csv \
--embeddings_path /path/to/clip_embeddings \
--output_dir /path/to/output \
--gpu 0

Outputs: overall_results.json, val_mrn.csv
python /clmbrt-mapping/inference/run_zero_shot_clmbrt.py

This module compares three deployment strategies:
- Approach I (Image Only): ECG image embeddings + classifier (F1-optimized)
- Approach II (Concatenated): ECG + CLMBR-T embeddings + classifier (F1-optimized)
- Approach III (Gated): CLMBR-T gating (thresholded at 90% sensitivity) -> ECG classifier (F1-optimized); see the sketch below
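To make the gating logic concrete, an illustrative sketch of Approach III with synthetic data; thresholds, scores, and the 0.8 cutoff are hypothetical, not values from the study:

```python
# Sketch: CLMBR-T gate thresholded at 90% sensitivity, followed by an ECG classifier.
import numpy as np

def gate_threshold_at_sensitivity(gate_scores, labels, target_sens=0.90):
    """Largest gate-score threshold that still retains >= target_sens of true positives."""
    pos_scores = np.sort(gate_scores[labels == 1])
    idx = int(np.floor((1 - target_sens) * len(pos_scores)))
    return pos_scores[idx]

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)
gate_scores = rng.random(1000) + 0.3 * labels   # hypothetical CLMBR-T gate scores
ecg_scores = rng.random(1000) + 0.5 * labels    # hypothetical ECG classifier scores

thr = gate_threshold_at_sensitivity(gate_scores, labels)
passed = gate_scores >= thr                     # only this subset is scored by AI-ECG
preds = np.zeros(1000, dtype=int)
preds[passed] = (ecg_scores[passed] >= 0.8).astype(int)  # 0.8: hypothetical F1-optimized cutoff
print(f"Gate passes {passed.mean():.0%} of ECGs while keeping ~90% sensitivity")
```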
# Assumes a parquet or other file that contains the image_ and clmbr_embeddings, see script
python ./targeted_vs_untargeted_experiments/targeted_vs_untargeted_aiecg_deployment.py

Output: a CSV file containing label-level performance metrics for each strategy.


