This step-by-step protocol details how the analyses in Fig. 5 using compound perturbation data can be reproduced after obtaining results in the previous sections. To quickly test our Image2Reg pipeline or use it to perform inference on your own data set, please refer to respective documentation described in our ReadMe file.
Start the jupyter server in the conda environment via
conda activate image2reg
jupyter notebook
Start the jupyter notebook notebooks/jump/eda/data_extraction_new_compound.ipynb and run all cells to download the image data from the JUMP-CP data set for the selected OE conditions including the illumination corrected images. This script also maps compounds to their known gene targets using DrugBank.
All generated data gets downloaded to data/resources/images/jump_compound.
Start the jupyter server in the unet conda environment via
conda activate unet
jupyter notebook
Open and run the jupyter notebook located in unet/notebook/jump_segmentation.ipynb.
Important
Please be aware that this is not a path in the image2reg directory but the unet-nuclei directory you have cloned earlier. Please refer to the respective section for the Rohban data set in this protocol for more information.
Run all cells to generate the segmentation masks for all images and stores those in image2reg/data_new/resources/images/jump_compound/unet_masks.
Run the preprocessing script via
conda activate image2reg
python run.py --config config/preprocessing/full_image_pipeline_jump_new_compound.yml
python run.py --config config/preprocessing/full_image_pipeline_jump_new_ko.yml
This runs all preprocessing steps stores the outputs in a "timestamp" output directory in the directory data_new/experiments/jump_compound/images/preprocessing/full_pipeline.
By default all output directories created as a result of running the run.py will be named after the time point when the script was started.
For the consecutive analyses please copy the content of the timestamp output directory directly to the full_pipeline directory and delete the then empty timestamp directory.
Start the jupyter server in the conda environment via
conda activate image2reg
jupyter notebook
Start the notebook notebooks/rohban/other/cv_screen_data_split_jump_new_impactful.ipynb and run all cells in the notebook.
This generates a number of files located in data_new/experiments/jump/images/preprocessing/specific_targets_combined_excludeCompound/screen_splits_impactful.
Run the specificity screen to identify impactul overexpression conditions via
conda activate image2reg
bash run_screen_ko.sh
Finally, rename the output of the screen located in data/experiments/jump/images/screen/nuclei_region via
conda activate image2reg
python scripts/experiments/rename_screen_dirs –root_dir data/experiments/jump/images/screen/nuclei_region
Start the jupyter server in the conda environment
conda activate image2reg
jupyter notebook
Start the notebook notebooks/jump/screen/screen_analyses_cv_final.ipynb and run all cells.
This creates a summary of the screen results and saves it as data_new/experiments/jump/images/screen/nuclei_region/specificity_screen_results_cv.csv.
Start the jupyter server in the conda environment via
conda activate image2reg
jupyter notebook
Start the notebook notebooks/rohban/other/cv_specific_targets_data_split_jump_new_combined_excludeCompound.ipynb and run all cells.
This creates the required metadata csv-files in data_new/experiments/jump/images/preprocessing/specific_targets_combined_excludeCompound/.
Run the training of the CNN ensemble on the image data from the JUMP data set in the conda environment via
conda activate image2reg
python run.py --config config/image_embedding/specific_targets/jump_ko_compound/compound_test_excludeCompound.yml
python run.py --config config/image_embedding/specific_targets/jump_ko_compound/compound_test_others_excludeCompound.yml
This step also saves the single-cell image embeddings of compound perturbations.
The results of the analyses are saved in the directory data_new/experiments/jump/images/embedding/specificity_target_combined/all_excludeCompound/.
By default the results are saved in a timestamp subdiretory.
Rename the timestamp directory to nuclei_regions.
Start the jupyter server in the conda environment via
conda activate image2reg
jupyter notebook
Start the notebook notebooks/jump/ko_embedding/image_embeddings_analysis_combined_excludeCompound.ipynb and run all cells
This produces the e.g. the Fig. 5C of the manuscript.
Next, start the notebook notebooks/jump/ko_embedding/gene_perturbation_cluster_analysis_excludeCompound.ipynb and run all cells to e.g. reproduce the Fig. 5B of the manuscript.
Start the jupyter server in the conda environment via
conda activate image2reg
jupyter notebook
Start the notebook notebooks/jump/translation/jump_translation_prediction_new_compound_excludeCompound.ipynb and run all cells to perform the complete translation analysis and e.g.generate Fig. 5D-E.