readmes

michaelbornholdt · michaelbornholdt · commit 70bfadf11b52 · 2021-11-24T15:42:42.000-05:00
diff --git a/LINCS_example_data/README.md b/LINCS_example_data/README.md
@@ -5,4 +5,25 @@ By downloading the repo or the tar file of this folder, you have easy access to
 
 The data includes DMSO plates and two compounds over 5 plates. 
 
-There still may be issues with this example data. A stable version of example data can be found in the DP repository. 
+There still may be issues with this example data. A stable version of example data can be found in the DP repository.
+
+## Folders
+
+```commandline
+├── inputs
+│   ├── config
+│   ├── images
+│   │   ├── SQ00015198
+│   │   ├── SQ00015230
+│   │   ├── SQ00015231
+│   │   ├── SQ00015232
+│   │   └── SQ00015233
+│   ├── locations
+│   │   ├── SQ00015198
+│   │   ├── SQ00015230
+│   │   ├── SQ00015231
+│   │   ├── SQ00015232
+│   │   └── SQ00015233
+│   └── metadata
+└── outputs
+```
diff --git a/README.md b/README.md
@@ -12,12 +12,89 @@ In `/training/index` you can find all the index files for the different subsets
 The `/training/runs` folder holds all profiles and training output from each experiment. 
 
 ## Folder structure
+```commandline
+.
+├── LINCS_example_data
+│   ├── inputs
+│   │   ├── config
+│   │   ├── images
+│   │   ├── locations
+│   │   └── metadata
+│   └── outputs
+├── baseline
+│   ├── 01_data
+│   │   ├── level_3_data
+│   │   └── level_5_data
+│   ├── 02_analysis
+│   └── thesis
+├── chtc
+│   ├── DP_0.3.0
+│   │   ├── aggregate
+│   │   ├── checking
+│   │   ├── profile
+│   │   ├── sampling
+│   │   └── train
+│   ├── helper_functions
+│   └── old_DP
+│       ├── aggregate
+│       ├── checking
+│       ├── exporting
+│       ├── profile
+│       └── train
+├── docker
+│   ├── 0.3.0
+│   └── old_versions
+├── hit_k
+├── pre-trained
+│   ├── ResNet50v2
+│   │   ├── aggregated
+│   │   └── post_processing
+│   ├── data-prep
+│   │   ├── 01_location_extraction
+│   │   └── 02_index_preperation
+│   ├── efficient_net
+│   │   ├── aggregated
+│   │   └── post_processing
+│   └── thesis
+├── training
+│   ├── aggregation
+│   ├── index
+│   │   └── sc-metadata
+│   ├── prediction_analysis
+│   │   └── 819
+│   ├── results
+│   │   └── accuracy
+│   └── runs
+│       ├── 1003
+        ... 
+│       └── 931
+└── utils
+```
 
+## Description of the repository content
+### `baseline/`
+The first part of the project gathers CellProfiler profiles from the LINCS repository and compares them. 
+This folder also gives a general overview of the data and of the subselection. 
+To compare metrics with my data, follow the steps in `baseline/02_analysis/02_clean_data.ipynb`.
 
-## Important random things
-- Some information in this repository may be old since the DeepProfiler versions changed midway through the project
-- The single cell crops of all 18 million cells within the LINCS subsection can be found on S3: `s3://jump-cellpainting/projects/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad/workspace/deep_learning/outputs/1017_sc/`
-- If you can't reach Michael Bornholdt, try to reach Shantanu Singh. 
+### `pre-trained/`
+The two pre-trained nets are compared here; they serve as the baseline for the trained neural networks. 
+This comparison also determines the best pipeline for deep learning features.
+
+### `training/`
+All experiments live here. 
+The experiments are different models trained with different hyperparameters and data. 
+A full analysis of the resulting profiles can be found in the `training/results/` folder. 
+
+### `chtc/` and `docker/`
+These folders hold important scripts for setting up and running DeepProfiler on a server.
+
+### `hit_k/`
+This folder contains the development code of the hit@k metric. The code now lives in Cyto-eval. 
+
+### `LINCS_example_data/`
+A small subset of the LINCS data that can be used to test and learn DP. 
+Alternatively, use the example data from the DP GitHub repository.
 
 
 ## Experimental data on S3
@@ -32,3 +109,10 @@ The `/training/runs` folder holds all profiles and training output from each exp
 - models: `s3://jump-cellpainting/projects/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad/workspace/deep_learning/models/`
 - LINCS subsets crop (18 million): `s3://jump-cellpainting/projects/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad/workspace/deep_learning/crops/`
 
+
+
+## Important random things
+- Some information in this repository may be outdated because the DeepProfiler version changed midway through the project
+- The single cell crops of all 18 million cells within the LINCS subsection can be found on S3: `s3://jump-cellpainting/projects/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad/workspace/deep_learning/outputs/1017_sc/`
+- If you can't reach Michael Bornholdt, try to reach Shantanu Singh. 
+
diff --git a/baseline/README.md b/baseline/README.md
@@ -15,3 +15,33 @@ This folder contains all notebooks. Most importantly the data is cleaned and sel
 ## Thesis
 This is a collection of further analysis notebooks and plots. 
 
+## Folder structure
+```commandline
+.
+├── 01_data
+│   ├── full_level3.csv
+│   ├── level3.ipynb
+│   ├── level3_featselected_500_nadropped.csv
+│   ├── level_3_data
+│   │   └── sub_level3.csv
+│   └── level_5_data
+│       └── 2016_04_01_a549_48hr_batch1_dmso_spherized_profiles_with_input_normalized_by_dmso_consensus_median.csv.gz
+├── 02_analysis
+│   ├── 01_data_insights.ipynb
+│   ├── 02_clean_data.ipynb
+│   ├── 03_enrichment_demo.ipynb
+│   ├── 03_precision_demo.ipynb
+│   ├── compare_consensus.ipynb
+│   ├── examples_images.ipynb
+│   └── level3_data_insights.ipynb
+├── README.md
+└── thesis
+    ├── 00_baseline_calc.ipynb
+    ├── 00_baselines_plots.ipynb
+    ├── 0_PCA_visualizatino.ipynb
+    ├── 0_example_hitk.ipynb
+    ├── 0_precision_doof.ipynb
+    ├── compare_metrics.ipynb
+    ├── meta.csv
+    └── subselection_performance.ipynb
+```
diff --git a/docker/README.md b/docker/README.md
@@ -23,7 +23,23 @@ RUN unzip awscliv2.zip
 RUN ./aws/install
 ```
 
-### Legacy 
+## Structure
+
+```commandline
+├── 0.3.0
+│   ├── Dockerfile
+│   └── Makefile
+├── README.md
+└── old_versions
+    ├── tf15
+    │   ├── Dockerfile
+    │   └── Makefile
+    └── tf2
+        ├── Dockerfile
+        └── Makefile
+```
+
+## Legacy 
 
 Until October 2021 DP was run on Tensorflow 1.5. 
 Some older Docker images thus run on TF 1.5. 
diff --git a/pre-trained/README.md b/pre-trained/README.md
@@ -5,21 +5,54 @@ Read the documentation of DeepProfiler (DP) to understand the following.
 ## Folder structure
 ```commandline
 .
+├── README.md
 ├── ResNet50v2
 │   ├── aggregated
+│   │   ├── aggregate.ipynb
+│   │   ├── aggregated_resnet_median.csv
+│   │   ├── full_well_index.csv
+│   │   ├── level3_resnet.csv
+│   │   ├── raw.csv
+│   │   └── testing_resnet_output.ipynb
 │   └── post_processing
+│       ├── Compare_eff_res_cp.ipynb
+│       └── normalization.ipynb
 ├── data-prep
 │   ├── 01_location_extraction
+│   │   ├── README.md
+│   │   ├── extract.py
+│   │   ├── location_test.ipynb
+│   │   └── split.py
 │   └── 02_index_preperation
+│       ├── 00_create_index.ipynb
+│       ├── 01_enrich_index.ipynb
+│       ├── 02_clean_index.ipynb
+│       ├── README.md
+│       ├── barcode_platemap.csv
+│       ├── enriched_index.csv
+│       ├── full_index.csv
+│       ├── full_well_index.csv
+│       ├── original_index.csv
+│       ├── repurposing_info_external_moa_map_resolved.tsv
+│       └── sub_index.csv
 ├── efficient_net
 │   ├── aggregated
-│   │   └── index
-│   ├── post_processing
-│   │   ├── __pycache__
-│   │   ├── images
-│   │   └── old
-│   └── results
+│   │   ├── README.md
+│   │   ├── aggregated_efficientnet_median.csv
+│   │   ├── run_aggregation.py
+│   │   └── testing_efficientnet.ipynb
+│   └── post_processing
+│       ├── PCA_plots.ipynb
+│       ├── consensus_spherized_dmso_eff_mean.csv
+│       ├── efficientnet_full_analysis.ipynb
+│       └── normalization.ipynb
 └── thesis
+    ├── 0_PCA_visualizatino.ipynb
+    ├── 0_compare_arch.ipynb
+    ├── 0_normalization.ipynb
+    ├── batch_effect.ipynb
+    ├── random_subsets.ipynb
+    └── subselection_performance.ipynb
 ```
 ResNet50v2 and efficient_net are the folders dealing with the profiles of both pre-trained nets.
 Thesis holds some high-level analyses. 
diff --git a/training/README.md b/training/README.md