
Commit 70bfadf

readmes
1 parent 63f9088 commit 70bfadf


6 files changed (+288 -12 lines)

LINCS_example_data/README.md

+22 -1
@@ -5,4 +5,25 @@ By downloading the repo or the tar file of this folder, you have easy access to

The data includes DMSO plates and two compounds over 5 plates.

There may still be issues with this example data. A stable version of the example data can be found in the DP repository.

## Folders

```commandline
├── inputs
│   ├── config
│   ├── images
│   │   ├── SQ00015198
│   │   ├── SQ00015230
│   │   ├── SQ00015231
│   │   ├── SQ00015232
│   │   └── SQ00015233
│   ├── locations
│   │   ├── SQ00015198
│   │   ├── SQ00015230
│   │   ├── SQ00015231
│   │   ├── SQ00015232
│   │   └── SQ00015233
│   └── metadata
└── outputs
```
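A quick way to sanity-check this layout is to compare the plate folders under `inputs/images` and `inputs/locations`. The sketch below assumes that every plate should appear in both (inferred from the tree above, not a documented DP rule); `check_plate_folders` is a hypothetical helper, not part of DeepProfiler.

```python
from __future__ import annotations

from pathlib import Path


def _plate_dirs(folder: Path) -> set[str]:
    """Names of the sub-folders (one per plate) inside `folder`; empty if it does not exist."""
    if not folder.is_dir():
        return set()
    return {p.name for p in folder.iterdir() if p.is_dir()}


def check_plate_folders(root: Path) -> dict[str, list[str]]:
    """Hypothetical helper: report plates that have images but no locations, and vice versa.

    `root` is the LINCS_example_data folder; the inputs/images and
    inputs/locations names are taken from the folder tree above.
    """
    images = _plate_dirs(root / "inputs" / "images")
    locations = _plate_dirs(root / "inputs" / "locations")
    return {
        "missing_locations": sorted(images - locations),
        "missing_images": sorted(locations - images),
    }
```

For the complete example data, `check_plate_folders(Path("LINCS_example_data"))` should return two empty lists, since the tree lists the same five plates (SQ00015198 to SQ00015233) under both folders.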

README.md

+88 -4
@@ -12,12 +12,89 @@ In `/training/index` you can find all the index files for the different subsets
The `/training/runs` folder holds all profiles and training output from each experiment.

## Folder structure
```commandline
.
├── LINCS_example_data
│   ├── inputs
│   │   ├── config
│   │   ├── images
│   │   ├── locations
│   │   └── metadata
│   └── outputs
├── baseline
│   ├── 01_data
│   │   ├── level_3_data
│   │   └── level_5_data
│   ├── 02_analysis
│   └── thesis
├── chtc
│   ├── DP_0.3.0
│   │   ├── aggregate
│   │   ├── checking
│   │   ├── profile
│   │   ├── sampling
│   │   └── train
│   ├── helper_functions
│   └── old_DP
│       ├── aggregate
│       ├── checking
│       ├── exporting
│       ├── profile
│       └── train
├── docker
│   ├── 0.3.0
│   └── old_versions
├── hit_k
├── pre-trained
│   ├── ResNet50v2
│   │   ├── aggregated
│   │   └── post_processing
│   ├── data-prep
│   │   ├── 01_location_extraction
│   │   └── 02_index_preperation
│   ├── efficient_net
│   │   ├── aggregated
│   │   └── post_processing
│   └── thesis
├── training
│   ├── aggregation
│   ├── index
│   │   └── sc-metadata
│   ├── prediction_analysis
│   │   └── 819
│   ├── results
│   │   └── accuracy
│   └── runs
│       ├── 1003
│       ...
│       └── 931
└── utils
```
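The folder listings in this commit look like output of the Unix `tree` tool. If `tree` is not installed, a minimal `pathlib` sketch can produce a similar listing; the name `render_tree`, the default depth of 3, the alphabetical sort and the skipping of hidden entries are assumptions chosen to match the tree above, not the author's script.

```python
from __future__ import annotations

from pathlib import Path


def render_tree(root: Path, max_depth: int = 3, dirs_only: bool = True) -> list[str]:
    """Return listing lines in the same box-drawing style as the trees in these READMEs."""
    lines = ["."]

    def walk(folder: Path, prefix: str, depth: int) -> None:
        if depth > max_depth:
            return
        # Skip hidden entries (e.g. .git) and, if requested, plain files.
        entries = sorted(
            (p for p in folder.iterdir()
             if not p.name.startswith(".") and (p.is_dir() or not dirs_only)),
            key=lambda p: p.name,
        )
        for i, entry in enumerate(entries):
            last = i == len(entries) - 1
            lines.append(f"{prefix}{'└── ' if last else '├── '}{entry.name}")
            if entry.is_dir():
                # Children of the last entry get blank padding instead of a vertical bar.
                walk(entry, prefix + ("    " if last else "│   "), depth + 1)

    walk(root, "", 1)
    return lines
```

Running `print("\n".join(render_tree(Path("."))))` from the repository root prints a directories-only listing; pass `dirs_only=False` to include files, as in the per-folder READMEs.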

## Description of the repository content
### `baseline/`
The first part of the project gathers CellProfiler profiles from the LINCS repository and compares them.
A general overview of the data and of the subselection can be found here.
If you want to compare metrics with my data, follow the steps in `baseline/02_analysis/02_clean_data.ipynb`.

### `pre-trained/`
The two pre-trained nets are compared here and serve as the baseline for the trained neural networks.
This is also where the best pipeline for deep learning features is determined.

### `training/`
All experiments live here.
The experiments are different models trained with different hyperparameters and data.
A full analysis of the resulting profiles can be found in the `training/results/` folder.

### `chtc/` and `docker/`
These folders hold the scripts for setting up and running DeepProfiler on a server.

### `hit_k/`
This folder contains the development code for the hit@k metric, which now lives in Cyto-eval.

### `LINCS_example_data/`
A small subset of the LINCS data that lets you test and learn DP.
Alternatively, use the example data from the DP GitHub repository.

## Experimental data on S3
@@ -32,3 +109,10 @@ The `/training/runs` folder holds all profiles and training output from each experiment.
- models: `s3://jump-cellpainting/projects/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad/workspace/deep_learning/models/`
- LINCS subsets crop (18 million): `s3://jump-cellpainting/projects/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad/workspace/deep_learning/crops/`
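Since the Docker image installs the AWS CLI, one way to fetch a folder from the list above is `aws s3 sync`. The sketch below only assembles the command; `sync_command` and `run_sync` are hypothetical helpers, it assumes your credentials can read the `jump-cellpainting` bucket, and it defaults to `--dryrun` because the crops folder holds 18 million single-cell crops.

```python
from __future__ import annotations

import subprocess

# Workspace prefix copied from the S3 paths above.
WORKSPACE = (
    "s3://jump-cellpainting/projects/"
    "2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad/"
    "workspace/deep_learning"
)


def sync_command(subfolder: str, local_dir: str, dry_run: bool = True) -> list[str]:
    """Build an `aws s3 sync` call that mirrors one workspace sub-folder (e.g. "models") locally."""
    cmd = ["aws", "s3", "sync", f"{WORKSPACE}/{subfolder.strip('/')}/", local_dir]
    if dry_run:
        # --dryrun only lists what would be copied; nothing is downloaded.
        cmd.append("--dryrun")
    return cmd


def run_sync(subfolder: str, local_dir: str, dry_run: bool = True) -> int:
    """Run the command; needs the AWS CLI on PATH and read access to the bucket."""
    return subprocess.run(sync_command(subfolder, local_dir, dry_run), check=False).returncode
```

Check the dry-run listing of `run_sync("models", "models")` first, then call it again with `dry_run=False` once the file list looks right.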

## Important random things
- Some information in this repository may be outdated because the DeepProfiler version changed midway through the project.
- The single-cell crops of all 18 million cells within the LINCS subset can be found on S3: `s3://jump-cellpainting/projects/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad/workspace/deep_learning/outputs/1017_sc/`
- If you can't reach Michael Bornholdt, try to reach Shantanu Singh.

baseline/README.md

+30
@@ -15,3 +15,33 @@ This folder contains all notebooks. Most importantly the data is cleaned and sel
## Thesis
This is a collection of further analysis notebooks and plots.

## Folder structure
```commandline
.
├── 01_data
│   ├── full_level3.csv
│   ├── level3.ipynb
│   ├── level3_featselected_500_nadropped.csv
│   ├── level_3_data
│   │   └── sub_level3.csv
│   └── level_5_data
│       └── 2016_04_01_a549_48hr_batch1_dmso_spherized_profiles_with_input_normalized_by_dmso_consensus_median.csv.gz
├── 02_analysis
│   ├── 01_data_insights.ipynb
│   ├── 02_clean_data.ipynb
│   ├── 03_enrichment_demo.ipynb
│   ├── 03_precision_demo.ipynb
│   ├── compare_consensus.ipynb
│   ├── examples_images.ipynb
│   └── level3_data_insights.ipynb
├── README.md
└── thesis
    ├── 00_baseline_calc.ipynb
    ├── 00_baselines_plots.ipynb
    ├── 0_PCA_visualizatino.ipynb
    ├── 0_example_hitk.ipynb
    ├── 0_precision_doof.ipynb
    ├── compare_metrics.ipynb
    ├── meta.csv
    └── subselection_performance.ipynb
```

docker/README.md

+17 -1
@@ -23,7 +23,23 @@ RUN unzip awscliv2.zip
RUN ./aws/install
```

## Structure

```commandline
├── 0.3.0
│   ├── Dockerfile
│   └── Makefile
├── README.md
└── old_versions
    ├── tf15
    │   ├── Dockerfile
    │   └── Makefile
    └── tf2
        ├── Dockerfile
        └── Makefile
```

## Legacy

Until October 2021, DP was run on TensorFlow 1.5.
Some older Docker images are therefore based on TF 1.5.

pre-trained/README.md

+39 -6
@@ -5,21 +5,54 @@ Read the documentation of DeepProfiler (DP) to understand the following.
## Folder structure
```commandline
.
├── README.md
├── ResNet50v2
│   ├── aggregated
│   │   ├── aggregate.ipynb
│   │   ├── aggregated_resnet_median.csv
│   │   ├── full_well_index.csv
│   │   ├── level3_resnet.csv
│   │   ├── raw.csv
│   │   └── testing_resnet_output.ipynb
│   └── post_processing
│       ├── Compare_eff_res_cp.ipynb
│       └── normalization.ipynb
├── data-prep
│   ├── 01_location_extraction
│   │   ├── README.md
│   │   ├── extract.py
│   │   ├── location_test.ipynb
│   │   └── split.py
│   └── 02_index_preperation
│       ├── 00_create_index.ipynb
│       ├── 01_enrich_index.ipynb
│       ├── 02_clean_index.ipynb
│       ├── README.md
│       ├── barcode_platemap.csv
│       ├── enriched_index.csv
│       ├── full_index.csv
│       ├── full_well_index.csv
│       ├── original_index.csv
│       ├── repurposing_info_external_moa_map_resolved.tsv
│       └── sub_index.csv
├── efficient_net
│   ├── aggregated
│   │   ├── README.md
│   │   ├── aggregated_efficientnet_median.csv
│   │   ├── run_aggregation.py
│   │   └── testing_efficientnet.ipynb
│   └── post_processing
│       ├── PCA_plots.ipynb
│       ├── consensus_spherized_dmso_eff_mean.csv
│       ├── efficientnet_full_analysis.ipynb
│       └── normalization.ipynb
└── thesis
    ├── 0_PCA_visualizatino.ipynb
    ├── 0_compare_arch.ipynb
    ├── 0_normalization.ipynb
    ├── batch_effect.ipynb
    ├── random_subsets.ipynb
    └── subselection_performance.ipynb
```
The `ResNet50v2` and `efficient_net` folders hold the profiles of the two pre-trained nets.
`thesis` holds some high-level analyses.
