Report of MOAs sizes/values per dose analysis (ranking) on consensus datasets #2

AdeboyeML · 2020-10-24T20:55:27Z

@gwaygenomics @shntnu

The goal here was to determine the size/value of each MOA (mechanism of action) at each dose, defined as the median of the pairwise correlation values between compounds that share that MOA in the consensus datasets.

MOAs with only one compound were excluded, since they have no compound pairs to correlate. Out of 601 MOAs, 369 were excluded, leaving 232.
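
A minimal sketch of the exclusion step, assuming a metadata table with one row per compound. The column names `pert_iname` and `moa` and the toy rows are hypothetical, not taken from the actual datasets:

```python
import pandas as pd

# Hypothetical compound-level metadata: one row per compound with its MOA label.
meta = pd.DataFrame({
    "pert_iname": ["cpd_a", "cpd_b", "cpd_c", "cpd_d", "cpd_e"],
    "moa": ["moa_1", "moa_1", "moa_2", "moa_3", "moa_3"],
})

def drop_singleton_moas(df, moa_col="moa", cpd_col="pert_iname"):
    """Keep only rows whose MOA is shared by at least two distinct compounds."""
    n_cpds = df.groupby(moa_col)[cpd_col].transform("nunique")
    return df[n_cpds >= 2].reset_index(drop=True)

kept = drop_singleton_moas(meta)
print(kept)
```

Here `moa_2` is dropped because a single compound yields no pairwise correlation to take a median over.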

Results: I show only the first 10 MOA values per dose for each consensus dataset, plus heatmap plots of the 232 MOAs for each consensus dataset. Dose 0 (DMSO only) and dose 7 (only two MOAs) were excluded from the analysis.

1a. Median Aggregation Consensus dataset - consensus_median (whole plate normalization).

Heatmap plots -- I split the dataframe into 3 parts for easier visualization:

1b. Median Aggregation Consensus dataset - consensus_median_dmso (dmso normalization).

heatmap plots

2a. Modified Z Score Aggregation (MODZ) dataset - consensus_modz (whole plate normalization).

heatmap plots

2b. Modified Z Score Aggregation (MODZ) dataset - consensus_modz_dmso (dmso normalization).

heatmap plots

AdeboyeML · 2020-10-27T14:59:26Z

@gwaygenomics

- MOAs that do not have the same number of compounds in all doses

shntnu · 2020-10-27T15:27:28Z

@AdeboyeML and I inspected one of the cases and found that the dose remapping can be flawed. For example, PYM50028 (BRD-K62277907-001-01-6) has two doses coded as level 1.
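
One way to screen for this kind of collision is to count distinct raw doses per compound and dose level. This is only a sketch: the column names (`pert_id`, `raw_dose_um`, `dose_level`) and the toy values are hypothetical and do not reproduce the real metadata:

```python
import pandas as pd

# Hypothetical metadata: the raw dose and the remapped dose level per treatment.
meta = pd.DataFrame({
    "pert_id": ["cpd_x", "cpd_x", "cpd_x", "cpd_y", "cpd_y"],
    "raw_dose_um": [0.04, 0.05, 0.12, 0.04, 0.12],
    "dose_level": [1, 1, 2, 1, 2],
})

def find_dose_collisions(df):
    """Return (compound, dose level) pairs that cover more than one raw dose."""
    n_raw = (
        df.groupby(["pert_id", "dose_level"])["raw_dose_um"]
        .nunique()
        .rename("n_raw_doses")
        .reset_index()
    )
    return n_raw[n_raw["n_raw_doses"] > 1].reset_index(drop=True)

collisions = find_dose_collisions(meta)
print(collisions)
```

Any row in `collisions` marks a compound where two different raw doses were coded as the same level.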

shntnu · 2020-10-27T15:34:59Z

@AdeboyeML uses corr_val = abs(df_dose_corr.loc[cpds[y], cpds[x]]) but we shouldn't take the absolute value. We may have compounds in there that are negatively correlated and that should count against the MOA.

Also consider computing the correlation matrix of the subsetted dataframe corresponding to the replicates you care about, and then taking the median of the lower (or upper) triangular matrix. This is an implementation detail; the logic is otherwise correct.
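
A sketch of that suggestion, assuming a dataframe with one row per compound (same MOA, same dose) and one column per feature. Pearson correlation and the `use_abs` switch are assumptions added for illustration; the switch exists only to show what taking the absolute value hides:

```python
import numpy as np
import pandas as pd

def moa_median_correlation(profiles, use_abs=False):
    """Median pairwise correlation between the rows (compounds) of `profiles`."""
    # DataFrame.corr correlates columns, so transpose to correlate compounds.
    corr = profiles.T.corr(method="pearson").to_numpy()
    # Strict lower triangle: each compound pair once, diagonal excluded.
    lower = corr[np.tril_indices_from(corr, k=-1)]
    if use_abs:
        lower = np.abs(lower)
    return float(np.median(lower))

# Toy profiles: cpd_a and cpd_b agree, cpd_c is anti-correlated with both.
profiles = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]],
    index=["cpd_a", "cpd_b", "cpd_c"],
)

signed = moa_median_correlation(profiles)
unsigned = moa_median_correlation(profiles, use_abs=True)
print(signed, unsigned)
```

With the signed values the anti-correlated compound pulls the MOA score down, whereas the absolute value would make this MOA look perfectly consistent.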

AdeboyeML · 2020-10-27T18:36:56Z

just an update to #2 (comment)

Dose 7 was not supposed to be included in the figure in #2 (comment), since it has only two MOAs; that was a bug in my code.

- MOAs that do not have the same number of compounds in all doses

AdeboyeML · 2020-11-04T02:02:25Z

@gwaygenomics @shntnu

- Results from Null Distribution

Major points:

- The null distribution is generated from the median correlation scores of randomly combined compounds that do not share an MOA.
- For each MOA, we generated 1000 such median correlation scores from randomly combined compounds as its null distribution.
- A p-value was computed nonparametrically as the probability that random compounds from different MOAs have a greater median similarity value than compounds of the same MOA.
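
The points above can be sketched as follows. This is an illustration under stated assumptions, not the code used for the figures: the helper names, the toy data, and the choice to draw one compound from each of `size` distinct MOAs are all hypothetical, and the p-value uses "greater than or equal", matching the thresholding described later in this comment:

```python
import numpy as np
import pandas as pd

def median_pairwise(corr, cpds):
    """Median of the signed pairwise correlations among `cpds`."""
    sub = corr.loc[cpds, cpds].to_numpy()
    return float(np.median(sub[np.tril_indices_from(sub, k=-1)]))

def sample_null(corr, cpd_to_moa, size, n_iter=1000, seed=0):
    """Null medians from `size` compounds drawn from `size` different MOAs."""
    rng = np.random.default_rng(seed)
    by_moa = {}
    for cpd, moa in cpd_to_moa.items():
        by_moa.setdefault(moa, []).append(cpd)
    moa_names = sorted(by_moa)
    null = np.empty(n_iter)
    for i in range(n_iter):
        # Pick distinct MOAs, then one compound from each of them.
        chosen = rng.choice(len(moa_names), size=size, replace=False)
        cpds = [str(rng.choice(by_moa[moa_names[j]])) for j in chosen]
        null[i] = median_pairwise(corr, cpds)
    return null

def empirical_p_value(observed, null):
    """Fraction of null medians greater than or equal to the observed median."""
    return float(np.mean(null >= observed))

# Toy data (hypothetical): 6 MOAs x 2 compounds, 50 features each.
rng = np.random.default_rng(1)
names = [f"cpd_{i}" for i in range(12)]
cpd_to_moa = pd.Series([f"moa_{i // 2}" for i in range(12)], index=names)
profiles = pd.DataFrame(rng.normal(size=(12, 50)), index=names)
# Make the two moa_0 compounds share a signal so their correlation is high.
profiles.loc["cpd_1"] = profiles.loc["cpd_0"] + 0.1 * rng.normal(size=50)
corr = profiles.T.corr()

members = cpd_to_moa.index[cpd_to_moa == "moa_0"].tolist()
observed = median_pairwise(corr, members)
null = sample_null(corr, cpd_to_moa, size=len(members))
p_value = empirical_p_value(observed, null)
print(observed, p_value)
```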

- Visualization: non-parametric p-value vs median pairwise correlation score (for each MOA) per dose

- Median Consensus

- Median DMSO Consensus

- MODZ Consensus

- MODZ DMSO Consensus

We also checked which MOAs have, in all doses (1-6), a probability below 5% (0.05) of obtaining a null-distribution median correlation score greater than or equal to the MOA's own median correlation score.
0.05 was used as the significance level.
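
A small sketch of this filter, assuming the p-values are held in a table with one row per MOA and one column per dose. The table layout, MOA names, and numbers are made up for illustration:

```python
import pandas as pd

# Hypothetical p-value table: rows are MOAs, columns are doses 1-6.
p_values = pd.DataFrame(
    {
        "dose_1": [0.01, 0.20, 0.03],
        "dose_2": [0.02, 0.15, 0.60],
        "dose_3": [0.01, 0.08, 0.04],
        "dose_4": [0.00, 0.04, 0.02],
        "dose_5": [0.03, 0.03, 0.50],
        "dose_6": [0.01, 0.01, 0.04],
    },
    index=["moa_a", "moa_b", "moa_c"],
)

alpha = 0.05  # significance level

# MOAs below the significance level at every dose.
significant_all = p_values.index[(p_values < alpha).all(axis=1)].tolist()

# Number of MOAs below the significance level at each dose.
n_significant_per_dose = (p_values < alpha).sum(axis=0)

print(significant_all)
print(n_significant_per_dose)
```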

- Median Consensus: MOAs with p-values < 0.05 in all doses

- Median DMSO Consensus: MOAs with p-values < 0.05 in all doses

- MODZ Consensus: MOAs with p-values < 0.05 in all doses

- MODZ DMSO Consensus: MOAs with p-values < 0.05 in all doses

AdeboyeML · 2020-11-09T08:03:37Z

- In addition to the figures shown in #2 (comment), I included the density distribution of the p-values vs median scores in the same plots:

- Median Consensus

- Median DMSO Consensus

- MODZ Consensus

- MODZ DMSO Consensus

shntnu · 2020-11-10T20:00:12Z

@AdeboyeML quick question:

In this figure (where each data point is an MOA, I assume), if two MOAs have the same number of compounds and the same x-axis value (median pairwise correlation between compounds), do they also have the same y-axis value?

Or phrased more simply, are you computing the null distribution once for each MOA size (by size I mean the number of compounds in the MOA class)? Or are you doing it once per MOA? Both are fine, but the former is preferred because it removes y-axis variance that's not informative.

shntnu · 2020-11-10T20:59:40Z

@AdeboyeML kudos for making the data so easy to peek into. I was curious to see if one could see a dose-response in some of the MOAs. Looks like we do in a few.

More during profiling check-in!

Code

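library(tidyverse)  # readr, dplyr, tidyr, stringr, ggplot2
library(magrittr)   # provides the %<>% assignment pipe

# Per-MOA median pairwise correlation scores, one dose_* column per dose
moa_consistency <-
  read_csv(
    "https://raw.githubusercontent.com/broadinstitute/lincs-profiling-comparison/9bc5db8167674e2c8bec5cee3fcc043117acfbf6/1.Data-exploration/moa_sizes_consensus_datasets/median_dmso_moa_median_scores.csv"
  )

# The unnamed first column (read in as X1) holds the MOA names
moa_consistency %<>% rename(moa = X1)

# Reshape to long format: one row per (moa, dose)
moa_consistency %<>% pivot_longer(-moa, names_to = "dose", values_to = "score")

# Turn "dose_<n>" into the integer dose level <n>
moa_consistency %<>% mutate(dose = as.integer(str_remove(dose, "dose_")))

# Keep only MOAs whose median score across doses exceeds 0.30
moa_consistency %<>%
  inner_join(
    moa_consistency %>%
      group_by(moa) %>%
      summarize(score_median = median(score)) %>%
      filter(score_median > 0.30),
    by = "moa"
  )

# One panel per MOA, labelled with its rounded median score
p <-
  ggplot(moa_consistency, aes(dose, score)) +
  geom_line() +
  facet_wrap(~ round(score_median, 2) + moa,
             ncol = 5,
             scales = "free_y")

# Pass the plot explicitly instead of relying on the last plot created
ggsave("~/Desktop/moa_score_dose_response.png",
       plot = p,
       width = 10,
       height = 10)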

AdeboyeML · 2020-11-10T21:42:28Z

@shntnu Regarding your question in #2 (comment):

> Or phrased more simply, are you computing the null distribution once for each MOA size (by size I mean the number of compounds in the MOA class)? Or are you doing it once per MOA? Both are fine, but the former is preferred to remove y-axis variance that's not informative.

I computed the null distribution once per MOA per dose, based on the MOA's size (the number of compounds in the MOA class). That is, for each MOA at each dose I computed 1000 random median scores, from which I computed the p-value.
For each MOA, I selected random compounds from different MOAs, with the number of compounds equal to the size of the MOA, and repeated this 1000 times per dose.

> if two MOAs have the same number of compounds, and the same x-axis value (median pairwise correlation between compounds), do they also have the same y-axis value?

No. The p-value (y-axis) is based on 1000 randomly selected lists of compounds (each list as long as the MOA's size), from which I computed the null median scores; these lists are unique to each MOA.

shntnu · 2020-11-11T14:53:30Z

@AdeboyeML thanks for clarifying. Everything looks good, but the one change I recommend is to use the same null distribution for all MOAs of the same size.

There is no upside to having different null distributions for each unique MOA (of the same size), while it has the downside of adding uninformative variance to the p-value estimates.
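
To illustrate the recommendation, here is a sketch that builds one null distribution per distinct MOA size and reuses it for every MOA of that size. Everything in it is hypothetical: the toy correlation matrix, the MOA names, sizes, and scores. For brevity the sampler draws compounds uniformly and does not enforce the different-MOA constraint:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a compound-by-compound correlation matrix (hypothetical data).
n_cpds = 40
profiles = rng.normal(size=(n_cpds, 30))
corr = np.corrcoef(profiles)

def sample_null(size, n_iter=1000):
    """Median pairwise correlation of `size` random compounds, repeated n_iter times."""
    out = np.empty(n_iter)
    for i in range(n_iter):
        idx = rng.choice(n_cpds, size=size, replace=False)
        sub = corr[np.ix_(idx, idx)]
        out[i] = np.median(sub[np.tril_indices(size, k=-1)])
    return out

# Hypothetical MOAs: name -> (number of compounds, observed median score).
moas = {"moa_a": (3, 0.25), "moa_b": (3, 0.25), "moa_c": (5, 0.10)}

# One null per distinct size, shared by every MOA of that size.
nulls = {k: sample_null(k) for k in sorted({size for size, _ in moas.values()})}

p_values = {
    name: float(np.mean(nulls[size] >= score))
    for name, (size, score) in moas.items()
}
print(p_values)
```

Because `moa_a` and `moa_b` have the same size and the same score, they are compared against the same null and get identical p-values, which removes the sampling noise that separate nulls would add.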

cc @gwaygenomics

AdeboyeML · 2020-11-12T14:47:24Z

@shntnu @gwaygenomics

Results from the Null distribution, based on using the same null distribution for all MOAs of the same size.

There seems to be no relationship between the median pairwise correlation and the p-value obtained from the null distribution.

Median Consensus

Distribution of the median pairwise correlation scores

P-values distribution across doses

Increase in MOAs with p-values below the significance level (0.05) as dose increases

MOAs with p-values <0.05 in all doses

Dose responses of these MOAs:

The above results and distributions are similar for the MODZ Consensus datasets

shntnu · 2020-11-12T15:03:39Z

> There seems to be no relationship between the median pairwise correlation and the obtained p-value generated from null distribution.

That's really strange – sounds like a bug to me

AdeboyeML · 2020-11-19T09:51:01Z

@shntnu @gwaygenomics

Results from the Null distribution, based on using the same null distribution for all MOAs of the same size. (L1000 & Cell Painting)

I figured out what was wrong with my code, which had produced the strange relationship between the p-values and median scores (null distribution) in #2 (comment).
Corrected plots of the null distribution are below:

Cell Painting

MODZ Consensus

L1000

MODZ Level-5 data

- The above results and distributions are similar for both median and rank level-5 (Consensus data) in cell painting and L1000.

gwaybio mentioned this issue Nov 10, 2020

Added consensus moa analysis files #1

Merged

gwaybio mentioned this issue Nov 18, 2020

Comparing the distribution of median scores between L1000 and Lincs Cell painting Consensus datasets #3

Open
