Report of MOAs sizes/values per dose analysis (ranking) on consensus datasets #2

Open
AdeboyeML opened this issue Oct 24, 2020 · 15 comments

@AdeboyeML
Collaborator

@gwaygenomics @shntnu

The goal here was to compute a score ("size/value") for each MOA (mechanism of action) at each dose: the median of the pairwise correlation values between compounds that share the MOA in the consensus datasets.

MOAs with only one compound were excluded: 369 of the 601 MOAs, leaving 232.

Results: I show only the first 10 MOA values per dose for each consensus dataset, plus heatmap plots of the 232 MOAs for each dataset. Dose 0 (DMSO only) and dose 7 (only two MOAs) were excluded from the analysis.
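For readers following along, the filtering described above could be sketched roughly as below. This is only an illustration under assumptions: the column names `compound`, `moa` and `dose` and the helper `filter_moas` are hypothetical and are not taken from the repository.

```python
import pandas as pd


def filter_moas(df: pd.DataFrame, min_cpds: int = 2, doses=range(1, 7)) -> pd.DataFrame:
    """Keep the analysed doses and the MOAs with at least `min_cpds` compounds.

    Assumed (hypothetical) columns: 'compound', 'moa', 'dose'.
    """
    # Drop dose 0 (DMSO only) and dose 7 (only two MOAs) by keeping doses 1-6.
    df = df[df["dose"].isin(list(doses))]
    # Count distinct compounds per MOA and keep MOAs that can form at least one pair.
    n_cpds = df.groupby("moa")["compound"].nunique()
    keep = n_cpds[n_cpds >= min_cpds].index
    return df[df["moa"].isin(keep)]
```

An MOA with a single compound has no compound pair, so no pairwise correlation can be computed for it.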

1a. Median Aggregation Consensus dataset - consensus_median (whole plate normalization).

cons_med

Heatmap plots -- I split the dataframe into 3 parts for easier visualization:

median_heatmap_1

median_heatmap_2

median_heatmap_3

1b. Median Aggregation Consensus dataset - consensus_median_dmso (dmso normalization).

cons_med_dmso

heatmap plots

median_dmso_heatmap_1
median_dmso_heatmap_2
median_dmso_heatmap_3

2a. Modified Z Score Aggregation (MODZ) dataset - consensus_modz (whole plate normalization).

image

heatmap plots

modz_heatmap_1
modz_heatmap_2
modz_heatmap_3

2b. Modified Z Score Aggregation (MODZ) dataset - consensus_modz_dmso (dmso normalization).

image

heatmap plots

modz_dmso_heatmap_1
modz_dmso_heatmap_2
modz_dmso_heatmap_3

@AdeboyeML
Collaborator Author

@gwaygenomics

- MOAs that do not have the same number of compounds at every dose

image
image

@shntnu
Collaborator

shntnu commented Oct 27, 2020

@AdeboyeML and I inspected one of the cases and found that the dose remapping can be flawed. For example, PYM50028 (BRD-K62277907-001-01-6) has two distinct doses coded as level 1.

image
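One way to scan for this kind of remapping problem across all compounds is sketched below. The column names (`compound`, `dose`, `concentration_um`) and the helper `find_dose_collisions` are assumptions made for illustration, not the project's actual metadata schema.

```python
import pandas as pd


def find_dose_collisions(meta: pd.DataFrame) -> pd.DataFrame:
    """List (compound, dose level) pairs that cover more than one raw concentration.

    Assumed (hypothetical) columns: 'compound', 'dose' (remapped level),
    'concentration_um' (raw concentration before remapping).
    """
    # Number of distinct raw concentrations that were mapped to each dose level.
    n_conc = meta.groupby(["compound", "dose"])["concentration_um"].nunique()
    # More than one raw concentration per level means two doses collapsed into one.
    return n_conc[n_conc > 1].reset_index(name="n_concentrations")
```

An empty result would suggest that each remapped level corresponds to a single raw concentration per compound.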

@shntnu
Collaborator

shntnu commented Oct 27, 2020

@AdeboyeML uses corr_val = abs(df_dose_corr.loc[cpds[y], cpds[x]]) but we shouldn't take the absolute value. We may have compounds in there that are negatively correlated and that should count against the MOA.

Also consider computing the correlation matrix of the subsetted dataframe corresponding to the replicates you care about, and then taking the median of the lower (or upper) triangular part of that matrix. This is an implementation detail; the logic is otherwise correct.
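A minimal sketch of that suggestion, keeping the sign of the correlations. The function name `moa_median_correlation` is hypothetical, and Pearson correlation is an assumption, since the thread does not say which correlation measure is used.

```python
import numpy as np
import pandas as pd


def moa_median_correlation(profiles: pd.DataFrame) -> float:
    """Median pairwise correlation between the compounds of one MOA at one dose.

    `profiles` has one row per compound and one column per feature.
    """
    # np.corrcoef treats each row as a variable, giving a compound-by-compound matrix.
    corr = np.corrcoef(profiles.to_numpy(dtype=float))
    # Strictly lower triangle: every compound pair once, diagonal excluded.
    lower = corr[np.tril_indices_from(corr, k=-1)]
    # No abs(): negatively correlated pairs pull the MOA score down.
    return float(np.median(lower))
```

With three compounds where one is anti-correlated with the other two, the pairwise values are 1, -1 and -1, so the score is -1, whereas taking absolute values first would have reported 1.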

@AdeboyeML
Collaborator Author

AdeboyeML commented Oct 27, 2020

Just an update to #2 (comment):

- MOAs that do not have the same number of compounds at every dose

image
image

@AdeboyeML
Collaborator Author

AdeboyeML commented Nov 4, 2020

@gwaygenomics @shntnu

- Results from Null Distribution

Major points:

  • The null distribution is generated from the median correlation scores of randomly combined compounds that do not share an MOA.

  • In our case, we generated 1000 median correlation scores from randomly combined compounds as the null distribution for each MOA.

  • A p-value was computed nonparametrically as the probability that random compounds from different MOAs have a greater median similarity than the compounds of the MOA in question.
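The procedure in the points above might look roughly like the sketch below. It assumes each compound carries exactly one MOA label and draws one compound from each of `size` distinct MOAs, which is one possible reading of "do not share an MOA". The names `null_median_scores` and `empirical_p_value` are hypothetical.

```python
import numpy as np


def null_median_scores(corr, moa_labels, size, n_iter=1000, seed=0):
    """Median pairwise correlations of `size` random compounds from distinct MOAs.

    corr: compound-by-compound correlation matrix (2-D NumPy array) for one dose.
    moa_labels: MOA label of each row/column of `corr` (one label per compound).
    """
    rng = np.random.default_rng(seed)
    moa_labels = np.asarray(moa_labels)
    moas = np.unique(moa_labels)
    members = {m: np.flatnonzero(moa_labels == m) for m in moas}
    upper = np.triu_indices(size, k=1)  # each pair once, no diagonal
    scores = np.empty(n_iter)
    for i in range(n_iter):
        # Pick `size` different MOAs, then one compound from each of them.
        chosen = rng.choice(moas, size=size, replace=False)
        idx = np.array([rng.choice(members[m]) for m in chosen])
        scores[i] = np.median(corr[np.ix_(idx, idx)][upper])
    return scores


def empirical_p_value(observed, null_scores):
    """Fraction of null scores greater than or equal to the observed MOA score."""
    return float(np.mean(np.asarray(null_scores) >= observed))
```

The `>=` comparison is a choice made here; a strict `>` would only differ when a null score ties exactly with the observed one.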

- Visualization: non-parametric p-value vs median pairwise correlation score (for each MOA) per dose

- Median Consensus

image

- Median DMSO Consensus

image

- MODZ Consensus

image

- MODZ DMSO Consensus

image

  • We also identified the MOAs for which, at every dose (1-6), the probability of drawing a median correlation score from the null distribution that is greater than or equal to the MOA's own median correlation score is below 5% (p < 0.05).

  • 0.05 was used as the significance level.
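As an illustration of the "significant in all doses" selection, here is a small sketch that assumes the p-values are stored in a table with one row per MOA and one column per dose. The layout and the name `significant_in_all_doses` are assumptions.

```python
import pandas as pd


def significant_in_all_doses(p_values: pd.DataFrame, alpha: float = 0.05) -> list:
    """Return the MOAs (row labels) whose p-value is below `alpha` in every dose column."""
    # A missing p-value (NaN) compares as False, so such MOAs are not selected.
    below = (p_values < alpha).all(axis=1)
    return p_values.index[below].tolist()
```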

- Median Consensus: MOAs with p-values < 0.05 in all doses

image

- Median DMSO Consensus: MOAs with p-values < 0.05 in all doses

image

- MODZ Consensus: MOAs with p-values < 0.05 in all doses

image

- MODZ DMSO Consensus: MOAs with p-values < 0.05 in all doses

image

@AdeboyeML
Collaborator Author

- In addition to the figures shown in #2 (comment), I included the density distribution of the p-values vs median scores in the same plots:

- Median Consensus

image

- Median DMSO Consensus

image

- MODZ Consensus

image

- MODZ DMSO Consensus

image

@shntnu
Collaborator

shntnu commented Nov 10, 2020

@AdeboyeML quick question:

In this figure (where each data point is an MOA, I assume), if two MOAs have the same number of compounds and the same x-axis value (median pairwise correlation between compounds), do they also have the same y-axis value?

Or phrased more simply, are you computing the null distribution once for each MOA size (by size I mean the number of compounds in the MOA class)? Or are you doing it once per MOA? Both are fine, but the former is preferred because it removes y-axis variance that is not informative.

image

@shntnu
Collaborator

shntnu commented Nov 10, 2020

@AdeboyeML kudos for making the data so easy to peek into. I was curious to see if one could see a dose-response in some of the MOAs. Looks like we do in a few.

More during profiling check-in!

moa_score_dose_response

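Code
# Packages the snippet relies on (not shown in the original comment)
library(tidyverse)
library(magrittr)  # provides the %<>% assignment pipe

# Per-dose median MOA scores for the DMSO-normalized median consensus data
moa_consistency <-
  read_csv(
    "https://raw.githubusercontent.com/broadinstitute/lincs-profiling-comparison/9bc5db8167674e2c8bec5cee3fcc043117acfbf6/1.Data-exploration/moa_sizes_consensus_datasets/median_dmso_moa_median_scores.csv"
  )

# The first (unnamed) column holds the MOA names; rename it by position,
# because readr calls it X1 or ...1 depending on the version
moa_consistency %<>% rename(moa = 1)

# Long format: one row per (moa, dose)
moa_consistency %<>% pivot_longer(-moa, names_to = "dose", values_to = "score")

# "dose_3" -> 3
moa_consistency %<>% mutate(dose = as.integer(str_remove(dose, "dose_")))

# Keep only MOAs whose median score across doses exceeds 0.30
moa_consistency %<>%
  inner_join(
    moa_consistency %>%
      group_by(moa) %>%
      summarize(score_median = median(score)) %>%
      filter(score_median > 0.30),
    by = "moa"
  )

# One panel per MOA, labelled with its median score
p <-
  ggplot(moa_consistency, aes(dose, score)) +
  geom_line() +
  facet_wrap( ~ round(score_median, 2) + moa,
              ncol = 5,
              scales = "free_y")

# Pass the plot explicitly instead of relying on the last plot displayed
ggsave("~/Desktop/moa_score_dose_response.png",
       plot = p,
       width = 10,
       height = 10)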

@AdeboyeML
Collaborator Author

@shntnu Regarding the question in #2 (comment):

Or phrased more simply, are you computing the null distribution once for each MOA size (by size I mean the number of compounds in the MOA class)? Or are you doing it once per MOA? Both are fine, but the former is preferred because it removes y-axis variance that is not informative.

  • I computed the null distribution once per MOA per dose, based on the MOA's size (the number of compounds in the MOA class). That is, for each MOA at each dose I computed 1000 random median scores, from which I computed the p-value.

  • Concretely, for each MOA I randomly selected as many compounds from different MOAs as the MOA has members, and I repeated this 1000 times per MOA per dose.

if two MOAs have the same number of compounds and the same x-axis value (median pairwise correlation between compounds), do they also have the same y-axis value?

  • No. The p-value (y-axis) is based on 1000 randomly selected lists of compounds (each list as long as the MOA's size), from which I computed the null median scores. These lists are unique to each MOA.

@shntnu
Collaborator

shntnu commented Nov 11, 2020

@AdeboyeML thanks for clarifying. Everything looks good, but the one change I recommend is to use the same null distribution for all MOAs of the same size.

There is no upside to having different null distributions for each unique MOA (of the same size), while it has the downside of adding uninformative variance to the p-value estimates.
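A rough sketch of that recommendation: draw the null scores once per MOA size and reuse them for every MOA of that size. `p_values_shared_null` and the `draw_null` callback are hypothetical names; `draw_null(size)` stands in for whatever routine generates the 1000 random median scores for a given size at a given dose.

```python
import numpy as np


def p_values_shared_null(moa_scores, moa_sizes, draw_null):
    """Empirical p-values where all MOAs of the same size share one null distribution.

    moa_scores: dict mapping MOA -> observed median pairwise correlation (one dose).
    moa_sizes:  dict mapping MOA -> number of compounds in the MOA.
    draw_null:  callable, size -> array of null median scores (assumed helper).
    """
    null_by_size = {}  # cache: one null sample per distinct MOA size
    p_values = {}
    for moa, score in moa_scores.items():
        size = moa_sizes[moa]
        if size not in null_by_size:
            null_by_size[size] = np.asarray(draw_null(size))
        # Fraction of null scores at least as large as the observed score.
        p_values[moa] = float(np.mean(null_by_size[size] >= score))
    return p_values
```

With this arrangement, two MOAs with the same size and the same median score necessarily get the same p-value, which removes the uninformative variance mentioned above.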

cc @gwaygenomics

@AdeboyeML
Collaborator Author

AdeboyeML commented Nov 12, 2020

@shntnu @gwaygenomics

Results from the Null distribution, based on using the same null distribution for all MOAs of the same size.

  • There seems to be no relationship between the median pairwise correlation and the p-value obtained from the null distribution.

Median Consensus

image

Distribution of the median pairwise correlation scores

image

image

P-value distribution across doses

The number of MOAs with p-values below the significance level (0.05) increases as the dose increases.

image

MOAs with p-values <0.05 in all doses

image

Dose responses of these MOAs:

image

The above results and distributions are similar for the MODZ consensus datasets.

@shntnu
Collaborator

shntnu commented Nov 12, 2020

  • There seems to be no relationship between the median pairwise correlation and the p-value obtained from the null distribution.

That's really strange – sounds like a bug to me
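One possible way to probe this, assuming all MOAs of a given size share a single null distribution within a dose: in that case the p-value cannot increase when the median score increases, so any such increase within a size group points at a bug. The column names `size`, `score` and `p_value` and the helper `monotonicity_violations` are hypothetical.

```python
import pandas as pd


def monotonicity_violations(df: pd.DataFrame) -> pd.DataFrame:
    """Rows where the p-value rises although the score rises, within one MOA size.

    Assumed (hypothetical) columns for a single dose: 'size', 'score', 'p_value'.
    """
    ordered = df.sort_values(["size", "score"])
    # Change in p-value relative to the MOA with the next lower score in the same size group.
    rise = ordered.groupby("size")["p_value"].diff() > 0
    return ordered[rise]
```

An empty result is what a shared null distribution should produce; a non-empty one lists the MOAs worth inspecting first.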

@AdeboyeML
Collaborator Author

@shntnu @gwaygenomics

Results from the Null distribution, based on using the same null distribution for all MOAs of the same size. (L1000 & Cell Painting)

Cell painting

MODZ Consensus

image

L1000

MODZ Level-5 data

image

- The above results and distributions are similar for both the median and rank level-5 (consensus) data in Cell Painting and L1000.
