Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some comparisons of the LFC results between GI Mapping and gimap #49

Merged
merged 8 commits into from
Jul 29, 2024

Conversation

kweav
Copy link
Collaborator

@kweav kweav commented Jul 23, 2024

This is an Rmd and its rendered html to compare the LFC results from GI Mapping and gimap. This assumes that the CRISPR_score and lfc_adj columns correspond to one another.

@kweav kweav requested a review from cansavvy July 23, 2024 19:35
@kweav
Copy link
Collaborator Author

kweav commented Jul 23, 2024

@cansavvy the R-CMD-check for windows-latest was failing because it didn't have the Achilles_common_essentials.csv file, so I added it. It's not failing because it doesn't have the CCLE_expression.csv file. But that file seems to be in the .gitignore. Should I un-add the Achilles file from github and add it to the list in the .gitignore?

@cansavvy
Copy link
Collaborator

@cansavvy the R-CMD-check for windows-latest was failing because it didn't have the Achilles_common_essentials.csv file, so I added it. It's not failing because it doesn't have the CCLE_expression.csv file. But that file seems to be in the .gitignore. Should I un-add the Achilles file from github and add it to the list in the .gitignore?

It's in the gitignore file because the CCLE_expression.csv is too big to be stored on GitHub or included in an R package.

joindf <- dplyr::full_join(old_lfc_results, gimap_dataset$normalized_log_fc,
by = c("pgRNA_id" = "pg_ids", "rep"),
suffix = c("_old", "_new")) %>%
select(lfc_adj, CRISPR_score, rep) %>%
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear lfc_adj is the column in the old data that is actually the CRISPR score? So we are comparing CRISPR scores to CRISPR scores?

Copy link
Collaborator Author

@kweav kweav Jul 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is the part I'm not sure about. Looking at the GI_Mapping code, the CRISPR_score column is a rename of the lfc_adj3 column. I use the CRISPR_score column from the GI_Mapping results and compare that to the lfc_adj column from the gimap results. If CRISPR_score from GI_Mapping doesn't correspond to lfc_adj from gimap, which one does? I did some code comparing this afternoon and I think it'll be the lfc_adj2 column from GI_Mapping as it has a similar calculation as lfc_adj in gimap

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah in this case from our data what you want is the CRISPR score which you can get from gimap_dataset$crispr_score if calc_crispr() has been ran.

That should be a more apples to apples comparison.

It would also still be worth comparing precursor lfc's to our lfc's as well but perhaps starting with CRISPR is good start.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm comparing the same precursor lfc's now as of commit e2b0ece

@kweav
Copy link
Collaborator Author

kweav commented Jul 24, 2024

@cansavvy the R-CMD-check for windows-latest was failing because it didn't have the Achilles_common_essentials.csv file, so I added it. It's not failing because it doesn't have the CCLE_expression.csv file. But that file seems to be in the .gitignore. Should I un-add the Achilles file from github and add it to the list in the .gitignore?

It's in the gitignore file because the CCLE_expression.csv is too big to be stored on GitHub or included in an R package.

So should I remove the Achilles_common_essentials.csv file and add it to the .gitignore as well? It's not as large of a file of course.

@cansavvy
Copy link
Collaborator

So should I remove the Achilles_common_essentials.csv file and add it to the .gitignore as well? It's not as large of a file of course.

Yeah I think I included it in there because it wasn't too big but I also like the idea of being consistent. Up to you.

@kweav
Copy link
Collaborator Author

kweav commented Jul 25, 2024

I'm working on comparing the old code and the new code side-by-side for each related chunk....

When I ran this

lfc_df <- dataset %>%
    as.data.frame() %>%
    dplyr::mutate(pg_ids = pg_ids) %>%
    tidyr::pivot_longer(-pg_ids) %>%
    # Adding on metdata
    dplyr::left_join(gimap_dataset$metadata$sample_metadata, by = c("name" = "col_names")) %>%
    dplyr::select(-name) %>%
    tidyr::pivot_wider(values_from = value,
                       names_from = c(timepoints, replicates))

I got this warning message:

Warning messages:
1: Using an external vector in selections was deprecated in tidyselect
1.1.0.
ℹ Please use `all_of()` or `any_of()` instead.
  # Was:
  data %>% select(timepoints)

  # Now:
  data %>% select(all_of(timepoints))

See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.

2: Using an external vector in selections was deprecated in tidyselect
1.1.0.
ℹ Please use `all_of()` or `any_of()` instead.
  # Was:
  data %>% select(replicates)

  # Now:
  data %>% select(all_of(replicates))

See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.

Copy link
Collaborator

@cansavvy cansavvy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is such a great PR to see. I was very worried about these calculations coming out right so glad you are working to verify them!

I'll do a tad more review myself but in the meantime feel free to use these comments. My comments here are really only the most minor of comments.

R/04-normalize.R Outdated
# TODO: This section needs a careful review to make sure it is relfective of the previous code's calculations
values_to = "lfc_adj1") %>%
group_by(rep) %>%
#dplyr::left_join(gimap_dataset$annotation, by = c("pg_ids" = "pgRNA_id")) %>% #annotation is already joined with late_vs_plasmid_df
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to delete things that are unnecessary instead of commenting out.

R/04-normalize.R Outdated
@@ -120,28 +120,28 @@ gimap_normalize <- function(.data = NULL,
dplyr::filter(norm_ctrl_flag == "negative_control") %>%
dplyr::select(pg_ids, dplyr::starts_with("late"))

# TODO: There's one main median that's found by taking the median of the medians?
neg_control_median <- median(apply(neg_control_median_df[, -1], 1, median))
# TODO: They find a median for each rep, so apply across columns
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thanks for clarifying this. I couldn't quite determine what kind of median was being calculated.

R/04-normalize.R Outdated
names_from = c(timepoints, replicates))

# Extract only the late columns we'll keep these replicates for calculations later
late_df <- dplyr::select(lfc_df, dplyr::starts_with("late"))
#late_df <- dplyr::select(lfc_df, dplyr::starts_with("late"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to delete if it is unnecessary!

R/04-normalize.R Outdated

# First and second adjustments to LFC
lfc_df <- lfc_df %>%
lfc_df_withAdjs <- late_vs_plasmid_df %>%
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such a minor minor comment I'm going to make here but can we do lfc_df_adj so it sticks with the style?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I can change that!

R/04-normalize.R Outdated

# Save this at the construct level
gimap_dataset$normalized_log_fc <- lfc_df
gimap_dataset$normalized_log_fc <- lfc_df_withAdjs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Samesies for here lfc_df_adj

@kweav
Copy link
Collaborator Author

kweav commented Jul 29, 2024

@cansavvy Is this good to merge now? The R-CMD-check failure appears to be an error on the vignette that's a 403 error during the chunk with a string of steps. Probably the annotate step.

@cansavvy
Copy link
Collaborator

@cansavvy Is this good to merge now? The R-CMD-check failure appears to be an error on the vignette that's a 403 error during the chunk with a string of steps. Probably the annotate step.

The vignette will continue to fail until I update the docker image.

The window failure is a known thing: #48

So yes! We can merge!

@cansavvy cansavvy merged commit 2a8aff1 into main Jul 29, 2024
5 of 7 checks passed
@cansavvy cansavvy deleted the lfc_rev_ki branch July 29, 2024 18:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants