Add a barplot of the summarizeFasta results #734

Merged: 5 commits merged on May 17, 2023


zhuchcn (Member) commented May 4, 2023

Description

summarizeFasta can now accept an --output-image argument and create a barplot of the summary results. This is an example:

Any thoughts?

Closes #...

Checklist

  • This PR does NOT contain PHI or germline genetic data. A repo may need to be deleted if such data is uploaded. Disclosing PHI is a major problem.
  • This PR does NOT contain molecular files, compressed files, output files such as images (e.g. .png, .jpeg), .pdf, .RData, .xlsx, .doc, .ppt, or other non-plain-text files. To automatically exclude such files using a .gitignore file, see here for example.
  • I have read the code review guidelines and the code review best practice on GitHub check-list.
  • The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].
  • I have added the major changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.
  • All test cases passed locally.

zhuchcn requested a review from lydiayliu on May 4, 2023 08:46

lydiayliu (Collaborator) commented:
The example output looks fine. It worries me that if "Noncoding" is also plotted, that one bar would be the only one you could actually see. Maybe it's worth using a log10 scale when Noncoding is plotted (and, since stacking no longer works on a log scale, just using one colour)?

Is the current sorting order the same as in the summarizeFasta text file?

zhuchcn (Member, Author) commented May 15, 2023

Log scaling sounds reasonable but it will make the stacked barplot hard to interpret, because the stacked bars are no longer proportional.
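To illustrate the proportionality problem, here is a small worked sketch in plain Python with made-up counts. It computes how much of a stacked bar's height each segment occupies on a linear axis versus a log10 axis. It assumes the log axis starts at 1 and every segment holds at least one peptide, so segment i spans log10(cumulative count after i) minus log10(cumulative count before i). The function names are hypothetical and not part of moPepGen.

```python
import math
from typing import List, Sequence


def linear_shares(segments: Sequence[int]) -> List[float]:
    """Share of bar height per segment on a linear axis (proportional to counts)."""
    total = sum(segments)
    return [s / total for s in segments]


def log_shares(segments: Sequence[int]) -> List[float]:
    """Share of bar height per segment when the stacked bar is drawn on log10.

    Assumes the axis starts at 1 (log10(1) == 0), every segment is >= 1 and
    the total is > 1. Segments are stacked bottom to top in the given order.
    """
    top = math.log10(sum(segments))
    shares = []
    bottom = 1  # the first segment is drawn from the axis origin at 1
    cumulative = 0
    for s in segments:
        cumulative += s
        shares.append((math.log10(cumulative) - math.log10(bottom)) / top)
        bottom = cumulative  # the next segment starts where this one ends
    return shares


for counts in ([900, 100], [100, 900]):
    print(counts,
          [round(x, 3) for x in linear_shares(counts)],
          [round(x, 3) for x in log_shares(counts)])
# [900, 100] [0.9, 0.1] [0.985, 0.015]
# [100, 900] [0.1, 0.9] [0.667, 0.333]
```

With counts [900, 100], the segment holding 10% of the peptides gets only about 1.5% of the bar height on the log axis, and simply reversing the stacking order changes the shares to roughly 67% and 33%. Segment heights therefore stop reflecting the counts they represent.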

> Is the current sorting order the same as in the summarizeFasta text file?

They should be in the same order.

lydiayliu (Collaborator) commented:
> Log scaling sounds reasonable but it will make the stacked barplot hard to interpret, because the stacked bars are no longer proportional.

Yeah, so if we do use a log scale, we can just use a single colour for the bars :) (i.e., not show the miscleavages)
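The following is a minimal sketch of that idea. It assumes a hypothetical layout in which each source group maps to a list of peptide counts split by number of miscleavages, which is not necessarily how PeptidePoolSummarizer stores them. On a log scale, the segments are summed into one total per group so that each group is drawn as a single-colour bar.

```python
from typing import Dict, List


def bar_segments(counts: Dict[str, List[int]], log_scale: bool) -> Dict[str, List[int]]:
    """Return the segments to draw for each source group.

    `counts` maps a source group to peptide counts split by number of
    miscleavages (a hypothetical layout). On a linear axis the segments are
    kept for a stacked, multi-colour bar; on a log axis they are summed into
    a single total so each group gets one single-colour bar.
    """
    if log_scale:
        return {group: [sum(parts)] for group, parts in counts.items()}
    return {group: list(parts) for group, parts in counts.items()}


# Hypothetical group names and counts, for illustration only.
counts = {'groupA': [120, 30, 5], 'groupB': [4, 1, 0]}
print(bar_segments(counts, log_scale=True))  # {'groupA': [155], 'groupB': [5]}
```

Dictionary insertion order is preserved, so the groups keep the same order as the input.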

zhuchcn (Member, Author) commented May 16, 2023

I added a pair of arguments, --plot-normal-scale/--plot-log-scale, to set which scale the plot uses. If neither argument is given, the best scale is chosen automatically by comparing the mean and the median of the number of peptides across source groups: if the mean is much larger than the median, log scale is used.
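The flag pair and the automatic fallback could look roughly like this sketch. It uses argparse's mutually exclusive groups so that passing both flags is rejected, and it picks log scale when the mean exceeds a fixed multiple of the median. The comment above only says "way larger", so the threshold of 10 is an assumption. The helper names add_scale_args and choose_scale are hypothetical, and this is not the code that was merged.

```python
import argparse
import statistics
from typing import Optional, Sequence


def add_scale_args(parser: argparse.ArgumentParser) -> None:
    """Register the mutually exclusive scale flags (hypothetical helper)."""
    group = parser.add_mutually_exclusive_group()
    # Both flags write to the same dest; if neither is given it stays None
    # and the scale is chosen automatically later.
    group.add_argument('--plot-normal-scale', dest='plot_scale',
                       action='store_const', const='normal')
    group.add_argument('--plot-log-scale', dest='plot_scale',
                       action='store_const', const='log')


def choose_scale(totals: Sequence[int], requested: Optional[str] = None,
                 ratio: float = 10.0) -> str:
    """Return 'normal' or 'log' for a list of per-group peptide counts.

    An explicit flag always wins. Otherwise use log scale when the mean is
    more than `ratio` times the median, i.e. a few groups dwarf the rest.
    The default ratio of 10 is an assumption, not moPepGen's value.
    """
    if requested is not None:
        return requested
    mean = statistics.mean(totals)
    median = statistics.median(totals)
    if median <= 0:
        # At least half of the groups are empty; any non-empty group dominates.
        return 'log' if mean > 0 else 'normal'
    return 'log' if mean > ratio * median else 'normal'


parser = argparse.ArgumentParser()
add_scale_args(parser)
args = parser.parse_args([])  # neither flag given, so plot_scale is None
print(choose_scale([10, 12, 15, 50000], requested=args.plot_scale))  # log
```

In the usage line, the median is 13.5 and the mean is about 12509, so the sketch falls back to log scale. A skewed distribution like this, with one very large group such as Noncoding, is exactly the case that pulls the mean far above the median.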

This is what it looks like. The scale looks strange because it's small.

[image: barplot output]

lydiayliu (Collaborator) left a comment:

Looking good! The only thing I can think of is whether we should exclude entries with 0 peptides. I think it's OK either way (I don't think we are excluding any now).
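If excluding empty entries were ever made optional, one way to do it is a filter that keeps the original row order, so the plot still matches the summarizeFasta text file. This sketch assumes the summary is available as an ordered list of (source group, peptide count) pairs. Both rows_to_plot and its exclude_empty parameter are hypothetical, not existing moPepGen options. Per this discussion, empty rows are currently kept.

```python
from typing import List, Tuple


def rows_to_plot(rows: List[Tuple[str, int]],
                 exclude_empty: bool = False) -> List[Tuple[str, int]]:
    """Select the (source group, peptide count) rows to plot.

    Row order is preserved so the bars follow the summary text file.
    `exclude_empty` is a hypothetical option; by default all rows are kept.
    """
    if not exclude_empty:
        return list(rows)
    return [(group, n) for group, n in rows if n > 0]


# Hypothetical rows, for illustration only.
rows = [('groupA', 155), ('groupB', 0), ('groupC', 5)]
print(rows_to_plot(rows, exclude_empty=True))  # [('groupA', 155), ('groupC', 5)]
```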

Review thread on moPepGen/aa/PeptidePoolSummarizer.py (resolved)

zhuchcn (Member, Author) commented May 17, 2023

> The only thing I can think of is whether we should exclude entries with 0 peptides. I think it's OK either way (I don't think we are excluding any now).

Let's leave it this way for now. It will be good to see some examples.

zhuchcn merged commit 3d4f7e9 into main on May 17, 2023
zhuchcn deleted the czhu-add-plot-summary branch on May 17, 2023 05:46

2 participants