-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Suggestions to calculate variants per sample in exomes and genomes #577
Suggestions to calculate variants per sample in exomes and genomes #577
Conversation
:param agg_rare_variants: Stratify by variants which have adj AF <0.1%. | ||
:param suffix: String of arguments to append to name. | ||
:param pass_filters: Filter to variants which pass all variant qc filters. | ||
:param ukb_regions: Filter to variants in UKB regions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking if all these should be under pass_filters
, like:
if pass_filters:
arg_dict.update({"pass_filters": hl.len(vmt.filters) == 0})
if ukb_regions:
....
vmt_vep = vep.filter_vep_transcript_csqs( | ||
vmt, | ||
if by_csqs: | ||
ht = filter_vep_transcript_csqs( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you or Julia agree on filtering to only MANE select transcripts? I leave it the same for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, that was just the default. Let me parametrize this in the coming review though
@matren395 Back to you now! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some more work to do this evening, let me know how this looks
@matren395 Back to you again, I deleted two unnecessary arguments and implemented your suggetions (thanks a lot!). New test run: 6776707a23c948d79e9df883d5c4d42f |
I simplified the two functions and renamed a few things, I added in option to get the count of missense and synonymous variants. I also prefer to write the aggregated stats in the HT globals, I think it's better structured this way. We'll discuss if only output just one final HT, but make it two steps.
I got the same number of all_variants the same as you for 1 sample, but number of variants passing filters are not again the same, I think the new Exome release seems changed, we need to confirm with Julia for that.
My test run is here: bcd45e3f472b4033bfac9532a022dafe