Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggestions to calculate variants per sample in exomes and genomes #577

Merged

Conversation

KoalaQin
Copy link
Contributor

@KoalaQin KoalaQin commented Mar 9, 2024

I simplified the two functions and renamed a few things, I added in option to get the count of missense and synonymous variants. I also prefer to write the aggregated stats in the HT globals, I think it's better structured this way. We'll discuss if only output just one final HT, but make it two steps.
I got the same number of all_variants the same as you for 1 sample, but number of variants passing filters are not again the same, I think the new Exome release seems changed, we need to confirm with Julia for that.
My test run is here: bcd45e3f472b4033bfac9532a022dafe

:param agg_rare_variants: Stratify by variants which have adj AF <0.1%.
:param suffix: String of arguments to append to name.
:param pass_filters: Filter to variants which pass all variant qc filters.
:param ukb_regions: Filter to variants in UKB regions.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking if all these should be under pass_filters, like:

    if pass_filters:
        arg_dict.update({"pass_filters": hl.len(vmt.filters) == 0})
        if ukb_regions:
    ....

@KoalaQin KoalaQin changed the title Per sample counts 4 1 suggestions Suggestions to calculate variants per sample in exomes and genomes Mar 10, 2024
vmt_vep = vep.filter_vep_transcript_csqs(
vmt,
if by_csqs:
ht = filter_vep_transcript_csqs(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you or Julia agree on filtering to only MANE select transcripts? I leave it the same for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, that was just the default. Let me parametrize this in the coming review though

@KoalaQin
Copy link
Contributor Author

@matren395 Back to you now!

Copy link
Contributor

@matren395 matren395 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some more work to do this evening, let me know how this looks

@KoalaQin
Copy link
Contributor Author

KoalaQin commented Mar 12, 2024

@matren395 Back to you again, I deleted two unnecessary arguments and implemented your suggetions (thanks a lot!). New test run: 6776707a23c948d79e9df883d5c4d42f

@KoalaQin KoalaQin requested a review from matren395 March 12, 2024 18:20
@KoalaQin KoalaQin merged commit fdbc7b8 into dm/per_sample_counts_4_1 Mar 12, 2024
1 check passed
@KoalaQin KoalaQin deleted the qh/per_sample_counts_4_1_suggestions branch April 2, 2024 16:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants