More informative error messages #36

Open
smlmbrt opened this issue Jan 5, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@smlmbrt
Member

smlmbrt commented Jan 5, 2023

Some error messages are hard for users to debug because the assertions don't list the problematic variants (e.g. duplicate IDs, PGScatalog/pgsc_calc#72 (comment)). We should consider adding a list of the scoring files/variants that are causing the breakage.

Relevant code snippet:

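import polars as pl

def _check_duplicate_vars(matches: pl.LazyFrame):
    # Largest number of times any (accession, ID) pair occurs among matched variants
    max_occurrence: list[int] = (matches.filter(pl.col('match_status') == 'matched')
                                 .groupby(['accession', 'ID'])
                                 .agg(pl.count())
                                 .select('count')
                                 .max()
                                 .collect()
                                 .get_column('count')
                                 .to_list())
    # Fails without saying which variants are duplicated
    assert max_occurrence == [1], "Duplicate IDs in final matches"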

@smlmbrt smlmbrt added the enhancement New feature or request label Jan 5, 2023
@smlmbrt
Member Author

smlmbrt commented Jan 9, 2023

I think here we need to output ['accession', 'ID', [list of affected scoring file row_nr]]. That way we can grep the relevant rows of the pvar and scoring file to see what's wrong.
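
A rough sketch of what that output could look like, written in plain Python rather than polars so the shape of the report is easy to see. It assumes each match is a dict carrying the 'accession', 'ID', 'row_nr' and 'match_status' fields mentioned above. The function names, the sample rows and the 'excluded' status are made up for illustration and are not taken from the codebase.

```python
from collections import defaultdict


def find_duplicate_ids(matches):
    """Return [accession, ID, [row_nr, ...]] for each matched ID seen more than once."""
    rows_by_key = defaultdict(list)
    for row in matches:
        # Only matched variants count, mirroring the filter in the snippet above
        if row["match_status"] == "matched":
            rows_by_key[(row["accession"], row["ID"])].append(row["row_nr"])
    return [[accession, var_id, sorted(row_nrs)]
            for (accession, var_id), row_nrs in sorted(rows_by_key.items())
            if len(row_nrs) > 1]


def check_duplicate_vars(matches):
    """Raise an error that names the offending variants (hypothetical replacement)."""
    duplicates = find_duplicate_ids(matches)
    if duplicates:
        raise ValueError(f"Duplicate IDs in final matches: {duplicates}")


# Made-up rows: one ID matched twice, one ID matched once and excluded once
example = [
    {"accession": "PGS_A", "ID": "1:100:A:G", "row_nr": 0, "match_status": "matched"},
    {"accession": "PGS_A", "ID": "1:100:A:G", "row_nr": 7, "match_status": "matched"},
    {"accession": "PGS_A", "ID": "1:200:C:T", "row_nr": 1, "match_status": "matched"},
    {"accession": "PGS_A", "ID": "1:200:C:T", "row_nr": 9, "match_status": "excluded"},
]
print(find_duplicate_ids(example))  # [['PGS_A', '1:100:A:G', [0, 7]]]
```

Raising an exception with the list in the message, instead of a bare assert, also keeps the check active when Python runs with -O. The same grouping would presumably be done on the LazyFrame in the real code.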
