Parallelise combine scorefiles #43

Open
nebfield opened this issue Aug 5, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

nebfield (Member) commented Aug 5, 2024

Combining scorefiles is currently bottlenecked by creating millions of Python objects (ScoreVariants, one per row). Parallelising it should be pretty simple to implement. We just need to:

  • Write each normalised scoring file to a new corresponding file (currently all files are written to a single combined file)
  • Change output from csv.gz to Arrow (IPC) to support lazy scanning by pgscatalog-match
  • Set up a process pool executor to distribute jobs to worker processes in the CLI
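
A rough sketch of how the first and third steps might fit together, using only the standard library. Everything here is an assumption for illustration: `normalise_scorefile` and `combine_parallel` are hypothetical names, not existing functions in the package, the input is assumed to be a tab-separated file with a header row, and the "normalisation" is a placeholder that only tags each row with the file stem. To stay dependency-free the worker writes plain CSV; an Arrow IPC writer would take its place for the second step.

```python
import csv
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def normalise_scorefile(in_path: str, out_dir: str) -> str:
    """Hypothetical worker: normalise one scoring file into its own output file."""
    src = Path(in_path)
    dest = Path(out_dir) / f"{src.stem}.normalised.csv"
    with src.open(newline="") as fin, dest.open("w", newline="") as fout:
        reader = csv.DictReader(fin, delimiter="\t")
        fieldnames = ["accession", *(reader.fieldnames or [])]
        writer = csv.DictWriter(fout, fieldnames=fieldnames)
        writer.writeheader()
        # Stream rows straight through instead of collecting them in memory.
        for row in reader:
            row["accession"] = src.stem  # placeholder for real normalisation
            writer.writerow(row)
    return str(dest)


def combine_parallel(in_paths, out_dir, max_workers=None,
                     executor_cls=ProcessPoolExecutor):
    """Distribute one job per scoring file; return output paths in input order."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    paths = [str(p) for p in in_paths]
    # executor_cls is injectable so a thread pool can be swapped in for testing.
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(normalise_scorefile, paths, [str(out_dir)] * len(paths)))
```

Because each worker only receives and returns path strings, nothing large is pickled between processes. When called from a script with the default process pool, the call should sit under an `if __name__ == "__main__":` guard so it also works with the spawn start method.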
nebfield added the enhancement label Aug 5, 2024