Compare scratch usage between MarkDuplicatesSpark and SAMtools markdup #278

Open
tyamaguchi-ucla opened this issue Oct 3, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@tyamaguchi-ucla
Contributor

@nkwang24 @yashpatel6 For multi-library samples, this approach would help, although it will take some time to implement. It would also be helpful to understand how `/scratch` usage compares between `MarkDuplicatesSpark` and `SAMtools markdup`.

Originally posted by @tyamaguchi-ucla in #234 (comment)

@tyamaguchi-ucla
Contributor Author

@yashpatel6 @j2salmingo I suggest that we work on this task before implementing #234, as it's relatively straightforward and will provide important insights into `/scratch` usage. Also, unrelated, but we might want to create a new release soon after we update the default reference path (alt-aware reference) in the template config.

@j2salmingo
Contributor

Just to keep everything in my head straight, let me know if there is something I am missing:

- Benchmark `MarkDuplicatesSpark` vs. `SAMtools markdup`.
- This can be done by isolating each process in its own one-step pipeline.
- `/scratch` usage can be measured with a script that periodically runs `du` until the process finishes.
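
For the `du` polling idea, here is a rough sketch of what I have in mind. It is only an illustration: `dir_size_bytes`, `monitor_scratch`, and the `interval` parameter are hypothetical names, and it assumes the step can be launched as a single command and that everything it writes to scratch lives under one directory.

```python
import subprocess
import time


def dir_size_bytes(path):
    """Return the disk usage of `path` in bytes, as reported by `du -sk`."""
    # -s: print one summary line, -k: report in 1 KiB blocks
    result = subprocess.run(
        ["du", "-sk", path], capture_output=True, text=True, check=False
    )
    # du can exit non-zero if temp files vanish mid-scan; it still prints a total
    fields = result.stdout.split()
    return int(fields[0]) * 1024 if fields else 0


def monitor_scratch(command, scratch_dir, interval=5.0):
    """Run `command` and sample the size of `scratch_dir` until it exits.

    Returns (return_code, samples), where samples is a list of
    (elapsed_seconds, size_in_bytes) tuples.
    """
    start = time.monotonic()
    samples = []
    proc = subprocess.Popen(command)
    while True:
        samples.append((time.monotonic() - start, dir_size_bytes(scratch_dir)))
        try:
            # Sleep for up to `interval` seconds, waking early if the process exits
            proc.wait(timeout=interval)
            break
        except subprocess.TimeoutExpired:
            continue
    # One last sample after exit, to capture whatever was left behind
    samples.append((time.monotonic() - start, dir_size_bytes(scratch_dir)))
    return proc.returncode, samples


def peak_bytes(samples):
    """Return the largest sampled size, or 0 if there are no samples."""
    return max((size for _, size in samples), default=0)
```

Each one-step pipeline would be passed as `command` (a list of arguments) with `scratch_dir` pointing at its work directory under `/scratch`, and `peak_bytes(samples)` gives the number to compare between the two tools. Since this is sampling, a short-lived spike between two polls can be missed, so the interval should be small relative to the runtime.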

Which BAM files should I be using for the benchmarking?
Was there a third option I forgot about?

3 participants