Commit
This is a larger change that overhauls how binsplitting is done and, as a consequence, reworks some of the overall workflow in `__main__.py`. The PR is intended to address the following problems:

* Before, we output either the binsplit clusters or the unsplit clusters, never both. This is problematic: we know the binsplit clusters are the best ones, so we would like to output these, but the unsplit ones contain important information about the source cluster, which power users need to be able to recover.
  - Now, we output both `_split.tsv` and `_unsplit.tsv` files if binsplitting takes place.
* Before, we defaulted to no binsplitting, even though we knew it was inferior.
  - Now, `-o C` is the default.
* Before, if a user passed in a wrong binsplit separator, Vamb would not error until the clustering step, and the error message would be inscrutable.
  - Now, Vamb errors already when parsing the contigs, except if the binsplit separator has defaulted to 'C', in which case binsplitting is disabled and the user is warned.
  - The error message is significantly improved and more explanatory.
* Before, the logic of where binsplitting happened was ad hoc and scattered all over the place. For example, binsplitting took place during cluster writing, during bin writing, during benchmarking, during clustering itself, and immediately after clustering. It was also implemented in multiple places.
  - Now, a `BinSplitter` class is responsible for binsplitting. The writer functions and loader functions do not binsplit.
  - Now, binsplitting mostly takes place immediately before writing the split clusters, meaning the clusters are unambiguously unsplit for the majority of the program.
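To make the behaviour described above concrete, here is a minimal sketch of what a `BinSplitter`-style class could look like. It is not the code from this commit: the method names `validate` and `split`, the constructor signature, the `DEFAULT_SEP` constant, and the `{sample}{separator}{cluster}` naming of split clusters are all assumptions made for illustration. It only shows the two rules from the message: an explicitly passed separator that does not match the contig names is an error at parse time, while a defaulted 'C' that does not match disables binsplitting with a warning.

```python
import warnings
from collections import defaultdict
from typing import Iterable, Optional

DEFAULT_SEP = "C"  # assumed default, mirroring `-o C` in the message above


class BinSplitter:
    """Illustrative sketch only; names and signatures are hypothetical."""

    def __init__(self, separator: Optional[str] = None):
        # Remember whether the user chose the separator or we fell back to 'C'.
        self.is_default = separator is None
        self.separator: Optional[str] = DEFAULT_SEP if separator is None else separator

    def validate(self, contig_names: Iterable[str]) -> None:
        """Check every contig name as soon as the contigs are parsed."""
        sep = self.separator
        if sep is None:
            return
        for name in contig_names:
            sample, found, rest = name.partition(sep)
            if found and sample and rest:
                continue
            if self.is_default:
                # Defaulted separator does not fit: disable binsplitting and warn.
                warnings.warn(
                    f"Contig name {name!r} cannot be split on the default "
                    f"separator {sep!r}; binsplitting is disabled."
                )
                self.separator = None
                return
            # Explicit separator does not fit: fail early with a clear message.
            raise ValueError(
                f"Binsplit separator {sep!r} was passed, but contig name {name!r} "
                "does not have the form <sample><separator><contig>."
            )

    def split(self, clusters: dict[str, set[str]]) -> dict[str, set[str]]:
        """Return clusters split by sample; the input is left untouched."""
        sep = self.separator
        if sep is None:
            return clusters
        result: dict[str, set[str]] = defaultdict(set)
        for cluster_name, contigs in clusters.items():
            for contig in contigs:
                sample = contig.partition(sep)[0]
                result[f"{sample}{sep}{cluster_name}"].add(contig)
        return dict(result)
```

With this shape, the rest of the program would carry the unsplit clusters around and call `split` only right before writing the `_split.tsv` file, so the `_unsplit.tsv` file can be written from the same, unmodified mapping.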
1 parent ff5eac6, commit 15638ff
Showing 8 changed files with 456 additions and 567 deletions.