Bioinformatics automata toolkit for constructing, manipulating, and applying weighted finite-state transducers (WFSTs). Provides both a CLI (boss) and C++ library.
This is a reference algorithm project (see global ~/.claude/CLAUDE.md for the full standard).
Implementation tiers: JAX (reference), WebGPU (JS), C++ CLI/library (legacy, in place of Rust/WASM).
The C++ predates the standard; WASM compilation is handled separately from the C++ build.
All the standard's requirements apply: math-first design, API uniformity across tiers,
cross-implementation consistency tests, proactive edge case tests, bug regression tests
propagated to all implementations, benchmarks, and documentation.
brew install gsl pkgconfig # macOS deps (kseq.h vendored; Boost removed; argparse in-tree)
make # builds bin/boss
npm install # needed for tests
make test # runs full test suite
make clean # removes build artifacts
Compiler: clang++ (preferred) or g++, C++11. Links against GSL and zlib.
- src/: C++ source files (core library + headers)
- target/boss.cpp: main entry point for the boss CLI
- ext/: vendored dependencies (nlohmann_json, valijson, cpp-peglib, cpp-httplib, kseq, fast5, compat)
- schema/: JSON Schema files for the transducer format and related data structures
- preset/: preset machine JSON files (translate, dnapsw, protpsw, etc.)
- data/: parameter data files (codon tables, substitution matrices)
- constraints/: parameter constraint files
- params/: parameter files
- js/: Node.js scripts for generating preset machines and test utilities
- t/: test data, expected outputs, and test source files
- bin/: compiled binary output (generated)
- obj/: compiled object files (generated)
- docs/: GitHub Pages documentation (Jekyll + Markdown), served at machineboss.org
- python/machineboss/: Python/JAX package (machine, weight, eval, forward, viterbi, etc.)
- python/codes/: Python coding-theory utilities (hamming74.py, mixradar.py)
- img/: images for documentation
- examples/: example scripts
The core types are in src/machine.h (Machine, MachineState, MachineTransition) and src/weight.h (WeightExpr, weight algebra). The machine JSON format uses an expression language for weights (arithmetic, log, exp, parameters).
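As a rough illustration of how an expression language of this kind can be evaluated, here is a minimal Python sketch. The nested-dict encoding (operator keys such as "+", "*", "log", "exp", with bare strings standing for named parameters) and the evaluate function are assumptions made for this example; they are not the API of src/eval.cpp, and docs/expressions.md remains the reference for the real syntax.

```python
import math

def evaluate(expr, params):
    """Evaluate a toy weight expression against a dict of named parameters.

    Hypothetical encoding (an assumption, not the project's schema):
      number            -> constant
      "name"            -> look up params["name"]
      {"log": e}        -> natural log of e
      {"exp": e}        -> exponential of e
      {"+": [a, b]}     -> binary arithmetic, likewise "-", "*", "/"
    """
    if isinstance(expr, (int, float)):
        return float(expr)
    if isinstance(expr, str):
        return float(params[expr])
    if isinstance(expr, dict) and len(expr) == 1:
        (op, arg), = expr.items()
        if op == "log":
            return math.log(evaluate(arg, params))
        if op == "exp":
            return math.exp(evaluate(arg, params))
        if op in ("+", "-", "*", "/"):
            left, right = (evaluate(a, params) for a in arg)
            if op == "+":
                return left + right
            if op == "-":
                return left - right
            if op == "*":
                return left * right
            return left / right
    raise ValueError("unsupported expression: %r" % (expr,))

# p * (1 - q), with both parameters supplied by name.
example = {"*": ["p", {"-": [1, "q"]}]}
value = evaluate(example, {"p": 0.5, "q": 0.2})
```

Keeping parameters as names until evaluation time is what lets the same machine be reused with different parameter values, for instance during training.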
Key modules:
- machine.cpp: machine construction, composition, intersection, concatenation, union, sort, eliminate
- forward.cpp / backward.cpp: Forward/Backward algorithms
- viterbi.cpp: Viterbi algorithm
- beam.cpp: beam search encoding/decoding
- ctc.cpp: CTC prefix search, MCMC, simulated annealing
- compiler.cpp: code generation (C++/JS) for Forward algorithm
- fitter.cpp / counts.cpp: Baum-Welch training via GSL optimizers
- eval.cpp: weight expression evaluation
- parsers.cpp: regex, weight expression, and command-line parsing
- hmmer.cpp: HMMER profile HMM import
- csv.cpp: CSV profile import
- fastseq.cpp: FASTA I/O
- schema.cpp: JSON schema validation (via valijson)
- preset.cpp: built-in preset machines (embedded as xxd includes from src/preset/)
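For orientation, the Forward algorithm named in the module list can be sketched on a toy transducer as below. This is a simplified illustration, not the code in forward.cpp: the tuple-based transition format and the forward function are hypothetical, weights are kept in probability space rather than log space, and the sketch assumes that fully silent transitions only go from a lower-numbered state to a higher-numbered one.

```python
from collections import defaultdict

def forward(n_states, transitions, x, y):
    """Sum the weights of all paths from state 0 to state n_states - 1
    that consume input sequence x and emit output sequence y.

    transitions: list of (src, dst, sym_in, sym_out, weight) tuples, where
    sym_in / sym_out may be None for "no symbol" (hypothetical format).
    """
    out_of = defaultdict(list)
    for src, dst, sym_in, sym_out, w in transitions:
        if sym_in is None and sym_out is None and dst <= src:
            raise ValueError("silent transitions must go to a later state")
        out_of[src].append((dst, sym_in, sym_out, w))

    # F[i][j][s]: total weight of reaching state s after consuming x[:i]
    # and emitting y[:j].
    F = [[[0.0] * n_states for _ in range(len(y) + 1)]
         for _ in range(len(x) + 1)]
    F[0][0][0] = 1.0

    for i in range(len(x) + 1):
        for j in range(len(y) + 1):
            for s in range(n_states):
                f = F[i][j][s]
                if f == 0.0:
                    continue
                # Push this cell's weight along every compatible transition.
                for dst, sym_in, sym_out, w in out_of[s]:
                    ni, nj = i, j
                    if sym_in is not None:
                        if i == len(x) or x[i] != sym_in:
                            continue
                        ni += 1
                    if sym_out is not None:
                        if j == len(y) or y[j] != sym_out:
                            continue
                        nj += 1
                    F[ni][nj][dst] += f * w
    return F[len(x)][len(y)][n_states - 1]

# Two parallel a/x self-loops on the start state, then a silent exit.
toy = [(0, 0, "a", "x", 0.3), (0, 0, "a", "x", 0.2), (0, 1, None, None, 0.1)]
likelihood = forward(2, toy, "aa", "xx")
```

Replacing the sum in the update with a max gives the corresponding Viterbi score under the same assumptions.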
The README help text is auto-generated from boss -h. Run make README.md to update it after changing command-line options.
Tests are defined in the Makefile and use t/testexpect.py as a harness. Test categories: schema validation, composition, construction, I/O, algebra, dynamic programming, code generation, encoding/decoding, expression parsing, JSON API operations, preset loading. Some tests require node (JS tests).
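The general idea of an expected-output test can be sketched in a few lines of Python. This is only an illustration of the pattern: the matches_expected function is hypothetical and does not reflect the actual interface or options of t/testexpect.py.

```python
import subprocess
import sys

def matches_expected(command, expected_stdout):
    """Run a command and report whether it exited cleanly with exactly
    the expected standard output (hypothetical helper)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0 and result.stdout == expected_stdout

# Stand-in command; a real test would invoke bin/boss with test data from t/.
ok = matches_expected([sys.executable, "-c", "print('ok')"], "ok\n")
```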
Refuse to commit if any tests are not passing.
The native machine format is a restricted JSON representation of a WFST. Schemas are in schema/. The start state is always first; the end state is always last. Transitions can use algebraic weight expressions with named parameters.
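To make the start-first, end-last convention concrete, the following Python sketch loads a small machine and reports its start state, end state, and the parameter names used in its weights. The field names ("state", "trans", "in", "out", "to", "weight"), the operator encoding of weights, and the summarize helper are assumptions for illustration; the files in schema/ are the authoritative definition.

```python
import json

def collect_params(expr, found):
    """Gather parameter names (bare strings) from a nested weight expression."""
    if isinstance(expr, str):
        found.add(expr)
    elif isinstance(expr, dict):
        for arg in expr.values():
            collect_params(arg, found)
    elif isinstance(expr, list):
        for item in expr:
            collect_params(item, found)
    return found

def summarize(machine):
    """Return start index, end index, and sorted parameter names."""
    states = machine["state"]
    if not states:
        raise ValueError("a machine needs at least one state")
    params = set()
    for state in states:
        for trans in state.get("trans", []):
            if not 0 <= trans["to"] < len(states):
                raise ValueError("transition target out of range")
            collect_params(trans.get("weight", 1), params)
    # By convention the first state is the start and the last is the end.
    return {"start": 0, "end": len(states) - 1, "params": sorted(params)}

# Hypothetical two-state machine: loop on a/b with weight p, exit with 1 - p.
machine = json.loads("""
{"state": [
  {"trans": [{"in": "a", "out": "b", "to": 0, "weight": "p"},
             {"to": 1, "weight": {"-": [1, "p"]}}]},
  {}
]}
""")
summary = summarize(machine)
```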
Documentation is hosted at machineboss.org via GitHub Pages. Markdown source files are in docs/:
- docs/machineboss.md: program reference
- docs/json-format.md: JSON format reference
- docs/expressions.md: weight expression mini-language (grammar in src/grammars/expr.h, parser in src/parsers.cpp)
- docs/json-output.md: JSON output format reference (machine, parameters, loglike, alignment, counts, encode, decode)
- docs/composition.md: transducer composition algorithm documentation
- docs/webgpu.md: WebGPU API reference