Milestones

  • Post-0.1.0 arc: large-scale MoE serving (Kimi prefill/TTFT, a2a overlap, DP routing, P/D via pegaflow), plus deferred Qwen3 surfaces (YaRN, speculative decoding, batched random sampling).

    No due date
    2/6 issues closed
  • 0.1.0 is the first usable openinfer release: a Rust + CUDA serving engine with an OpenAI-compatible API, Qwen-family coverage, documented benchmarks, and clear project packaging. Scope is stabilization and presentation, not broad model support.

    No due date
    14/16 issues closed