qwen35: add single-stream DFlash latency A/B gate

## Background

PR #507 adds opt-in Qwen3.5 DFlash speculative decoding and documents that the measured win is concentrated in multi-active batches (`c4/c8/c16`), while single-concurrency `c1` is flat to slightly slower.

The Qwen3 DFlash path already has a dedicated single-stream latency A/B harness (`dflash_speculative_perf.rs`): fixed 256-token budget, greedy / ignore-eos, one warmup discarded, spec OFF vs ON, and printed tok/s speedup. Qwen3.5 should have an equivalent harness so the `c1` behavior is measured directly and remains reproducible.
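The measurement protocol described above (fixed budget, one discarded warmup, then a timed run) could be sketched roughly as follows. This is not the code in `dflash_speculative_perf.rs`; `RunStats`, `timed_run`, and the `generate` closure are hypothetical names used only for illustration, and the stand-in generator in `main` does no real inference.

```rust
use std::time::{Duration, Instant};

/// Outcome of one timed generation run (hypothetical type).
#[derive(Debug, Clone, Copy)]
struct RunStats {
    tokens: usize,
    elapsed: Duration,
}

impl RunStats {
    /// Decode throughput in tokens per second.
    fn tok_per_sec(&self) -> f64 {
        self.tokens as f64 / self.elapsed.as_secs_f64()
    }
}

/// Calls `generate` once as a discarded warmup, then once more under a timer.
/// `generate` is assumed to decode greedily, ignore EOS, and return exactly
/// the token ids it produced for the given budget.
fn timed_run<F>(mut generate: F, max_tokens: usize) -> RunStats
where
    F: FnMut(usize) -> Vec<u32>,
{
    let _ = generate(max_tokens); // warmup, result discarded

    let start = Instant::now();
    let out = generate(max_tokens);
    let elapsed = start.elapsed();

    RunStats { tokens: out.len(), elapsed }
}

fn main() {
    // Stand-in generator so the sketch runs without a model.
    let fake = |n: usize| -> Vec<u32> { (0..n as u32).collect() };

    let off = timed_run(fake, 256); // spec OFF
    let on = timed_run(fake, 256); // spec ON (same stand-in here)

    println!("baseline tokens: {}, DFlash tokens: {}", off.tokens, on.tokens);
    println!("speedup: {:.3}x", on.tok_per_sec() / off.tok_per_sec());
}
```

In a real harness the two closures would wrap the same engine with speculation disabled and enabled, so that both runs share the prompt, budget, and sampling settings.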

## Goal

Add a Qwen3.5 DFlash single-stream performance gate that reports baseline vs DFlash speed under the same fixed-token setup.

## Suggested scope

- Add a Qwen3.5-specific `dflash_speculative_perf` test or bench.
- Use fixed output budget, greedy decoding, ignore-eos behavior, and one discarded warmup.
- Print baseline tok/s, DFlash tok/s, speedup ratio, acceptance length/rate, and a token sanity check.
- Keep the assertion conservative: fail only on catastrophic slowdown or invalid output, while leaving the actual speedup visible in `--nocapture`.
- Document the current Qwen3.5 expectation: `c1` may be flat/slightly negative unless a later optimization recovers launch overhead.
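One possible shape for the reporting and the conservative assertion is sketched below. Everything here is an assumption for illustration: the `AbReport` fields, the `MIN_SPEEDUP` floor of 0.5, the definitions of acceptance rate and mean accepted length, and the sanity rule (exact token count, ids inside the vocabulary). The numbers in `main` are placeholders, not measurements.

```rust
/// Hypothetical floor: fail only if DFlash is catastrophically slower.
const MIN_SPEEDUP: f64 = 0.5;

/// Values collected from one A/B run (hypothetical layout).
#[derive(Debug)]
struct AbReport {
    baseline_tps: f64,
    dflash_tps: f64,
    drafted: usize,      // draft tokens proposed
    accepted: usize,     // draft tokens accepted by the target model
    verify_steps: usize, // number of verification passes
}

impl AbReport {
    fn speedup(&self) -> f64 {
        self.dflash_tps / self.baseline_tps
    }

    /// Fraction of proposed draft tokens that were accepted.
    fn acceptance_rate(&self) -> f64 {
        if self.drafted == 0 { 0.0 } else { self.accepted as f64 / self.drafted as f64 }
    }

    /// Mean number of accepted draft tokens per verification pass.
    fn mean_accept_len(&self) -> f64 {
        if self.verify_steps == 0 { 0.0 } else { self.accepted as f64 / self.verify_steps as f64 }
    }
}

/// Assumed sanity rule: exactly `budget` tokens, all ids below `vocab_size`.
fn tokens_sane(tokens: &[u32], budget: usize, vocab_size: u32) -> bool {
    tokens.len() == budget && tokens.iter().all(|&t| t < vocab_size)
}

/// Prints the full report, then fails only on invalid output or a
/// catastrophic slowdown. A flat or slightly negative `c1` result passes.
fn gate(r: &AbReport, dflash_tokens: &[u32], budget: usize, vocab_size: u32) -> Result<(), String> {
    println!(
        "baseline {:.1} tok/s | dflash {:.1} tok/s | speedup {:.3}x | accept rate {:.2} | accept len {:.2}",
        r.baseline_tps, r.dflash_tps, r.speedup(), r.acceptance_rate(), r.mean_accept_len()
    );

    if !tokens_sane(dflash_tokens, budget, vocab_size) {
        return Err("DFlash output failed token sanity check".to_string());
    }
    let s = r.speedup();
    if !s.is_finite() || s < MIN_SPEEDUP {
        return Err(format!("speedup {s:.3}x is below floor {MIN_SPEEDUP}"));
    }
    Ok(())
}

fn main() {
    // Placeholder values only; they do not come from any real run.
    let report = AbReport {
        baseline_tps: 100.0,
        dflash_tps: 95.0,
        drafted: 40,
        accepted: 30,
        verify_steps: 10,
    };
    let tokens = vec![1u32; 256];
    gate(&report, &tokens, 256, 1000).expect("gate should pass for a mild slowdown");
}
```

Keeping the floor well below 1.0 means the test documents the `c1` behavior without turning a known flat result into a failure; the printed line carries the actual numbers under `--nocapture`.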

## Validation

- `cargo test --release -p openinfer-qwen35-4b --features qwen35-4b --test dflash_speculative_perf -- --nocapture --test-threads=1`
- Include same-host A/B output in this issue or the follow-up PR.

Related: #434, #507.


qwen35: add single-stream DFlash latency A/B gate #513
