qwen35: run DFlash HTTP serving pressure sweep

## Background

PR #507 includes same-host in-process benchmark evidence for Qwen3.5 DFlash and keeps the claim boundary explicit: it is not an HTTP serving pressure claim.

Qwen3 DFlash has serving-level measurement scripts under `tools/bench/`, including `run_serving_bench.sh` and `qps_sweep.sh`. Qwen3.5 should get the same public evidence layer before we make broader OpenAI-compatible serving claims.

## Goal

Run and document a Qwen3.5 DFlash HTTP serving A/B sweep using the existing serving benchmark scripts.

## Suggested scope

- Launch Qwen3.5 baseline and Qwen3.5 + DFlash through the OpenAI-compatible server path.
- Use greedy requests with `temperature=0`, `ignore_eos`, and percentile metrics for `ttft,tpot,itl,e2el`.
- Cover at least `prompt_len=1/1024/4096`, `output_len=256`, and concurrency `1/4/8/16`.
- Record completed/failed requests, TTFT, TPOT, ITL, and E2EL at p50/p99, output tok/s, acceptance length/rate, and token sanity.
- Keep the result separate from in-process benchmark evidence.
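As a rough illustration of the sweep shape and per-cell metrics above, here is a minimal sketch. It assumes the grid values from this issue (3 prompt lengths × 4 concurrency levels × 2 variants = 24 cells) and uses a hypothetical `RequestRecord` that stores one client-side arrival timestamp per output token. That per-token assumption matters: if the server streams several accepted tokens in a single chunk, which is plausible with speculative decoding, ITL would have to be defined per chunk instead. The acceptance-length formula follows one common convention (accepted draft tokens plus one bonus token per verify step) and is not taken from the DFlash implementation. None of these names come from `run_serving_bench.sh` or `qps_sweep.sh`.

```python
import itertools
import math
from dataclasses import dataclass, field

# Workload grid from the suggested scope (values taken from this issue).
PROMPT_LENS = (1, 1024, 4096)
OUTPUT_LEN = 256
CONCURRENCIES = (1, 4, 8, 16)
VARIANTS = ("baseline", "dflash")  # hypothetical labels for the two server launches


def sweep_cells():
    """Return one dict per (variant, prompt_len, concurrency) cell."""
    return [
        {"variant": v, "prompt_len": p, "output_len": OUTPUT_LEN, "concurrency": c}
        for v, p, c in itertools.product(VARIANTS, PROMPT_LENS, CONCURRENCIES)
    ]


def percentile(values, pct):
    """Linear-interpolation percentile (numpy's default convention)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of empty sample")
    k = (len(xs) - 1) * pct / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


@dataclass
class RequestRecord:
    """Hypothetical per-request record; times are seconds since the request was sent."""
    ok: bool
    token_times: list = field(default_factory=list)  # arrival time of each output token


def summarize(records, wall_seconds):
    """Completed/failed counts, p50/p99 latencies, and output tok/s for one cell."""
    # A request that reported success but produced no tokens is counted as failed.
    done = [r for r in records if r.ok and r.token_times]
    ttft = [r.token_times[0] for r in done]
    e2el = [r.token_times[-1] for r in done]  # assumes the response ends at the last token
    tpot = [
        (r.token_times[-1] - r.token_times[0]) / (len(r.token_times) - 1)
        for r in done
        if len(r.token_times) > 1
    ]
    itl = [b - a for r in done for a, b in zip(r.token_times, r.token_times[1:])]
    stats = {
        "completed": len(done),
        "failed": len(records) - len(done),
        "output_tok_per_s": sum(len(r.token_times) for r in done) / wall_seconds,
    }
    for name, vals in (("ttft", ttft), ("tpot", tpot), ("itl", itl), ("e2el", e2el)):
        if vals:
            stats[f"{name}_p50"] = percentile(vals, 50)
            stats[f"{name}_p99"] = percentile(vals, 99)
    return stats


def acceptance_stats(accepted, proposed, verify_steps):
    """Acceptance rate and mean acceptance length.

    Assumes one common convention: each verify step emits its accepted draft
    tokens plus one bonus token from the target model.
    """
    return {
        "acceptance_rate": accepted / proposed,
        "acceptance_length": (accepted + verify_steps) / verify_steps,
    }
```

Keeping the summary keyed by cell makes it straightforward to line up the baseline and DFlash rows for the same `prompt_len`/`concurrency` when building the A/B comparison.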

## Validation

- Public benchmark table with commit, GPU model, CUDA/driver versions, workload shape, and pass/fail counts.
- No private hostnames, credentials, or local artifact paths in docs or comments.
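One way to satisfy both validation items is to render the public table from structured results and scan the rendered text before posting it. The sketch below is hypothetical: the column set mirrors the checklist above, and the leak patterns are a heuristic that may flag some of the prohibited content, such as home-directory paths, raw IPv4 addresses, and inline `key=value` credentials. It cannot detect private hostnames in general, so a manual review is still needed.

```python
import re

# Columns mirror the validation checklist: environment + workload shape + pass/fail.
COLUMNS = (
    "commit", "gpu", "cuda", "driver",
    "variant", "prompt_len", "output_len", "concurrency",
    "completed", "failed", "output_tok_per_s",
)

# Heuristic patterns for content that should not be published (assumed, not exhaustive).
LEAK_PATTERNS = (
    re.compile(r"/(?:home|Users|mnt|data)/\S+"),                      # local artifact paths
    re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),                        # raw IPv4 addresses
    re.compile(r"(?i)(?:api[_-]?key|token|password)\s*[:=]\s*\S+"),    # inline credentials
)


def render_table(env, rows):
    """Render a Markdown table; `env` holds run-wide fields, each row is one sweep cell."""
    lines = [
        "| " + " | ".join(COLUMNS) + " |",
        "|" + "---|" * len(COLUMNS),
    ]
    for row in rows:
        cells = {**env, **row}  # per-row values override run-wide ones
        lines.append("| " + " | ".join(str(cells.get(c, "")) for c in COLUMNS) + " |")
    return "\n".join(lines)


def find_leaks(text):
    """Return every substring matching one of the leak patterns."""
    return [m.group(0) for pat in LEAK_PATTERNS for m in pat.finditer(text)]
```

The environment fields (commit, GPU, CUDA, driver) are repeated on every row rather than stored in a separate header. Rows copied into a PR comment or a docs page therefore keep their provenance.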

Related: #434, #507.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

qwen35: run DFlash HTTP serving pressure sweep #514

Background

Goal

Suggested scope

Validation

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

qwen35: run DFlash HTTP serving pressure sweep #514

Description

Background

Goal

Suggested scope

Validation

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions