WILD, WILD Feature Request: Brainstorm Mode (Batch Generation + Judge) for WritingWay
Introduce a Brainstorm Mode that automates generation across multiple LLMs and configurations, with automatic evaluation (“judge pass”) of all outputs.
This removes today’s manual copy/paste, tagging, and reformatting, and returns ranked, curated results directly to the LLM output area, ready to move into the editor.
My current workflow
- Generate multiple outputs for the same action beat, sometimes using different models.
- For each action beat response, copy it into a text file and wrap it in <response>...</response> tags.
- Add a judge prompt at the top of that file.
- Go to the Outline and copy all the context from Prompt Preview (eye icon).
- Manually feed the file into a judge model (e.g., Sonnet 4 or O3) via poe.com.
This is tedious and error‑prone, especially with 10+ responses, large files, and heavy text processing.
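For concreteness, the collation step amounts to something like the following. This is just a minimal sketch of what I assemble by hand today (the function name and file layout are mine, not an existing WritingWay API):

```python
# Sketch of the manual collation: judge prompt on top, Prompt Preview
# context next, then every candidate wrapped in <response> tags.
from pathlib import Path

def build_judge_file(judge_prompt: str, context: str, responses: list[str], out: Path) -> None:
    parts = [judge_prompt, context]
    parts += [f"<response>\n{r.strip()}\n</response>" for r in responses]
    out.write_text("\n\n".join(parts), encoding="utf-8")
```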
Idea
One‑click batch generation using multiple models, with per‑model run counts and temperatures, plus built‑in throttling/delays to avoid rate limits. Automatically collate outputs and hand them off to a judge model with a configurable prompt. Allow saving and loading reusable presets.
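A preset could be a small serializable config along these lines. The sketch assumes JSON on disk, and every field name is a placeholder rather than an existing WritingWay format:

```python
# Minimal preset sketch: one entry per Run Group, plus judge settings.
import json
from dataclasses import dataclass, asdict, field
from pathlib import Path

@dataclass
class RunGroup:
    model: str             # e.g. "kimi-k2", "claude-sonnet-4"
    runs: int = 1          # how many generations to request
    temperature: float = 0.7
    delay_s: float = 2.0   # pause between requests to respect rate limits

@dataclass
class BrainstormPreset:
    run_groups: list[RunGroup] = field(default_factory=list)
    judge_model: str = ""
    judge_prompt: str = ""
    max_retries: int = 3

    def save(self, path: Path) -> None:
        path.write_text(json.dumps(asdict(self), indent=2))

    @classmethod
    def load(cls, path: Path) -> "BrainstormPreset":
        data = json.loads(path.read_text())
        data["run_groups"] = [RunGroup(**g) for g in data["run_groups"]]
        return cls(**data)
```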
Proposed user workflow
- In the Action Beat view, click “Wizard” to open Brainstorm Mode.
- Add one or more Run Groups. For each Run Group:
- Choose a model (e.g., Kimi K2, Sonnet 4, R1T2).
- Set the number of runs (e.g., 5, 1, 10).
- Set the temperature.
- Set a delay between requests (seconds/minutes) to respect rate limits.
- Select a judge model and a judge prompt (from the Prompts library).
- Set retries, globally and/or per model, with backoff.
- Optionally save the setup as a preset; load or modify later.
- Add context (scenes, characters, etc.).
- Run! (A rough code sketch of this flow follows the list.)
- Insert the judge’s response into the LLM output preview window.
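The batch-then-judge flow behind that workflow could look roughly like this. The `generate(model, prompt, temperature)` callable is a placeholder for whatever provider call WritingWay already makes, not a real API:

```python
# Sketch: run every Run Group, throttle between requests, then hand the
# collated candidates to the judge model.
import time
from typing import Callable

def brainstorm(generate: Callable[[str, str, float], str],
               beat_prompt: str,
               run_groups: list[dict],   # e.g. {"model": "...", "runs": 5, "temperature": 0.8, "delay_s": 2}
               judge_model: str,
               judge_prompt: str) -> str:
    responses: list[str] = []
    for group in run_groups:
        for _ in range(group["runs"]):
            responses.append(generate(group["model"], beat_prompt, group["temperature"]))
            time.sleep(group["delay_s"])  # throttle to stay under rate limits
    bundle = "\n\n".join(f"<response>\n{r}\n</response>" for r in responses)
    return generate(judge_model, f"{judge_prompt}\n\n{bundle}", 0.2)
```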
Additional requirements:
- A Cancel button to stop the job (optionally preserving partial results).
- Exponential backoff on 429/5xx responses, with capped retries.
- A progress meter showing how many requests have been sent, failed, and completed.
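For the backoff requirement, something along these lines would do. It assumes the provider client exposes HTTP-style status codes; `request_fn` is a placeholder for the actual call:

```python
import random
import time

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0, max_delay=60.0):
    """Retry request_fn on 429/5xx with capped, jittered exponential backoff."""
    for attempt in range(max_retries + 1):
        resp = request_fn()
        if resp.status_code != 429 and resp.status_code < 500:
            return resp
        if attempt == max_retries:
            raise RuntimeError(f"gave up after {max_retries} retries (last status {resp.status_code})")
        # Double the delay each attempt, add jitter, cap at max_delay.
        time.sleep(min(max_delay, base_delay * 2 ** attempt) * random.uniform(0.5, 1.5))
```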
Open Questions
- Do we need per-provider concurrency caps (e.g., max in-flight requests) in addition to delay? Local inference can't handle more than one request at a time, while openrouter.ai can (see the semaphore sketch after this list).
- Should we show estimated tokens/cost per batch before running? It’s extra work, but this mode could consume a lot of money.
- If one model fails repeatedly, should we stop or skip to the next model?
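On the concurrency question, one possible shape is a per-provider semaphore; the provider names and limits below are illustrative only:

```python
import asyncio

async def run_with_caps(jobs, provider_limits=None):
    """jobs: iterable of (provider, coroutine_factory) pairs.
    Each provider's in-flight requests are capped by its own semaphore,
    so a local backend stays strictly serial while a hosted API can overlap."""
    limits = provider_limits or {"local": 1, "openrouter": 4}  # illustrative caps
    sems = {name: asyncio.Semaphore(n) for name, n in limits.items()}

    async def run_one(provider, factory):
        async with sems[provider]:
            return await factory()

    return await asyncio.gather(*(run_one(p, f) for p, f in jobs))
```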