
Feature Request: Brainstorm Mode (Batch Generation + Judge) #97

@burbilog

Description


WILD, WILD Feature Request: Brainstorm Mode (Batch Generation + Judge) for WritingWay

Introduce a Brainstorm Mode that automates generation across multiple LLMs and configurations, with automatic evaluation (“judge pass”) of all outputs.

This removes today’s manual copy/paste, tagging, and reformatting, and returns ranked, curated results directly to the LLM output area, ready to move into the editor.

My current workflow

  1. Generate multiple outputs for the same action beat, sometimes using different models.
  2. For each action-beat response, copy it into a text file and wrap it with <response>...</response>.
  3. Add a judge prompt at the top of that file.
  4. Go to the Outline and copy all the context from Prompt Preview (eye icon).
  5. Manually feed the file into a judge model (e.g., Sonnet 4 or O3) via poe.com.
    This is tedious and error‑prone, especially with 10+ responses, large files, and heavy text processing (a rough sketch of the collation step follows this list).
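
To make the manual collation concrete, here is a minimal Python sketch of what I assemble by hand today. All of the text and variable names are placeholders; only the <response> wrapping and the judge-prompt-plus-context layout come from my actual workflow:

```python
# Sketch of the manual collation step: prepend the judge prompt and the
# context copied from Prompt Preview, then wrap each model output in
# <response> tags. All strings here are placeholders.
judge_prompt = "Rank the responses below and explain your top pick."
context = "<context copied from Prompt Preview (eye icon)>"
responses = ["output from model A", "output from model B"]

parts = [judge_prompt, context]
for text in responses:
    parts.append(f"<response>\n{text}\n</response>")

judge_input = "\n\n".join(parts)  # this is the file I feed to the judge model by hand
print(judge_input)
```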

Idea

One‑click batch generation using multiple models, with per‑model run counts and temperatures, plus built‑in throttling/delays to avoid rate limits. Automatically collate outputs and hand them off to a judge model with a configurable prompt. Allow saving and loading reusable presets.
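
A preset could be a small serializable structure along these lines. This is purely illustrative, not an existing WritingWay format; every field name here is an assumption:

```python
from dataclasses import dataclass, field

# Illustrative sketch of what a saved Brainstorm preset might contain.
# None of these names exist in WritingWay today.
@dataclass
class RunGroup:
    model: str                  # e.g., "Kimi K2", "Sonnet 4", "R1T2"
    runs: int = 1               # how many generations for this model
    temperature: float = 0.7
    delay_seconds: float = 0.0  # pause between requests to respect rate limits
    max_retries: int = 3        # per-model retry cap (overrides the global one)

@dataclass
class BrainstormPreset:
    name: str
    run_groups: list[RunGroup] = field(default_factory=list)
    judge_model: str = "Sonnet 4"
    judge_prompt_id: str = ""   # reference into the Prompts library
    global_max_retries: int = 3
```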

Proposed user workflow

  1. In the Action Beat view, click “Wizard” to open Brainstorm Mode.
  2. Add one or more Run Groups. For each Run Group:
    • Choose a model (e.g., Kimi K2, Sonnet 4, R1T2).
    • Set the number of runs (e.g., 5, 1, 10).
    • Set the temperature.
    • Set a delay between requests (seconds/minutes) to respect rate limits.
  3. Select a judge model and a judge prompt (from the Prompts library).
  4. Set retries, globally and/or per model, with backoff.
  5. Optionally save the setup as a preset; load or modify later.
  6. Add context (scenes, characters, etc.).
  7. Run! (A rough orchestration sketch follows this list.)
  8. Insert the judge’s response into the LLM output preview window.
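
Roughly, the Run! step could be orchestrated like this. `generate()` and `judge()` are supplied by the caller and stand in for whatever model-calling code WritingWay already has; everything here is a sketch, not a proposed API:

```python
import time

def run_brainstorm(preset, action_beat, context, generate, judge):
    """Rough sketch of the Run! step.

    generate(model, prompt, context, temperature) -> str and
    judge(model, judge_prompt, collated) -> str are caller-supplied
    placeholders for WritingWay's existing model-calling code.
    """
    outputs = []
    for group in preset.run_groups:
        for _ in range(group.runs):
            text = generate(group.model, action_beat, context, group.temperature)
            outputs.append(text)
            time.sleep(group.delay_seconds)  # throttle between requests

    # Collate all outputs the same way the manual workflow does:
    # each one wrapped in <response> tags, then handed to the judge.
    collated = "\n\n".join(f"<response>\n{t}\n</response>" for t in outputs)
    verdict = judge(preset.judge_model, preset.judge_prompt_id, collated)
    return verdict  # goes into the LLM output preview window
```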

There should be a Cancel button to stop the job (optionally preserving partial results).
There should be exponential backoff on 429/5xx responses with capped retries.
There should be a progress meter showing how many requests have been made, failed, or completed.
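
For the backoff behaviour, something like the following would do. `send_request()` is a placeholder for the actual provider call; the delays and jitter are just reasonable defaults, not requirements:

```python
import random
import time

# Sketch of capped exponential backoff on 429/5xx responses.
def request_with_backoff(send_request, max_retries=5, base_delay=1.0, max_delay=60.0):
    for attempt in range(max_retries + 1):
        status, body = send_request()
        if status == 429 or status >= 500:
            if attempt == max_retries:
                raise RuntimeError(f"giving up after {max_retries} retries (HTTP {status})")
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay + random.uniform(0, delay / 2))  # jitter to avoid thundering herd
            continue
        return body
```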

Open Questions

  • Do we need per‑provider concurrency caps (e.g., max in‑flight requests) in addition to delays? Local inference can't handle more than one request at a time, while openrouter.ai can (see the sketch after these questions).
  • Should we show estimated tokens/cost per batch before running? It’s extra work, but this mode could consume a lot of money.
  • If one model fails repeatedly, should we stop or skip to the next model?
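
If per-provider caps are wanted, they could sit on top of the per-request delays, e.g. with a semaphore per provider. The provider names and limits below are examples only:

```python
import asyncio

# Sketch of per-provider concurrency caps layered on top of per-request delays.
PROVIDER_LIMITS = {"local": 1, "openrouter.ai": 4}  # example limits, not recommendations
semaphores = {name: asyncio.Semaphore(n) for name, n in PROVIDER_LIMITS.items()}

async def call_provider(provider, make_request):
    # At most PROVIDER_LIMITS[provider] requests are in flight at once.
    async with semaphores[provider]:
        return await make_request()
```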
