Explore hybrid flow: LLM synthesis + deterministic extension for remaining techniques #14

@Blevene

Context

The LLM-synthesized flow correctly models the high-confidence causal chain (e.g., the 5-step DarkCloud infection chain), but leaves 43+ remaining technique claims unsequenced. These are post-exploitation capabilities (credential theft, discovery, persistence, exfiltration) that the report describes as capabilities rather than as steps in a strict temporal narrative.

Current Behavior

  • LLM synthesizer produces a focused 5-10 step flow with causal reasoning
  • Deterministic fallback produces a full kill-chain-ordered flow but without causal reasoning
  • The two are mutually exclusive: when LLM synthesis succeeds, the deterministic fallback does not run

Proposed Enhancement

Explore a hybrid approach: use LLM synthesis for the core narrative, then apply a deterministic extension to model the remaining techniques at lower confidence. This is similar to the approach used in /tank/orkl_data_mapped.
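
As a rough illustration of the hybrid idea, the sketch below keeps the LLM chain intact and appends the leftover techniques in kill-chain order with weaker edges. Everything here is hypothetical: `Edge`, `extend_flow`, `TACTIC_ORDER`, and the 0.9 core probability are assumptions for illustration and are not taken from `flow_synthesizer.py` or `flow_builder.py`. The 0.4 default sits in the 0.3-0.5 range floated in the options.

```python
from dataclasses import dataclass

# Hypothetical kill-chain order for the deterministic pass; the ordering
# actually used by the deterministic builder may differ.
TACTIC_ORDER = [
    "initial-access", "execution", "persistence", "privilege-escalation",
    "defense-evasion", "credential-access", "discovery", "lateral-movement",
    "collection", "command-and-control", "exfiltration", "impact",
]


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    probability: float
    origin: str  # "llm" or "deterministic"


def tactic_rank(tactic):
    """Position of a tactic in TACTIC_ORDER; unknown tactics sort last."""
    if tactic in TACTIC_ORDER:
        return TACTIC_ORDER.index(tactic)
    return len(TACTIC_ORDER)


def extend_flow(core_steps, remaining, llm_prob=0.9, ext_prob=0.4):
    """Chain the LLM core, then append remaining techniques at lower probability.

    core_steps: ordered technique IDs from the LLM synthesizer.
    remaining:  (technique_id, tactic) pairs for every extracted technique.
    """
    # Pass 1: consecutive core steps become high-probability edges.
    edges = [Edge(a, b, llm_prob, "llm") for a, b in zip(core_steps, core_steps[1:])]

    # Pass 2: order whatever the LLM left out by tactic, then by ID.
    in_core = set(core_steps)
    extras = sorted(
        (item for item in remaining if item[0] not in in_core),
        key=lambda item: (tactic_rank(item[1]), item[0]),
    )

    previous = core_steps[-1] if core_steps else None
    for technique_id, _tactic in extras:
        if previous is not None:
            edges.append(Edge(previous, technique_id, ext_prob, "deterministic"))
        previous = technique_id
    return edges
```

Chaining the extras linearly off the last core step is only one choice; it implies an ordering among the extras that the report may not support, which is part of what this issue needs to evaluate.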

Options to explore

  1. Two-pass flow: LLM synthesizes core chain (high confidence), then deterministic ordering appends remaining techniques (lower confidence, e.g., 0.3-0.5 probability edges)

  2. Co-occurrence overlay: Group unsequenced techniques by tactic and attach as parallel branches off the main flow (hub-and-spoke pattern already exists in _create_cooccurrence_edges)

  3. Prompt engineering: Adjust the synthesis prompt to encourage the LLM to include more steps (currently says "Choose up to {max_steps} pivotal steps" — the word "pivotal" causes conservative selection)

  4. Tiered confidence: Mark LLM-synthesized steps as confidence: high, deterministic extensions as confidence: medium, co-occurrence as confidence: low
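
To make option 2 more concrete, here is a minimal sketch of a tactic-grouped overlay that hangs unsequenced techniques off the main flow as spokes and tags them with the lowest tier from option 4. It does not reproduce `_create_cooccurrence_edges`; the function name `overlay_by_tactic`, the edge dictionary shape, and the hub-selection rule (first core step with the same tactic, else the final core step) are all assumptions made for illustration.

```python
from collections import defaultdict


def overlay_by_tactic(core, unsequenced):
    """Attach unsequenced techniques to the core flow as low-confidence spokes.

    core:        ordered (technique_id, tactic) pairs from the LLM chain.
    unsequenced: (technique_id, tactic) pairs the LLM did not place.
    """
    if not core:
        return []

    # Assumed hub rule: the first core step in a tactic anchors that tactic.
    hub_for_tactic = {}
    for technique_id, tactic in core:
        hub_for_tactic.setdefault(tactic, technique_id)
    fallback_hub = core[-1][0]  # tactics absent from the core hang off the last step

    groups = defaultdict(list)
    for technique_id, tactic in unsequenced:
        groups[tactic].append(technique_id)

    edges = []
    for tactic in sorted(groups):
        hub = hub_for_tactic.get(tactic, fallback_hub)
        for technique_id in sorted(groups[tactic]):
            if technique_id == hub:
                continue  # never draw a self-loop
            edges.append({
                "source": hub,
                "target": technique_id,
                "tactic": tactic,
                "confidence": "low",
                "kind": "co-occurrence",
            })
    return edges
```

Because the spokes are parallel branches rather than a chain, this variant makes no claim about the order of the unsequenced techniques, which matches how the report presents them.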

References

  • Flow synthesis: bandjacks/llm/flow_synthesizer.py
  • Deterministic ordering: bandjacks/llm/flow_builder.py:_build_deterministic()
  • Co-occurrence edges: bandjacks/llm/flow_builder.py:_create_cooccurrence_edges()
  • Orkl reference: /tank/orkl_data_mapped

Acceptance Criteria

  • Remaining techniques appear in the flow (even at lower confidence)
  • LLM-synthesized core chain preserved at high confidence
  • Edge probabilities reflect confidence tier
  • Flow visualization distinguishes confidence levels
