Context
The LLM-synthesized flow correctly models the high-confidence causal chain (e.g., 5-step DarkCloud infection chain), but leaves 43+ remaining technique claims unsequenced. These are post-exploitation capabilities (credential theft, discovery, persistence, exfiltration) that the report describes as capabilities rather than a strict temporal narrative.
Current Behavior
- LLM synthesizer produces a focused 5-10 step flow with causal reasoning
- Deterministic fallback produces a full kill-chain-ordered flow but without causal reasoning
- The two are mutually exclusive: when LLM synthesis succeeds, the deterministic fallback does not run
Proposed Enhancement
Explore a hybrid approach: LLM synthesis for the core narrative, then deterministic extension to model remaining techniques at lower confidence. This is similar to the approach used in /tank/orkl_data_mapped.
Options to explore
- Two-pass flow: LLM synthesizes core chain (high confidence), then deterministic ordering appends remaining techniques (lower confidence, e.g., 0.3-0.5 probability edges)
- Co-occurrence overlay: Group unsequenced techniques by tactic and attach as parallel branches off the main flow (hub-and-spoke pattern already exists in _create_cooccurrence_edges)
- Prompt engineering: Adjust the synthesis prompt to encourage the LLM to include more steps (currently says "Choose up to {max_steps} pivotal steps"; the word "pivotal" causes conservative selection)
- Tiered confidence: Mark LLM-synthesized steps as confidence: high, deterministic extensions as confidence: medium, co-occurrence as confidence: low
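
To make the two-pass and tiered-confidence options concrete, here is a minimal sketch of what the extension pass could look like. It is not the existing implementation: `Edge`, `extend_flow`, `KILL_CHAIN_ORDER`, and the 0.9 probability for core edges are all hypothetical names and values chosen for illustration, and the real tactic ordering in flow_builder.py may differ. The 0.4 default comes from the 0.3-0.5 range suggested above.

```python
from dataclasses import dataclass

# Hypothetical tactic order for illustration only; the real
# kill-chain ordering used by the deterministic builder may differ.
KILL_CHAIN_ORDER = [
    "initial-access", "execution", "persistence",
    "credential-access", "discovery", "exfiltration",
]
_RANK = {tactic: i for i, tactic in enumerate(KILL_CHAIN_ORDER)}


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    probability: float
    confidence: str  # "high" for LLM edges, "medium" for extensions


def extend_flow(core_chain, remaining, tactic_of, ext_probability=0.4):
    """Pass 2: append unsequenced techniques after the LLM core chain.

    core_chain: ordered technique IDs from LLM synthesis (pass 1).
    remaining:  technique IDs the LLM left unsequenced.
    tactic_of:  mapping of technique ID -> tactic name.
    """
    # Core edges keep high confidence (0.9 is an assumed placeholder).
    edges = [
        Edge(a, b, 0.9, "high")
        for a, b in zip(core_chain, core_chain[1:])
    ]

    # Drop duplicates and anything already in the core chain.
    seen = set(core_chain)
    tail = [t for t in dict.fromkeys(remaining) if t not in seen]

    # Order by tactic rank; unknown tactics sort last, ties by ID.
    tail.sort(key=lambda t: (_RANK.get(tactic_of.get(t), len(_RANK)), t))

    # Chain the extensions off the last core step at lower confidence.
    prev = core_chain[-1] if core_chain else None
    for technique in tail:
        if prev is not None:
            edges.append(Edge(prev, technique, ext_probability, "medium"))
        prev = technique
    return edges


if __name__ == "__main__":
    core = ["T1566", "T1059", "T1055"]
    rest = ["T1041", "T1555", "T1059", "T1082"]
    tactics = {
        "T1041": "exfiltration",
        "T1555": "credential-access",
        "T1082": "discovery",
    }
    for edge in extend_flow(core, rest, tactics):
        print(edge)
```

Chaining the extensions linearly is only one choice; the co-occurrence overlay option would instead attach each tactic group as a branch off the main flow, with those edges marked confidence: low.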
References
- Flow synthesis: bandjacks/llm/flow_synthesizer.py
- Deterministic ordering: bandjacks/llm/flow_builder.py:_build_deterministic()
- Co-occurrence edges: bandjacks/llm/flow_builder.py:_create_cooccurrence_edges()
- Orkl reference: /tank/orkl_data_mapped
Acceptance Criteria