Missing cause/risk analysis for datasets 11, 12, 13, 20 in risk_dataset_v2.json #19

@harpomaxx

Problem

datasets/risk_dataset_v2.json contains 985 incidents, of which 159 are missing all cause/risk analysis fields (cause_risk_gpt_4o, cause_risk_gpt_4o_mini, cause_risk_qwen2_5, cause_risk_qwen2_5:3b).

These incidents have dag_analysis present but no model outputs, which causes the LLM-as-judge evaluator to skip them with a "no analyses supplied" message.

Affected datasets

Dataset          Missing incidents
my_dataset_11    51
my_dataset_12    49
my_dataset_13    54
my_dataset_20    5
Total            159
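
The per-dataset counts above could be re-checked with something like the following. This is a rough sketch, not the project's own tooling: it assumes jq is installed, that risk_dataset_v2.json is a top-level JSON array of incident objects, and that each incident carries a "dataset" key. Neither the layout nor the "dataset" key name is confirmed by this issue, and count_missing is a name made up for the example. Only the four cause_risk_* field names come from the description above.

```shell
#!/usr/bin/env bash
# Sketch: count incidents that lack all four cause/risk fields, per dataset.
# ASSUMPTIONS (not confirmed by the issue): the file is a JSON array of
# incident objects, and each object has a "dataset" key.

count_missing() {
  jq -r '
    [ .[]
      # Keep incidents where every cause/risk field is absent or null.
      | select(
          .cause_risk_gpt_4o == null
          and .cause_risk_gpt_4o_mini == null
          and .cause_risk_qwen2_5 == null
          and .["cause_risk_qwen2_5:3b"] == null
        )
      | .dataset
    ]
    | group_by(.)                  # one group per dataset name
    | map("\(.[0]) \(length)")     # "<dataset> <missing count>"
    | .[]
  ' "$1"
}
```

It would be called as count_missing datasets/risk_dataset_v2.json, before and after the fix. Fields that are present but hold an empty string are counted as present here, so the check may need tightening depending on how the generator writes failed outputs.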

Root cause

generate_cause_risk_analysis.sh was not run for datasets 11, 12, and 13, and was run only partially for dataset 20, before the final merge into risk_dataset_v2.json.

Fix

Re-run the cause/risk generation step for the affected datasets, then rebuild risk_dataset_v2.json using build_risk_dataset_v2.sh:

./generate_cause_risk_analysis.sh datasets/my_dataset_11.jsonl --model gpt-4o-mini --group-events
./generate_cause_risk_analysis.sh datasets/my_dataset_11.jsonl --model gpt-4o --group-events
./generate_cause_risk_analysis.sh datasets/my_dataset_11.jsonl --model qwen2.5:3b --base-url http://localhost:11434/v1 --group-events
# repeat for my_dataset_12, my_dataset_13, my_dataset_20

./build_risk_dataset_v2.sh
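
The "repeat for ..." comment above can be expanded into a loop, roughly as follows. This is a sketch under a few assumptions: the flags are copied unchanged from the commands above, the same --base-url is assumed to apply to qwen2.5:3b for every dataset, and EXECUTE, run_or_print and regenerate_all are names introduced only for this example. By default the script prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch only: expands the "repeat for ..." comment into a loop.
# EXECUTE is a hypothetical switch introduced for this example; unless it is
# set to 1, commands are printed rather than run.

run_or_print() {
  if [ "${EXECUTE:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

regenerate_all() {
  local n file
  for n in 11 12 13 20; do
    file="datasets/my_dataset_${n}.jsonl"
    # Stop at the first failing command so the rebuild never runs on partial output.
    run_or_print ./generate_cause_risk_analysis.sh "$file" --model gpt-4o-mini --group-events || return 1
    run_or_print ./generate_cause_risk_analysis.sh "$file" --model gpt-4o --group-events || return 1
    run_or_print ./generate_cause_risk_analysis.sh "$file" --model qwen2.5:3b \
      --base-url http://localhost:11434/v1 --group-events || return 1
  done
  # Rebuild the merged file once, after all four datasets have been processed.
  run_or_print ./build_risk_dataset_v2.sh
}

regenerate_all
```

Running it as-is lists the 13 commands (three models for each of the four datasets, plus the rebuild) for review; running it with EXECUTE=1 in the environment would execute them from the repository root.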
