
feat: MiroFish Phase 1+2 improvements - sim quality, ops, report quality #1

Open
lilubot wants to merge 5 commits into main from atlas-hardening

Conversation


@lilubot lilubot commented Mar 20, 2026

Summary

Four commits of improvements across simulation quality, ops reliability, and report quality. All changes are additive - zero existing functionality broken.


Perseus - Phase 1 Sim Quality (commit ade8ab8)

  • NER_MODEL config option (default: minimax-m2.7:cloud)
  • _deduplicate_entities() in entity_reader.py - fuzzy dedup via SequenceMatcher (threshold 0.85)
  • PlatformSimulator ABC + TwitterSimulator + RedditSimulator (platform_simulators.py)
  • SimulationOutcome dataclass + OutcomeScorer class (outcome_scorer.py)
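The fuzzy dedup step above can be sketched with `difflib` alone. This is a minimal illustration, not the PR's code: only `SequenceMatcher` and the 0.85 threshold come from the bullet; the function name and the greedy first-occurrence-wins strategy are assumptions.

```python
from difflib import SequenceMatcher

def deduplicate_entities(names, threshold=0.85):
    """Greedy fuzzy dedup: the first spelling seen becomes canonical;
    later names whose similarity ratio meets the threshold are dropped."""
    canonical = []
    for name in names:
        duplicate = any(
            SequenceMatcher(None, name.lower(), kept.lower()).ratio() >= threshold
            for kept in canonical
        )
        if not duplicate:
            canonical.append(name)
    return canonical

print(deduplicate_entities(["OpenAI", "Open AI", "Neo4j"]))  # → ['OpenAI', 'Neo4j']
```

A real implementation would also merge the labels, summaries, and attributes of each dropped duplicate into its canonical record rather than discarding them.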

Prometheus - Phase 1 Ops & Reliability (commit b509ecb)

  • /api/health - Neo4j + Ollama + uploads checks, healthy/degraded/unhealthy
  • /api/health/ready - strict readiness gate for Prometheus before starting sims
  • preflight.py - 5 startup checks, logs warnings, never blocks startup
  • /api/ops/simulation/<id>/poll - machine-readable: terminal, success, stalled, percent_complete
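The poll endpoint's fields reduce to a small pure function. A sketch under stated assumptions: the six response fields come from the bullet above, but the terminal state names and the stall timeout are invented here.

```python
from datetime import datetime, timedelta

TERMINAL_STATES = {"completed", "failed", "cancelled"}  # assumed state names
STALL_TIMEOUT = timedelta(minutes=10)                   # assumed stall threshold

def poll_payload(status, rounds_done, rounds_total, last_progress_at, now):
    """Derive the machine-readable poll fields from raw simulation state."""
    terminal = status in TERMINAL_STATES
    stalled = not terminal and (now - last_progress_at) > STALL_TIMEOUT
    return {
        "terminal": terminal,
        "success": status == "completed",
        "status": status,
        "percent_complete": round(100.0 * rounds_done / max(rounds_total, 1), 1),
        "stalled": stalled,
        "stalled_since": last_progress_at.isoformat() if stalled else None,
    }
```

Keeping this logic out of the Flask handler makes the terminal/stalled semantics trivially unit-testable.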

Calypso - Phase 1 Report Quality (commit becc6e5)

  • English-only enforcement in PLAN_SYSTEM_PROMPT + SECTION_SYSTEM_PROMPT_TEMPLATE
  • _validate_section_output() - strips corruption, warns on high non-ASCII ratio
  • _run_report_sanity_check() - post-generation integrity check
  • executive_summary.py - ExecutiveSummary dataclass + LLM extraction → JSON + Markdown card
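The section-validation pass lends itself to a short sketch. The 20% non-ASCII warning threshold is stated in the commit message below; the exact control-character policy here is an assumption.

```python
def _non_ascii_ratio(text):
    return sum(1 for ch in text if ord(ch) > 127) / len(text) if text else 0.0

def validate_section_output(text, warn_threshold=0.20):
    """Strip null bytes and control characters, normalize line endings,
    and flag sections whose non-ASCII ratio exceeds the threshold."""
    cleaned = text.replace("\x00", "").replace("\r\n", "\n").replace("\r", "\n")
    cleaned = "".join(ch for ch in cleaned if ch in "\n\t" or ord(ch) >= 32)
    return cleaned, _non_ascii_ratio(cleaned) > warn_threshold
```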

Perseus - Phase 2 Sim Quality (commit 345acae)

  • Behavioral variance fields on OasisAgentProfile: conformity, reactiveness, persuadability, behavioral_noise
  • FatigueState - per-agent fatigue tracking with decay
  • ArchetypePersonaGenerator - 5 archetypes with Gaussian jitter + LLM bio generation
  • NarrativeFrame + MultiFrameInitializer - competing narrative frames at t=0 with agent distribution
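The Gaussian-jitter idea behind ArchetypePersonaGenerator can be sketched in a few lines. The archetype names come from the PR; the seed means, field subset, and sigma are illustrative assumptions.

```python
import random

# Seed means per archetype; values here are illustrative, not from the PR.
ARCHETYPE_MEANS = {
    "Enthusiast": {"conformity": 0.6, "reactiveness": 0.8, "persuadability": 0.7},
    "Skeptic":    {"conformity": 0.3, "reactiveness": 0.5, "persuadability": 0.2},
}

def jitter_profile(archetype, sigma=0.1, rng=random):
    """Per-field Gaussian jitter around the archetype means, clamped to [0, 1]."""
    return {
        field: min(1.0, max(0.0, rng.gauss(mean, sigma)))
        for field, mean in ARCHETYPE_MEANS[archetype].items()
    }
```

Clamping after the jitter keeps every behavioral field a valid probability-like weight while still giving each generated agent a distinct personality vector.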

lilubot added 2 commits March 19, 2026 16:12
…ators, outcome scorer

- Add NER_MODEL config option to Config (default: minimax-m2.7:cloud)
- Add _deduplicate_entities() to EntityReader using SequenceMatcher (threshold 0.85)
- Add PlatformSimulator base class + TwitterSimulator + RedditSimulator (platform_simulators.py)
- Add SimulationOutcome dataclass + OutcomeScorer class (outcome_scorer.py)

All changes are additive only. No existing functionality modified.
Implements Phase 1 of MIROFISH_MEMO_SIM_QUALITY.md roadmap.

coderabbitai bot commented Mar 20, 2026

📝 Walkthrough

Adds health/ops endpoints and startup preflight checks; introduces entity deduplication, platform simulators, outcome scoring, executive-summary and persona generators; changes report endpoint to require Neo4j extension and to pass GraphToolsService into ReportAgent; removes Ollama from docker-compose; adds NER model config.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| App entry, preflight & routes<br>`backend/app/__init__.py`, `backend/app/preflight.py`, `backend/app/routes/__init__.py`, `backend/app/routes/health.py`, `backend/app/routes/ops.py` | Register new health and ops blueprints; run and store startup preflight checks; add `/api/health`, `/api/health/ready`, and `/api/ops/simulation/<id>/poll` handlers with component probes, readiness/stall logic, and preflight aggregation. |
| Report API & ReportAgent changes<br>`backend/app/api/report.py`, `backend/app/services/report_agent.py` | `generate_report()` now obtains `neo4j_storage` from `current_app.extensions`, returns 500 if missing, and constructs `GraphToolsService(storage=storage)` passed into ReportAgent via `graph_tools`. ReportAgent adds per-section and full-report sanitization and sanity checks. |
| Configuration & exports<br>`backend/app/config.py`, `backend/app/services/__init__.py` | Added `Config.NER_MODEL` env-backed setting. Export list expanded to include new simulation/outcome and platform simulator symbols. |
| Entity reading & deduplication<br>`backend/app/services/entity_reader.py` | `EntityReader.__init__` signature extended with `ner_model` and `deduplication_threshold`. Added deduplication pass using SequenceMatcher, canonical merging of labels/summaries/attributes, and unique merging of related nodes/edges. |
| Platform simulators<br>`backend/app/services/platform_simulators.py` | New `PlatformSimulator` base class and concrete `TwitterSimulator` and `RedditSimulator` with visibility scoring and engagement-multiplier formulas. |
| Outcome scoring<br>`backend/app/services/outcome_scorer.py` | New `SimulationOutcome` dataclass and `OutcomeScorer` with `score_simulation()` computing totals, engagement, reach, viral counts, sentiment SD, platform distribution, top influencers, and peak activity round. |
| Executive summary generator<br>`backend/app/services/executive_summary.py` | New `ExecutiveSummary` dataclass and `ExecutiveSummaryGenerator` that calls the LLM (`chat_json`/`chat` fallback), normalizes fields, and writes JSON/MD artifacts. |
| Persona & fatigue tooling<br>`backend/app/services/oasis_profile_generator.py` | Extended `OasisAgentProfile` with phase-2 behavioral fields; added `FatigueState` and `ArchetypePersonaGenerator` to synthesize personas with behavioral variance. |
| Simulation framing<br>`backend/app/services/simulation_config_generator.py` | Added `NarrativeFrame` dataclass and `MultiFrameInitializer` to generate/distribute competing narrative frames (LLM or rule-based) and assign agents to frames. |
| Docker Compose cleanup<br>`docker-compose.yml` | Removed `ollama` service and `ollama_data` volume; dropped mirofish dependency on Ollama; kept Neo4j service/volumes and adjusted commented lines. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hop through code with nimble paws,
I stitch the graphs and check the laws,
Deduped names and platforms sing,
Reports now call the Neo4j ring,
Ollama left — but oh, what paws! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 61.33%, below the required 80.00% threshold. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped; CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main changes across three phases (sim quality, ops, report quality), aligning with the detailed PR objectives. |



@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request marks Phase 1 of the simulation quality improvements, focusing on enhancing the realism and analytical capabilities of the MiroFish simulation. It introduces a modular system for platform-specific content behavior and comprehensive outcome scoring, alongside refining entity processing to improve data quality. These changes are designed to be additive and pluggable, setting the foundation for future simulation enhancements without altering existing core logic.

Highlights

  • Platform-Specific Simulation: Introduced an abstract base class PlatformSimulator with concrete implementations for Twitter and Reddit, enabling platform-specific post visibility and engagement scoring based on distinct platform behaviors.
  • Simulation Outcome Scoring: Added OutcomeScorer to compute key simulation metrics such as total posts, engagement, average reach, viral posts, sentiment deviation, platform distribution, and top influencers, providing a quantitative overview of simulation runs.
  • Entity Deduplication: Implemented fuzzy entity deduplication in EntityReader using SequenceMatcher to consolidate similar entities, improving data quality and consistency.
  • Configuration Update: Configured a new NER_MODEL option in the application settings to specify the Named Entity Recognition model.
  • Infrastructure Streamlining: Removed the Ollama service and its related configurations from docker-compose.yml, streamlining the local development environment.
  • Report Agent Enhancement: Integrated GraphToolsService into the ReportAgent for enhanced graph interaction during report generation, ensuring proper context for background tasks.


@sourcery-ai sourcery-ai bot left a comment


Sorry @lilubot, you have reached your weekly rate limit of 1500000 diff characters.

Please try again later or upgrade to continue using Sourcery

@gemini-code-assist

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docker-compose.yml (1)

2-15: ⚠️ Potential issue | 🟠 Major

Activate Docker-mode LLM/embedding endpoints in docker-compose.yml.

The mirofish service defaults to localhost:11434 for both LLM and embedding endpoints (from backend/app/config.py), but inside a container, localhost resolves to the container itself, not the host. After removing the in-compose ollama service, these defaults will fail unless explicitly overridden.

The project already documents Docker overrides in .env.example (commented out with the note "Uncomment these when running ALL services inside Docker"):

LLM_BASE_URL=http://ollama:11434/v1
EMBEDDING_BASE_URL=http://ollama:11434

However, since the ollama service no longer exists in the compose stack, activate these overrides using extra_hosts to map the container hostname, or update .env.example to reflect the new offline-only setup.

Suggested wiring for host.docker.internal
   mirofish:
     build: .
     container_name: mirofish-offline
     env_file:
       - .env
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
+    environment:
+      LLM_BASE_URL: ${LLM_BASE_URL:-http://host.docker.internal:11434/v1}
+      EMBEDDING_BASE_URL: ${EMBEDDING_BASE_URL:-http://host.docker.internal:11434}
     ports:
       - "3000:3000"
       - "5001:5001"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker-compose.yml` around lines 2 - 15, The mirofish service is using
host-local LLM/embedding defaults that won't resolve inside the container; add
an extra_hosts mapping under the service to point the ollama hostname to the
Docker host (e.g., add extra_hosts: - "ollama:host-gateway" to the mirofish
service) and ensure the .env or .env.example contains the override variables
LLM_BASE_URL=http://ollama:11434/v1 and EMBEDDING_BASE_URL=http://ollama:11434
so the app (config.py) uses those container-resolvable endpoints.
🧹 Nitpick comments (1)
backend/app/services/outcome_scorer.py (1)

129-134: Stabilize tied influencer rankings.

Equal engagement scores currently inherit input order, so the same simulation can emit a different top_influencers list when records arrive in a different order. A secondary name sort keeps reports reproducible.

♻️ Proposed refactor
         ranked = sorted(
             influence_scores.items(),
-            key=lambda item: item[1],
-            reverse=True,
+            key=lambda item: (-item[1], item[0].lower()),
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/outcome_scorer.py` around lines 129 - 134, The current
sort of influence_scores in outcome_scorer.py only keys by score so ties
preserve unpredictable input order; update the sorted call that produces ranked
(the block using influence_scores.items() and influencer_limit) to use a
deterministic secondary key of the influencer name (e.g., sort by score desc
then name asc) so tied engagement scores are reproducibly ordered in
top_influencers.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 54cfec22-5bf5-4934-ae58-2923d075ad18

📥 Commits

Reviewing files that changed from the base of the PR and between f47fa5c and ade8ab8.

📒 Files selected for processing (7)
  • backend/app/api/report.py
  • backend/app/config.py
  • backend/app/services/__init__.py
  • backend/app/services/entity_reader.py
  • backend/app/services/outcome_scorer.py
  • backend/app/services/platform_simulators.py
  • docker-compose.yml

Comment on lines +79 to +87
def __init__(
    self,
    storage: GraphStorage,
    ner_model: Optional[str] = None,
    deduplication_threshold: float = 0.85,
):
    self.storage = storage
    self.ner_model = ner_model or Config.NER_MODEL
    self.deduplication_threshold = deduplication_threshold

⚠️ Potential issue | 🟡 Minor

Reject out-of-range similarity thresholds early.

An invalid deduplication_threshold silently changes dedup behavior: values above 1 prevent merges, and values below 0 merge every same-type pair.

💡 Suggested fix
     def __init__(
         self,
         storage: GraphStorage,
         ner_model: Optional[str] = None,
         deduplication_threshold: float = 0.85,
     ):
         self.storage = storage
         self.ner_model = ner_model or Config.NER_MODEL
+        if not 0.0 <= deduplication_threshold <= 1.0:
+            raise ValueError("deduplication_threshold must be between 0.0 and 1.0")
         self.deduplication_threshold = deduplication_threshold
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/entity_reader.py` around lines 79 - 87, Validate
deduplication_threshold in EntityReader.__init__: check that the provided
deduplication_threshold is between 0.0 and 1.0 (inclusive or exclusive per
project convention) and raise a ValueError with a clear message if out of range;
update the __init__ signature in entity_reader.py to perform this early
validation (before assigning to self.deduplication_threshold) so invalid values
cannot silently alter dedup behavior.

Comment on lines +408 to +410
def _merge_entity_into_canonical(self, canonical: EntityNode, duplicate: EntityNode) -> None:
    """Merge duplicate entity details into the canonical entity in place."""
    canonical.labels = sorted(set(canonical.labels) | set(duplicate.labels))

⚠️ Potential issue | 🟠 Major

Preserve the canonical label order here.

get_entity_type() is order-dependent, so sorting the merged labels can change an entity from its original filtered type to some other label mid-dedup. That leaks into the recomputed entity_types_found on Lines 247-250 and can also stop later same-type matches from merging into the same canonical record.

💡 Suggested fix
-        canonical.labels = sorted(set(canonical.labels) | set(duplicate.labels))
+        canonical.labels = list(dict.fromkeys(canonical.labels + duplicate.labels))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:
def _merge_entity_into_canonical(self, canonical: EntityNode, duplicate: EntityNode) -> None:
    """Merge duplicate entity details into the canonical entity in place."""
    canonical.labels = sorted(set(canonical.labels) | set(duplicate.labels))

After:
def _merge_entity_into_canonical(self, canonical: EntityNode, duplicate: EntityNode) -> None:
    """Merge duplicate entity details into the canonical entity in place."""
    canonical.labels = list(dict.fromkeys(canonical.labels + duplicate.labels))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/entity_reader.py` around lines 408 - 410, In
_merge_entity_into_canonical, avoid sorting merged labels because
get_entity_type is order-dependent; instead preserve canonical.labels order and
append any labels from duplicate.labels that are not already present (dedupe
while keeping canonical order), so update canonical.labels to canonical.labels +
[l for l in duplicate.labels if l not in canonical.labels]; this ensures
get_entity_type and downstream entity_types_found logic remain stable and later
same-type merges still match the canonical record.

Comment on lines +417 to +421
merged_attributes = dict(canonical.attributes)
for key, value in duplicate.attributes.items():
    if key not in merged_attributes or not merged_attributes[key]:
        merged_attributes[key] = value
canonical.attributes = merged_attributes

⚠️ Potential issue | 🟠 Major

Don't treat falsy attribute values as missing.

not merged_attributes[key] overwrites valid canonical values like 0 and False during deduplication, which silently corrupts merged facts.

💡 Suggested fix
         merged_attributes = dict(canonical.attributes)
         for key, value in duplicate.attributes.items():
-            if key not in merged_attributes or not merged_attributes[key]:
+            current_value = merged_attributes.get(key)
+            if key not in merged_attributes or current_value is None or current_value == "":
                 merged_attributes[key] = value
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/entity_reader.py` around lines 417 - 421, The merge loop
treats falsy canonical values as missing; update the condition in the block that
assigns merged_attributes from duplicate.attributes (where canonical and
duplicate are used) to only treat truly absent/undefined values as missing
(e.g., check for key not in merged_attributes or merged_attributes[key] is None)
instead of using a truthiness test like not merged_attributes[key], so valid
values like 0 or False are preserved.

Comment on lines +62 to +71
components = (
    self._value(post, "likes", 0),
    self._value(post, "upvotes", 0),
    self._value(post, "comments", 0),
    self._value(post, "replies", 0),
    self._value(post, "shares", 0),
    self._value(post, "retweets", 0),
    self._value(post, "reposts", 0),
)
return int(sum(int(component or 0) for component in components))

⚠️ Potential issue | 🟠 Major

Avoid double-counting alias metrics in engagement totals.

backend/app/services/platform_simulators.py lines 47-48 and 77-80 treat retweets/reposts and comments/replies as alternate names for the same interaction. Summing both here will inflate total_engagement, viral_posts_count, average_post_reach when it falls back to engagement, and top_influencers as soon as a post carries both aliases.

🛠️ Proposed fix
     def _engagement_value(self, post: Any) -> int:
         direct_value = self._value(post, "engagement", None)
         if direct_value is not None:
             return int(direct_value)

+        comment_count = max(
+            int(self._value(post, "comments", 0) or 0),
+            int(self._value(post, "replies", 0) or 0),
+        )
+        repost_count = max(
+            int(self._value(post, "retweets", 0) or 0),
+            int(self._value(post, "reposts", 0) or 0),
+        )
+
         components = (
             self._value(post, "likes", 0),
             self._value(post, "upvotes", 0),
-            self._value(post, "comments", 0),
-            self._value(post, "replies", 0),
+            comment_count,
             self._value(post, "shares", 0),
-            self._value(post, "retweets", 0),
-            self._value(post, "reposts", 0),
+            repost_count,
         )
         return int(sum(int(component or 0) for component in components))
📝 Committable suggestion


Suggested change
Before:
components = (
    self._value(post, "likes", 0),
    self._value(post, "upvotes", 0),
    self._value(post, "comments", 0),
    self._value(post, "replies", 0),
    self._value(post, "shares", 0),
    self._value(post, "retweets", 0),
    self._value(post, "reposts", 0),
)
return int(sum(int(component or 0) for component in components))

After:
comment_count = max(
    int(self._value(post, "comments", 0) or 0),
    int(self._value(post, "replies", 0) or 0),
)
repost_count = max(
    int(self._value(post, "retweets", 0) or 0),
    int(self._value(post, "reposts", 0) or 0),
)
components = (
    self._value(post, "likes", 0),
    self._value(post, "upvotes", 0),
    comment_count,
    self._value(post, "shares", 0),
    repost_count,
)
return int(sum(int(component or 0) for component in components))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/outcome_scorer.py` around lines 62 - 71, The engagement
sum in outcome_scorer.py currently adds both alias fields (e.g., "retweets" and
"reposts", "comments" and "replies") causing double-counting; update the
aggregation logic around components/_value so aliases are deduplicated by
mapping aliases to a canonical metric (or taking the max of alias pairs) before
summing—e.g., treat "retweets"/"reposts" as one metric and "comments"/"replies"
as one metric, compute a single value for each canonical interaction, then
return the int(sum(...)) of those canonical values.

lilubot added 3 commits March 20, 2026 06:01
- backend/app/routes/health.py: /api/health (Neo4j + Ollama + uploads
  component check, healthy/degraded/unhealthy), /api/health/ready
  (strict readiness gate for Prometheus pre-simulation checks)
- backend/app/preflight.py: run_preflight_checks() — checks Neo4j,
  Ollama, required model, uploads dir, env vars on startup; logs
  warnings but never blocks startup
- backend/app/routes/ops.py: /api/ops/simulation/<id>/poll — returns
  terminal, success, status, percent_complete, stalled, stalled_since
  fields for machine-readable Prometheus polling
- backend/app/routes/__init__.py: package init exporting health_bp, ops_bp
- backend/app/__init__.py: register health_bp and ops_bp; call
  run_preflight_checks() at startup and store result in
  app.extensions['preflight'] for /api/health to surface

Additive only — no existing endpoints modified.
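The healthy/degraded/unhealthy roll-up described in this commit can be expressed as a tiny aggregation function. A sketch under assumptions: which components count as critical, and the rule that any critical failure means unhealthy, are not specified in the commit message.

```python
CRITICAL_COMPONENTS = {"neo4j"}  # assumption: the graph DB is the only hard dependency

def overall_status(checks):
    """Roll per-component booleans up into the three-state health verdict."""
    if all(checks.values()):
        return "healthy"
    if any(not ok for name, ok in checks.items() if name in CRITICAL_COMPONENTS):
        return "unhealthy"
    return "degraded"
```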
…lidation, executive summary

Changes to backend/app/services/report_agent.py:
- Confirmed all open() file writes already use encoding='utf-8'
- Added CRITICAL LANGUAGE REQUIREMENT to PLAN_SYSTEM_PROMPT (English-only output)
- Added CRITICAL LANGUAGE REQUIREMENT to SECTION_SYSTEM_PROMPT_TEMPLATE (English-only output)
- Added ReportAgent._validate_section_output(): strips null bytes, removes
  control chars, normalizes line endings, warns if non-ASCII ratio > 20%
- Added ReportAgent._run_report_sanity_check(): checks section min length,
  unicode replacement chars, and non-ASCII ratio; returns issues dict
- Wired _validate_section_output() into generate_report() after each section
- Wired _run_report_sanity_check() into generate_report() after assembly

New file backend/app/services/executive_summary.py:
- ExecutiveSummary dataclass with all structured fields (outcome, actors,
  metrics, timeline, platform sentiment, recommendations)
- ExecutiveSummaryGenerator class: calls LLM to extract structured JSON from
  full report, saves executive_summary.json and executive_summary.md
- English-only system prompt for LLM extraction
- Graceful fallback on malformed LLM responses (safe_float, safe_list, etc.)
- _render_markdown() produces clean Markdown card with emoji risk indicators

Additive changes only — no existing logic modified.
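The `safe_float`/`safe_list` helpers named above are standard defensive coercers for malformed LLM JSON; the bodies below are plausible sketches, not the PR's actual implementations.

```python
def safe_float(value, default=0.0):
    """Coerce a possibly malformed LLM field to float, falling back quietly."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return default

def safe_list(value):
    """Normalize a field to a list of strings; scalars wrap, None/empty vanish."""
    if isinstance(value, list):
        return [str(item) for item in value]
    return [str(value)] if value not in (None, "") else []
```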
… MultiFrameInitializer

oasis_profile_generator.py:
- Add behavioral variance fields to OasisAgentProfile (conformity,
  reactiveness, persuadability, behavioral_noise)
- Add FatigueState dataclass with get_current_activity_level(),
  record_activity(), and record_rest() methods
- Add ArchetypePersonaGenerator class: generates synthetic OASIS profiles
  from five seed archetypes (Enthusiast, Skeptic, Neutral, Influencer,
  Contrarian) with per-field Gaussian variance; supports both LLM-assisted
  and rule-based identity generation
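The FatigueState described above might look roughly like this; the three method names come from the commit message, while the gain and decay rates are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class FatigueState:
    """Per-agent fatigue: activity raises it, rest decays it."""
    fatigue: float = 0.0
    gain: float = 0.2    # fatigue added per activity (assumed rate)
    decay: float = 0.5   # multiplicative decay per rest step (assumed rate)

    def record_activity(self) -> None:
        self.fatigue = min(1.0, self.fatigue + self.gain)

    def record_rest(self) -> None:
        self.fatigue *= self.decay

    def get_current_activity_level(self) -> float:
        # Tired agents act less often; fully fatigued agents go silent.
        return 1.0 - self.fatigue
```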

simulation_config_generator.py:
- Add NarrativeFrame dataclass: stores frame_id, label, description,
  initial_sentiment, talking_points, and assigned agent_ids
- Add MultiFrameInitializer class: generates N competing narrative frames
  at t=0 via LLM (with rule-based fallback), and distributes agents to
  frames by minimum-distance matching on sentiment_bias

All changes are strictly additive; no existing code was modified.
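The minimum-distance matching on sentiment_bias described above can be sketched as follows; the function signature and data shapes are assumptions, the assignment rule is from the commit message.

```python
def assign_agents_to_frames(agent_biases, frame_sentiments):
    """Give each agent the frame whose initial_sentiment is nearest
    (minimum absolute distance) to the agent's sentiment_bias."""
    assignments = {frame_id: [] for frame_id in frame_sentiments}
    for agent_id, bias in agent_biases.items():
        nearest = min(frame_sentiments, key=lambda fid: abs(frame_sentiments[fid] - bias))
        assignments[nearest].append(agent_id)
    return assignments
```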
@lilubot lilubot changed the title from "feat(sim-quality): Phase 1 simulation quality improvements" to "feat: MiroFish Phase 1+2 improvements - sim quality, ops, report quality" on Mar 20, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (3)
backend/app/services/oasis_profile_generator.py (2)

1525-1556: Remove unnecessary f prefix from string literals.

Several strings in bio_templates don't contain placeholders but use f-string syntax:

  • Line 1527: "Skeptic": f"I question everything..."
  • Line 1528: "Neutral": f"Just here to read..."
  • Line 1530: "Contrarian": f"If everyone agrees..."
♻️ Suggested fix
         bio_templates = {
             "Enthusiast": f"Passionate about innovation and progress. Always excited to discuss new ideas. #{archetype}",
-            "Skeptic": f"I question everything. Critical thinker. Not buying the hype.",
-            "Neutral": f"Just here to read and stay informed. Opinions are my own.",
+            "Skeptic": "I question everything. Critical thinker. Not buying the hype.",
+            "Neutral": "Just here to read and stay informed. Opinions are my own.",
             "Influencer": f"Content creator | thought leader | {random.randint(1, 50)}K followers. Let's connect!",
-            "Contrarian": f"If everyone agrees, someone is wrong. Usually them.",
+            "Contrarian": "If everyone agrees, someone is wrong. Usually them.",
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/oasis_profile_generator.py` around lines 1525 - 1556,
Three entries in bio_templates ("Skeptic", "Neutral", "Contrarian") use
f-strings without any interpolation; remove the unnecessary f prefix from those
string literals so they become normal strings (leave the f prefix on entries
that do interpolate like "Enthusiast" and "Influencer"); update the
bio_templates definition accordingly and run tests to ensure no behavioral
change.

1332-1332: Use tuple instead of list for class-level constant.

DEFAULT_ARCHETYPES is a class attribute that should be immutable. Using list() creates a mutable object that could be accidentally modified, affecting all instances.

♻️ Suggested fix
-    DEFAULT_ARCHETYPES = list(_ARCHETYPE_DEFAULTS.keys())
+    DEFAULT_ARCHETYPES = tuple(_ARCHETYPE_DEFAULTS.keys())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/oasis_profile_generator.py` at line 1332, Replace the
mutable class-level list DEFAULT_ARCHETYPES with an immutable tuple to avoid
accidental mutation: change the assignment that currently uses
list(_ARCHETYPE_DEFAULTS.keys()) to use tuple(_ARCHETYPE_DEFAULTS.keys()) (or
tuple(_ARCHETYPE_DEFAULTS)) in the oasis_profile_generator module so
DEFAULT_ARCHETYPES is immutable at class level and cannot be modified by
instances; update any code that relies on list-specific methods to convert to
list locally if needed.
backend/app/services/simulation_config_generator.py (1)

1066-1072: Either implement the documented neutral split or trim the docstring.

Lines 1068-1070 promise neutral agents are spread across middle frames, but Lines 1303-1306 always pick a single closest frame. That makes the API description misleading and will concentrate neutral agents more than callers expect.

Also applies to: 1294-1307

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/simulation_config_generator.py` around lines 1066 -
1072, The docstring promises neutral agents are split across middle frames but
the current selection logic always assigns each agent to the single closest
frame; update the assignment logic (where agent sentiment_bias is mapped to a
frame and frame.agent_ids are populated) so neutral agents (bias near 0) are
distributed evenly across the middle frames instead of always choosing the
single closest frame—identify the code that finds the "closest frame" and
replace it with a neutral-handling branch that computes the middle-frame subset
(e.g., the central one or two frames of the frames list sorted by positivity)
and round-robin or evenly assigns neutral agent ids into those
NarrativeFrame.agent_ids; alternatively, if you prefer the simpler fix, edit the
docstring to remove the claim about neutral splitting and state that agents are
assigned to the single closest frame based on sentiment_bias.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b16abba3-aee8-4607-8e62-70363a9e30db

📥 Commits

Reviewing files that changed from the base of the PR and between ade8ab8 and 345acae.

📒 Files selected for processing (9)
  • backend/app/__init__.py
  • backend/app/preflight.py
  • backend/app/routes/__init__.py
  • backend/app/routes/health.py
  • backend/app/routes/ops.py
  • backend/app/services/executive_summary.py
  • backend/app/services/oasis_profile_generator.py
  • backend/app/services/report_agent.py
  • backend/app/services/simulation_config_generator.py

Comment on lines +16 to +22
# Required environment variable names (as strings, checked against os.environ)
_REQUIRED_ENV_VARS = [
    'NEO4J_URI',
    'NEO4J_PASSWORD',
    'LLM_BASE_URL',
    'LLM_MODEL_NAME',
]

⚠️ Potential issue | 🟠 Major

Make env_vars check the environment, not the defaulted config.

This currently falls back to backend/app/config.py::Config, so NEO4J_PASSWORD, LLM_BASE_URL, and LLM_MODEL_NAME read as “present” even when nothing is set in os.environ. LLM_API_KEY is also missing from _REQUIRED_ENV_VARS, so the preflight can report success while backend/app/config.py::Config.validate() would still fail.

🛠️ Suggested fix
 _REQUIRED_ENV_VARS = [
+    'LLM_API_KEY',
     'NEO4J_URI',
     'NEO4J_PASSWORD',
     'LLM_BASE_URL',
     'LLM_MODEL_NAME',
 ]
@@
 def _check_required_env_vars() -> dict:
     """Check that required environment variables are set."""
-    missing = []
-    for var in _REQUIRED_ENV_VARS:
-        # Accept values set either via os.environ or via Config class attributes
-        env_val = os.environ.get(var)
-        config_attr = var.replace('LLM_BASE_URL', 'LLM_BASE_URL') \
-                        .replace('LLM_MODEL_NAME', 'LLM_MODEL_NAME')
-        config_val = getattr(Config, config_attr, None)
-        if not env_val and not config_val:
-            missing.append(var)
+    missing = [var for var in _REQUIRED_ENV_VARS if not os.environ.get(var)]
     return {"ok": len(missing) == 0, "missing": missing}

Also applies to: 91-102

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/preflight.py` around lines 16 - 22, The preflight currently
treats config defaults as "present" so update the check to read from the actual
environment: change logic that builds env_vars to inspect os.environ (e.g., use
k in os.environ or os.environ.get(k)) instead of importing values from
backend.app.config.Config; add 'LLM_API_KEY' to _REQUIRED_ENV_VARS so it is
validated too; apply the same fix to the later duplicate check around lines
91-102 (ensure both places reference os.environ directly and not Config) and
keep using the same unique symbols _REQUIRED_ENV_VARS and env_vars so the
preflight reports true missing env vars.

Comment on lines +56 to +68
def _check_uploads_dir() -> bool:
    """Verify the uploads directory exists and is writable."""
    try:
        folder = os.path.abspath(Config.UPLOAD_FOLDER)
        os.makedirs(folder, exist_ok=True)
        test_path = os.path.join(folder, '.health_check')
        with open(test_path, 'w') as f:
            f.write('ok')
        os.remove(test_path)
        return True
    except Exception as e:
        logger.debug(f"Uploads dir check failed: {e}")
        return False

⚠️ Potential issue | 🟠 Major

Use a unique temp file for the writability probe.

Every request touches the same .health_check path. Two concurrent /api/health or /api/health/ready calls can race on os.remove() and flap the endpoint to false-negative failures.

🛠️ Suggested fix
-        test_path = os.path.join(folder, '.health_check')
-        with open(test_path, 'w') as f:
-            f.write('ok')
-        os.remove(test_path)
+        with tempfile.NamedTemporaryFile(
+            mode='w',
+            dir=folder,
+            prefix='.health_check_',
+            delete=True,
+        ) as f:
+            f.write('ok')
         return True

Also add import tempfile at the top of the file.

🧰 Tools
🪛 Ruff (0.15.6)

[warning] 66-66: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/routes/health.py` around lines 56 - 68, _check_uploads_dir
currently uses a fixed ".health_check" filename which causes races on os.remove;
change the probe to create a unique temp file inside the uploads folder (use
tempfile.NamedTemporaryFile or tempfile.mkstemp with dir=folder) write/read as
needed, close and unlink that unique path in a finally block to ensure cleanup,
and keep returning True/False on success/exception; also add import tempfile at
the top of the file.

Comment on lines +119 to +128
# Surface any stored preflight results
preflight = current_app.extensions.get('preflight')

body = {
    "status": overall,
    "timestamp": datetime.now().isoformat(),
    "components": components,
}
if preflight is not None:
    body["preflight"] = preflight

⚠️ Potential issue | 🟠 Major

Avoid returning raw preflight diagnostics from /api/health.

preflight includes internal URLs, filesystem paths, model inventory, and missing env var names. Exposing that verbatim on an unauthenticated health endpoint is unnecessary information leakage; return coarse ok flags only, or gate the detailed payload behind authenticated ops access.

🛠️ Suggested fix
-    if preflight is not None:
-        body["preflight"] = preflight
+    if preflight is not None:
+        body["preflight"] = {
+            name: {"ok": detail.get("ok", False)}
+            for name, detail in preflight.items()
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/routes/health.py` around lines 119 - 128, The route currently
attaches the raw preflight payload from current_app.extensions.get('preflight')
to the health response (variable preflight and body) which leaks internal
paths/URLs/model lists; instead, replace that with a sanitized summary or gate
detailed data behind authenticated ops access: when preflight is present set
body["preflight"] to a coarse object (e.g. {"ok": bool(preflight.passed or all
checks), "checks": {name: bool}}) or simply {"ok": True/False}, and only include
the full preflight payload if the requester is authenticated/authorized for ops
(check your auth helper in the health route) — update the logic around
current_app.extensions.get('preflight') and the body assignment to implement
this sanitization/gating.

Comment on lines +1233 to +1244
def record_rest(self, current_round: int) -> None:
    """
    Passive fatigue decay when the agent skips a round.

    Args:
        current_round: The simulation round being skipped.
    """
    if self.last_activity_round >= 0:
        rounds_idle = max(0, current_round - self.last_activity_round)
        self.fatigue_accumulation = max(
            0.0, self.fatigue_accumulation - rounds_idle * self.fatigue_decay_rate
        )

⚠️ Potential issue | 🟠 Major

Bug: record_rest double-counts decay when called consecutively.

The method calculates rounds_idle from last_activity_round, but doesn't update any state to track when rest was last recorded. Calling record_rest multiple times causes cumulative over-decay.

Example:

  1. Agent acts at round 0 → fatigue_accumulation=0.5, last_activity_round=0
  2. record_rest(5) → rounds_idle=5, decays by 0.25 → fatigue=0.25
  3. record_rest(6) → rounds_idle=6 (still from round 0!), decays by 0.30 → fatigue≈0 (over-decayed)

The second call should only decay for 1 round (round 5→6), not 6 rounds.

🔧 Proposed fix: track last decay point
+    # Round number when fatigue was last updated (via rest or activity)
+    last_decay_round: int = -1
+
     def record_rest(self, current_round: int) -> None:
         """
         Passive fatigue decay when the agent skips a round.

         Args:
             current_round: The simulation round being skipped.
         """
-        if self.last_activity_round >= 0:
-            rounds_idle = max(0, current_round - self.last_activity_round)
+        reference_round = max(self.last_activity_round, self.last_decay_round)
+        if reference_round >= 0:
+            rounds_idle = max(0, current_round - reference_round)
             self.fatigue_accumulation = max(
                 0.0, self.fatigue_accumulation - rounds_idle * self.fatigue_decay_rate
             )
+        self.last_decay_round = current_round

Alternatively, update record_activity to also set last_decay_round = current_round after it finishes.
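The patched behavior can be checked with a minimal standalone model of the suggested fix; this is a self-contained reproduction using the field names from the diff, not the actual module:

```python
from dataclasses import dataclass


@dataclass
class FatigueState:
    fatigue_accumulation: float = 0.0
    fatigue_decay_rate: float = 0.05
    last_activity_round: int = -1
    last_decay_round: int = -1  # the fix: remember when decay last ran

    def record_rest(self, current_round: int) -> None:
        # Decay only for rounds since the later of last activity / last decay.
        reference = max(self.last_activity_round, self.last_decay_round)
        if reference >= 0:
            rounds_idle = max(0, current_round - reference)
            self.fatigue_accumulation = max(
                0.0,
                self.fatigue_accumulation - rounds_idle * self.fatigue_decay_rate,
            )
        self.last_decay_round = current_round


s = FatigueState(fatigue_accumulation=0.5, last_activity_round=0)
s.record_rest(5)  # decays 5 rounds: 0.5 - 0.25 = 0.25
s.record_rest(6)  # decays only 1 more round: 0.25 - 0.05 = 0.20
```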

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/oasis_profile_generator.py` around lines 1233 - 1244,
record_rest double-counts fatigue decay — modify record_rest in
oasis_profile_generator.py to track the last round when decay was applied (e.g.,
add a new attribute like last_decay_round or last_rest_round initialized in the
profile constructor) and compute rounds_idle using current_round minus that
last_decay_round (or the later of last_activity_round and last_decay_round) so
repeated calls only apply decay for new rounds; after applying decay set
last_decay_round = current_round (and also ensure record_activity sets
last_decay_round = last_activity_round when activity happens).

Comment on lines +622 to +626
CRITICAL LANGUAGE REQUIREMENT: You MUST write ENTIRELY in English.
Do NOT use Chinese, Japanese, Korean, or any non-English language anywhere in your output.
Even if the source documents or simulation data contain non-English text, you must translate everything to English.
All section content, analysis, quotes (translated), and conclusions must be in English.


⚠️ Potential issue | 🟠 Major

Remove the stale Chinese-output rule from the section prompt.

Line 622 now mandates English-only output, but the same backend/app/services/report_agent.py::SECTION_SYSTEM_PROMPT_TEMPLATE later still tells the model to switch to Chinese when the source material is Chinese. That contradiction means the English-only hardening here is not actually enforced for section generation.

🛠️ Suggested fix
-3. [Language Consistency - Quoted Content Must Be Translated to Report Language]
-   - Tool returned content may contain English or mixed Chinese-English expressions
-   - If the simulation requirement and source material are in Chinese, the report must be entirely in Chinese
-   - When you quote English or mixed Chinese-English content from tools, you must translate it to fluent Chinese before including it in the report
-   - When translating, preserve the original meaning and ensure natural expression
+3. [Language Consistency - Output Must Remain in English]
+   - Tool returned content may contain non-English expressions
+   - Translate all quoted or paraphrased material into fluent English before including it in the report
+   - When translating, preserve the original meaning and ensure natural English expression
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/report_agent.py` around lines 622 - 626, The top-level
English-only enforcement conflicts with a later instruction inside
SECTION_SYSTEM_PROMPT_TEMPLATE in report_agent.py that tells the model to switch
to Chinese for Chinese source material; remove the stale Chinese-output rule
from SECTION_SYSTEM_PROMPT_TEMPLATE (or make it conditional on a dedicated
locale/config flag) so the section prompt no longer overrides the global
English-only requirement and the model output remains consistent with the
earlier enforcement.

Comment on lines +1081 to +1094
sorted_frames = sorted(self.frames, key=lambda f: f.initial_sentiment)
frame_assignments: Dict[str, List[int]] = {f.frame_id: [] for f in sorted_frames}

# Clear any prior assignments on frame objects
for frame in sorted_frames:
    frame.agent_ids = []

num_frames = len(sorted_frames)

for agent in agent_configs:
    bias = getattr(agent, "sentiment_bias", 0.0)
    frame = self._find_best_frame(bias, sorted_frames)
    frame_assignments[frame.frame_id].append(agent.agent_id)
    frame.agent_ids.append(agent.agent_id)

⚠️ Potential issue | 🟠 Major

Make frame_id unique before using it as the assignment key.

Line 1082 keys the returned mapping by frame_id, so duplicate or blank ids from the LLM will silently collapse multiple frames into one bucket. Please normalize and de-duplicate ids when constructing NarrativeFrames.

Suggested fix
+        seen_frame_ids: Dict[str, int] = {}
         frames: List[NarrativeFrame] = []
         for item in frames_data[: self.num_frames]:
+            raw_frame_id = str(item.get("frame_id") or "").strip() or f"frame_{len(frames)}"
+            collision_count = seen_frame_ids.get(raw_frame_id, 0)
+            seen_frame_ids[raw_frame_id] = collision_count + 1
+            frame_id = raw_frame_id if collision_count == 0 else f"{raw_frame_id}_{collision_count}"
             frames.append(
                 NarrativeFrame(
-                    frame_id=item.get("frame_id", f"frame_{len(frames)}"),
+                    frame_id=frame_id,
                     label=item.get("label", item.get("frame_id", "Frame")),
                     description=item.get("description", ""),
                     initial_sentiment=float(item.get("initial_sentiment", 0.0)),
                     talking_points=item.get("talking_points", []),
                 )

Also applies to: 1154-1163

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/simulation_config_generator.py` around lines 1081 -
1094, The mapping uses frame.frame_id as a key but frame_id can be blank or
duplicated from the LLM; ensure frame_id is normalized and unique before using
it as a dictionary key (either during NarrativeFrame construction or immediately
before building frame_assignments). Update the code that creates NarrativeFrame
instances (and any place that sets frame.frame_id) to trim/normalize the id,
generate a fallback unique id when blank, and de-duplicate collisions (e.g.,
append a counter or uuid). Then use those guaranteed-unique frame.frame_id
values in sorted_frames, frame_assignments, and when calling _find_best_frame so
no frames collapse into the same bucket.

Comment on lines +1154 to +1167
frames: List[NarrativeFrame] = []
for item in frames_data[: self.num_frames]:
    frames.append(
        NarrativeFrame(
            frame_id=item.get("frame_id", f"frame_{len(frames)}"),
            label=item.get("label", item.get("frame_id", "Frame")),
            description=item.get("description", ""),
            initial_sentiment=float(item.get("initial_sentiment", 0.0)),
            talking_points=item.get("talking_points", []),
        )
    )

if not frames:
    raise ValueError("LLM returned no valid frames")

⚠️ Potential issue | 🟠 Major

Preserve the num_frames contract on both generation paths.

The class exposes num_frames, but the current helpers can return fewer: the LLM path accepts any non-empty result, and the rule-based fallback tops out at the six templates in template_pool. A caller asking for 7 or 8 frames can therefore silently get less after an LLM failure.

Suggested fix
-        if not frames:
+        if not frames:
             raise ValueError("LLM returned no valid frames")
+        if len(frames) != self.num_frames:
+            raise ValueError(
+                f"LLM returned {len(frames)} frames for requested {self.num_frames}"
+            )
 
         logger.info(f"MultiFrameInitializer: generated {len(frames)} frames via LLM")
         return frames
@@
-        logger.info(f"MultiFrameInitializer: generated {len(frames)} frames via rule-based fallback")
+        if len(frames) != self.num_frames:
+            raise ValueError(
+                f"Rule-based fallback produced {len(frames)} frames for requested {self.num_frames}"
+            )
+        logger.info(f"MultiFrameInitializer: generated {len(frames)} frames via rule-based fallback")
         return frames

Also applies to: 1266-1288

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/simulation_config_generator.py` around lines 1154 -
1167, The generation paths currently can return fewer frames than
self.num_frames (LLM path accepts any non-empty frames_data and the rule-based
path is limited by template_pool); update both generation branches so the final
list of NarrativeFrame instances has exactly self.num_frames: in the LLM path
(where frames_data is consumed into frames) validate len(frames) ==
self.num_frames and raise a ValueError if fewer, and in the rule-based fallback
(where template_pool is used) pad or repeat templates deterministically to
produce exactly self.num_frames (ensuring unique frame_id/label generation)
before returning; reference the NarrativeFrame construction loop (frames_data ->
NarrativeFrame) and the template_pool usage to locate where to apply
truncation/padding or the strict validation.
