feat: MiroFish Phase 1+2 improvements - sim quality, ops, report quality#1
…ators, outcome scorer

- Add NER_MODEL config option to Config (default: minimax-m2.7:cloud)
- Add _deduplicate_entities() to EntityReader using SequenceMatcher (threshold 0.85)
- Add PlatformSimulator base class + TwitterSimulator + RedditSimulator (platform_simulators.py)
- Add SimulationOutcome dataclass + OutcomeScorer class (outcome_scorer.py)

All changes are additive only. No existing functionality modified. Implements Phase 1 of the MIROFISH_MEMO_SIM_QUALITY.md roadmap.
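The SequenceMatcher-based deduplication listed above can be illustrated with a minimal sketch. This is not the actual `_deduplicate_entities()` code — the helper name `dedupe_names` and the greedy first-wins clustering are assumptions; only the 0.85 threshold comes from the commit message:

```python
from difflib import SequenceMatcher

def dedupe_names(names: list[str], threshold: float = 0.85) -> list[str]:
    """Keep the first occurrence of each cluster of near-identical names."""
    canonical: list[str] = []
    for name in names:
        # Compare case-insensitively against every name already kept
        is_dup = any(
            SequenceMatcher(None, name.lower(), kept.lower()).ratio() >= threshold
            for kept in canonical
        )
        if not is_dup:
            canonical.append(name)
    return canonical
```

Punctuation-only variants ("OpenAI Inc" vs "OpenAI Inc.") score well above 0.85 and collapse into one entity, while unrelated names stay separate.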
📝 Walkthrough

Adds health/ops endpoints and startup preflight checks; introduces entity deduplication, platform simulators, outcome scoring, and executive-summary and persona generators; changes the report endpoint to require the Neo4j extension and to pass GraphToolsService into ReportAgent; removes Ollama from docker-compose; adds an NER model config option.

Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request marks Phase 1 of the simulation quality improvements, focusing on enhancing the realism and analytical capabilities of the MiroFish simulation. It introduces a modular system for platform-specific content behavior and comprehensive outcome scoring, alongside refinements to entity processing that improve data quality. These changes are designed to be additive and pluggable, setting the foundation for future simulation enhancements without altering existing core logic.

Highlights
Warning: Gemini encountered an error creating the review. You can try again by commenting.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docker-compose.yml (1)
Lines 2-15: ⚠️ Potential issue | 🟠 Major

Activate Docker-mode LLM/embedding endpoints in docker-compose.yml.
The `mirofish` service defaults to `localhost:11434` for both LLM and embedding endpoints (from `backend/app/config.py`), but inside a container, `localhost` resolves to the container itself, not the host. After removing the in-compose `ollama` service, these defaults will fail unless explicitly overridden.

The project already documents Docker overrides in `.env.example` (commented out with the note "Uncomment these when running ALL services inside Docker"):

```
LLM_BASE_URL=http://ollama:11434/v1
EMBEDDING_BASE_URL=http://ollama:11434
```

However, since the `ollama` service no longer exists in the compose stack, activate these overrides using `extra_hosts` to map the container hostname, or update `.env.example` to reflect the new offline-only setup.

Suggested wiring for host.docker.internal
```diff
   mirofish:
     build: .
     container_name: mirofish-offline
     env_file:
       - .env
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
+    environment:
+      LLM_BASE_URL: ${LLM_BASE_URL:-http://host.docker.internal:11434/v1}
+      EMBEDDING_BASE_URL: ${EMBEDDING_BASE_URL:-http://host.docker.internal:11434}
     ports:
       - "3000:3000"
       - "5001:5001"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker-compose.yml` around lines 2 - 15, The mirofish service is using host-local LLM/embedding defaults that won't resolve inside the container; add an extra_hosts mapping under the service to point the ollama hostname to the Docker host (e.g., add extra_hosts: - "ollama:host-gateway" to the mirofish service) and ensure the .env or .env.example contains the override variables LLM_BASE_URL=http://ollama:11434/v1 and EMBEDDING_BASE_URL=http://ollama:11434 so the app (config.py) uses those container-resolvable endpoints.
🧹 Nitpick comments (1)
backend/app/services/outcome_scorer.py (1)
Lines 129-134: Stabilize tied influencer rankings.

Equal engagement scores currently inherit input order, so the same simulation can emit a different `top_influencers` list when records arrive in a different order. A secondary name sort keeps reports reproducible.

♻️ Proposed refactor
```diff
 ranked = sorted(
     influence_scores.items(),
-    key=lambda item: item[1],
-    reverse=True,
+    key=lambda item: (-item[1], item[0].lower()),
 )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/outcome_scorer.py` around lines 129 - 134, The current sort of influence_scores in outcome_scorer.py only keys by score so ties preserve unpredictable input order; update the sorted call that produces ranked (the block using influence_scores.items() and influencer_limit) to use a deterministic secondary key of the influencer name (e.g., sort by score desc then name asc) so tied engagement scores are reproducibly ordered in top_influencers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 54cfec22-5bf5-4934-ae58-2923d075ad18
📒 Files selected for processing (7)
- backend/app/api/report.py
- backend/app/config.py
- backend/app/services/__init__.py
- backend/app/services/entity_reader.py
- backend/app/services/outcome_scorer.py
- backend/app/services/platform_simulators.py
- docker-compose.yml
```python
def __init__(
    self,
    storage: GraphStorage,
    ner_model: Optional[str] = None,
    deduplication_threshold: float = 0.85,
):
    self.storage = storage
    self.ner_model = ner_model or Config.NER_MODEL
    self.deduplication_threshold = deduplication_threshold
```
Reject out-of-range similarity thresholds early.
An invalid deduplication_threshold silently changes dedup behavior: values above 1 prevent merges, and values below 0 merge every same-type pair.
💡 Suggested fix
```diff
 def __init__(
     self,
     storage: GraphStorage,
     ner_model: Optional[str] = None,
     deduplication_threshold: float = 0.85,
 ):
     self.storage = storage
     self.ner_model = ner_model or Config.NER_MODEL
+    if not 0.0 <= deduplication_threshold <= 1.0:
+        raise ValueError("deduplication_threshold must be between 0.0 and 1.0")
     self.deduplication_threshold = deduplication_threshold
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/entity_reader.py` around lines 79 - 87, Validate
deduplication_threshold in EntityReader.__init__: check that the provided
deduplication_threshold is between 0.0 and 1.0 (inclusive or exclusive per
project convention) and raise a ValueError with a clear message if out of range;
update the __init__ signature in entity_reader.py to perform this early
validation (before assigning to self.deduplication_threshold) so invalid values
cannot silently alter dedup behavior.
```python
def _merge_entity_into_canonical(self, canonical: EntityNode, duplicate: EntityNode) -> None:
    """Merge duplicate entity details into the canonical entity in place."""
    canonical.labels = sorted(set(canonical.labels) | set(duplicate.labels))
```
Preserve the canonical label order here.
`get_entity_type()` is order-dependent, so sorting the merged labels can change an entity from its original filtered type to some other label mid-dedup. That leaks into the recomputed `entity_types_found` on lines 247-250 and can also stop later same-type matches from merging into the same canonical record.
💡 Suggested fix
```diff
-    canonical.labels = sorted(set(canonical.labels) | set(duplicate.labels))
+    canonical.labels = list(dict.fromkeys(canonical.labels + duplicate.labels))
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/entity_reader.py` around lines 408 - 410, In
_merge_entity_into_canonical, avoid sorting merged labels because
get_entity_type is order-dependent; instead preserve canonical.labels order and
append any labels from duplicate.labels that are not already present (dedupe
while keeping canonical order), so update canonical.labels to canonical.labels +
[l for l in duplicate.labels if l not in canonical.labels]; this ensures
get_entity_type and downstream entity_types_found logic remain stable and later
same-type merges still match the canonical record.
```python
merged_attributes = dict(canonical.attributes)
for key, value in duplicate.attributes.items():
    if key not in merged_attributes or not merged_attributes[key]:
        merged_attributes[key] = value
canonical.attributes = merged_attributes
```
Don't treat falsy attribute values as missing.
`not merged_attributes[key]` overwrites valid canonical values like `0` and `False` during deduplication, which silently corrupts merged facts.
💡 Suggested fix
```diff
 merged_attributes = dict(canonical.attributes)
 for key, value in duplicate.attributes.items():
-    if key not in merged_attributes or not merged_attributes[key]:
+    current_value = merged_attributes.get(key)
+    if key not in merged_attributes or current_value is None or current_value == "":
         merged_attributes[key] = value
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/entity_reader.py` around lines 417 - 421, The merge loop
treats falsy canonical values as missing; update the condition in the block that
assigns merged_attributes from duplicate.attributes (where canonical and
duplicate are used) to only treat truly absent/undefined values as missing
(e.g., check for key not in merged_attributes or merged_attributes[key] is None)
instead of using a truthiness test like not merged_attributes[key], so valid
values like 0 or False are preserved.
```python
components = (
    self._value(post, "likes", 0),
    self._value(post, "upvotes", 0),
    self._value(post, "comments", 0),
    self._value(post, "replies", 0),
    self._value(post, "shares", 0),
    self._value(post, "retweets", 0),
    self._value(post, "reposts", 0),
)
return int(sum(int(component or 0) for component in components))
```
Avoid double-counting alias metrics in engagement totals.
backend/app/services/platform_simulators.py lines 47-48 and 77-80 treat retweets/reposts and comments/replies as alternate names for the same interaction. Summing both here will inflate total_engagement, viral_posts_count, average_post_reach when it falls back to engagement, and top_influencers as soon as a post carries both aliases.
🛠️ Proposed fix
```diff
 def _engagement_value(self, post: Any) -> int:
     direct_value = self._value(post, "engagement", None)
     if direct_value is not None:
         return int(direct_value)
+    comment_count = max(
+        int(self._value(post, "comments", 0) or 0),
+        int(self._value(post, "replies", 0) or 0),
+    )
+    repost_count = max(
+        int(self._value(post, "retweets", 0) or 0),
+        int(self._value(post, "reposts", 0) or 0),
+    )
+
     components = (
         self._value(post, "likes", 0),
         self._value(post, "upvotes", 0),
-        self._value(post, "comments", 0),
-        self._value(post, "replies", 0),
+        comment_count,
         self._value(post, "shares", 0),
-        self._value(post, "retweets", 0),
-        self._value(post, "reposts", 0),
+        repost_count,
     )
     return int(sum(int(component or 0) for component in components))
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/outcome_scorer.py` around lines 62 - 71, The engagement
sum in outcome_scorer.py currently adds both alias fields (e.g., "retweets" and
"reposts", "comments" and "replies") causing double-counting; update the
aggregation logic around components/_value so aliases are deduplicated by
mapping aliases to a canonical metric (or taking the max of alias pairs) before
summing—e.g., treat "retweets"/"reposts" as one metric and "comments"/"replies"
as one metric, compute a single value for each canonical interaction, then
return the int(sum(...)) of those canonical values.
- backend/app/routes/health.py: /api/health (Neo4j + Ollama + uploads component check; healthy/degraded/unhealthy), /api/health/ready (strict readiness gate for Prometheus pre-simulation checks)
- backend/app/preflight.py: run_preflight_checks() — checks Neo4j, Ollama, required model, uploads dir, and env vars on startup; logs warnings but never blocks startup
- backend/app/routes/ops.py: /api/ops/simulation/<id>/poll — returns terminal, success, status, percent_complete, stalled, and stalled_since fields for machine-readable Prometheus polling
- backend/app/routes/__init__.py: package init exporting health_bp, ops_bp
- backend/app/__init__.py: register health_bp and ops_bp; call run_preflight_checks() at startup and store the result in app.extensions['preflight'] for /api/health to surface

Additive only — no existing endpoints modified.
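The poll fields listed above could be derived roughly as follows. This is a hedged sketch, not the actual ops.py code: the status values, the helper name, and the 300-second stall window are assumptions; only the field names come from the commit message:

```python
# Hypothetical derivation of the /api/ops/simulation/<id>/poll payload.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}
STALL_THRESHOLD_SECONDS = 300  # assumed stall window

def build_poll_payload(status: str, percent_complete: float,
                       seconds_since_update: float) -> dict:
    """Derive the machine-readable polling fields from a raw status record."""
    stalled = (status == "running"
               and seconds_since_update > STALL_THRESHOLD_SECONDS)
    return {
        "terminal": status in TERMINAL_STATUSES,
        "success": status == "completed",
        "status": status,
        "percent_complete": percent_complete,
        "stalled": stalled,
        "stalled_since": seconds_since_update if stalled else None,
    }
```

A Prometheus-side poller would stop on `terminal`, alert on `stalled`, and graph `percent_complete`.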
…lidation, executive summary

Changes to backend/app/services/report_agent.py:

- Confirmed all open() file writes already use encoding='utf-8'
- Added CRITICAL LANGUAGE REQUIREMENT to PLAN_SYSTEM_PROMPT (English-only output)
- Added CRITICAL LANGUAGE REQUIREMENT to SECTION_SYSTEM_PROMPT_TEMPLATE (English-only output)
- Added ReportAgent._validate_section_output(): strips null bytes, removes control chars, normalizes line endings, warns if non-ASCII ratio > 20%
- Added ReportAgent._run_report_sanity_check(): checks section min length, unicode replacement chars, and non-ASCII ratio; returns an issues dict
- Wired _validate_section_output() into generate_report() after each section
- Wired _run_report_sanity_check() into generate_report() after assembly

New file backend/app/services/executive_summary.py:

- ExecutiveSummary dataclass with all structured fields (outcome, actors, metrics, timeline, platform sentiment, recommendations)
- ExecutiveSummaryGenerator class: calls the LLM to extract structured JSON from the full report; saves executive_summary.json and executive_summary.md
- English-only system prompt for LLM extraction
- Graceful fallback on malformed LLM responses (safe_float, safe_list, etc.)
- _render_markdown() produces a clean Markdown card with emoji risk indicators

Additive changes only — no existing logic modified.
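The section-cleanup behavior described above (null bytes, control characters, line endings, non-ASCII ratio) might look roughly like this minimal sketch. The regex, the function name, and returning a warning flag rather than logging are assumptions, not the shipped ReportAgent implementation; only the 20% threshold is from the commit message:

```python
import re

NON_ASCII_WARN_RATIO = 0.20  # threshold stated in the commit message

def validate_section_output(text: str) -> tuple[str, bool]:
    """Return (cleaned_text, non_ascii_warning) for one report section."""
    cleaned = text.replace("\x00", "")                            # strip null bytes
    cleaned = cleaned.replace("\r\n", "\n").replace("\r", "\n")   # normalize EOLs
    # Drop control characters except tab and newline
    cleaned = re.sub(r"[\x01-\x08\x0b\x0c\x0e-\x1f\x7f]", "", cleaned)
    non_ascii = sum(1 for ch in cleaned if ord(ch) > 127)
    warn = bool(cleaned) and non_ascii / len(cleaned) > NON_ASCII_WARN_RATIO
    return cleaned, warn
```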
… MultiFrameInitializer

oasis_profile_generator.py:

- Add behavioral variance fields to OasisAgentProfile (conformity, reactiveness, persuadability, behavioral_noise)
- Add FatigueState dataclass with get_current_activity_level(), record_activity(), and record_rest() methods
- Add ArchetypePersonaGenerator class: generates synthetic OASIS profiles from five seed archetypes (Enthusiast, Skeptic, Neutral, Influencer, Contrarian) with per-field Gaussian variance; supports both LLM-assisted and rule-based identity generation

simulation_config_generator.py:

- Add NarrativeFrame dataclass: stores frame_id, label, description, initial_sentiment, talking_points, and assigned agent_ids
- Add MultiFrameInitializer class: generates N competing narrative frames at t=0 via LLM (with rule-based fallback), and distributes agents to frames by minimum-distance matching on sentiment_bias

All changes are strictly additive; no existing code was modified.
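The per-field Gaussian variance mentioned above can be sketched as below. The seed numbers, field names, sigma, and clamping range are illustrative assumptions, not the actual ArchetypePersonaGenerator values:

```python
import random
from typing import Optional

# Illustrative archetype seeds; the real generator uses five archetypes.
ARCHETYPE_SEEDS = {
    "Enthusiast": {"conformity": 0.7, "reactiveness": 0.8},
    "Skeptic":    {"conformity": 0.3, "reactiveness": 0.6},
}

def sample_profile(archetype: str, sigma: float = 0.1,
                   rng: Optional[random.Random] = None) -> dict:
    """Sample one profile: each field is seed + Gaussian noise, clamped to [0, 1]."""
    rng = rng or random.Random()
    seeds = ARCHETYPE_SEEDS[archetype]
    return {field: min(1.0, max(0.0, rng.gauss(mean, sigma)))
            for field, mean in seeds.items()}
```

Seeding the RNG makes generated populations reproducible across runs.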
Actionable comments posted: 7
🧹 Nitpick comments (3)
backend/app/services/oasis_profile_generator.py (2)
Lines 1525-1556: Remove unnecessary `f` prefix from string literals.

Several strings in `bio_templates` don't contain placeholders but use f-string syntax:

- Line 1527: `"Skeptic": f"I question everything..."`
- Line 1528: `"Neutral": f"Just here to read..."`
- Line 1530: `"Contrarian": f"If everyone agrees..."`

♻️ Suggested fix
```diff
 bio_templates = {
     "Enthusiast": f"Passionate about innovation and progress. Always excited to discuss new ideas. #{archetype}",
-    "Skeptic": f"I question everything. Critical thinker. Not buying the hype.",
-    "Neutral": f"Just here to read and stay informed. Opinions are my own.",
+    "Skeptic": "I question everything. Critical thinker. Not buying the hype.",
+    "Neutral": "Just here to read and stay informed. Opinions are my own.",
     "Influencer": f"Content creator | thought leader | {random.randint(1, 50)}K followers. Let's connect!",
-    "Contrarian": f"If everyone agrees, someone is wrong. Usually them.",
+    "Contrarian": "If everyone agrees, someone is wrong. Usually them.",
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/oasis_profile_generator.py` around lines 1525 - 1556, Three entries in bio_templates ("Skeptic", "Neutral", "Contrarian") use f-strings without any interpolation; remove the unnecessary f prefix from those string literals so they become normal strings (leave the f prefix on entries that do interpolate like "Enthusiast" and "Influencer"); update the bio_templates definition accordingly and run tests to ensure no behavioral change.
Line 1332: Use `tuple` instead of `list` for a class-level constant.

`DEFAULT_ARCHETYPES` is a class attribute that should be immutable. Using `list()` creates a mutable object that could be accidentally modified, affecting all instances.

♻️ Suggested fix
```diff
-    DEFAULT_ARCHETYPES = list(_ARCHETYPE_DEFAULTS.keys())
+    DEFAULT_ARCHETYPES = tuple(_ARCHETYPE_DEFAULTS.keys())
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/oasis_profile_generator.py` at line 1332, Replace the mutable class-level list DEFAULT_ARCHETYPES with an immutable tuple to avoid accidental mutation: change the assignment that currently uses list(_ARCHETYPE_DEFAULTS.keys()) to use tuple(_ARCHETYPE_DEFAULTS.keys()) (or tuple(_ARCHETYPE_DEFAULTS)) in the oasis_profile_generator module so DEFAULT_ARCHETYPES is immutable at class level and cannot be modified by instances; update any code that relies on list-specific methods to convert to list locally if needed.

backend/app/services/simulation_config_generator.py (1)
Lines 1066-1072: Either implement the documented neutral split or trim the docstring.

Lines 1068-1070 promise neutral agents are spread across middle frames, but lines 1303-1306 always pick a single closest frame. That makes the API description misleading and will concentrate neutral agents more than callers expect.
Also applies to: 1294-1307
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/simulation_config_generator.py` around lines 1066 - 1072, The docstring promises neutral agents are split across middle frames but the current selection logic always assigns each agent to the single closest frame; update the assignment logic (where agent sentiment_bias is mapped to a frame and frame.agent_ids are populated) so neutral agents (bias near 0) are distributed evenly across the middle frames instead of always choosing the single closest frame—identify the code that finds the "closest frame" and replace it with a neutral-handling branch that computes the middle-frame subset (e.g., the central one or two frames of the frames list sorted by positivity) and round-robin or evenly assigns neutral agent ids into those NarrativeFrame.agent_ids; alternatively, if you prefer the simpler fix, edit the docstring to remove the claim about neutral splitting and state that agents are assigned to the single closest frame based on sentiment_bias.
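The neutral-split behavior this prompt asks for can be sketched as follows. This is illustrative only: the data shapes, the 0.1 neutral band, and the function name are assumptions, not the MultiFrameInitializer API:

```python
NEUTRAL_BAND = 0.1  # assumed |bias| threshold for "neutral" agents

def assign_agents(agent_biases: dict[str, float],
                  frame_sentiments: dict[str, float]) -> dict[str, list[str]]:
    """Round-robin neutral agents across middle frames; others get closest frame."""
    frames = sorted(frame_sentiments, key=frame_sentiments.get)
    mid = len(frames) // 2
    # Central one (odd count) or two (even count) frames receive neutrals
    middle = frames[mid - 1: mid + 1] if len(frames) % 2 == 0 else [frames[mid]]
    assignments: dict[str, list[str]] = {fid: [] for fid in frames}
    neutral_idx = 0
    for agent_id, bias in agent_biases.items():
        if abs(bias) <= NEUTRAL_BAND:
            target = middle[neutral_idx % len(middle)]  # spread neutrals evenly
            neutral_idx += 1
        else:
            target = min(frames, key=lambda f: abs(frame_sentiments[f] - bias))
        assignments[target].append(agent_id)
    return assignments
```

With an even frame count, two neutral agents land on two different middle frames instead of piling onto one.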
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/preflight.py`:
- Around line 16-22: The preflight currently treats config defaults as "present"
so update the check to read from the actual environment: change logic that
builds env_vars to inspect os.environ (e.g., use k in os.environ or
os.environ.get(k)) instead of importing values from backend.app.config.Config;
add 'LLM_API_KEY' to _REQUIRED_ENV_VARS so it is validated too; apply the same
fix to the later duplicate check around lines 91-102 (ensure both places
reference os.environ directly and not Config) and keep using the same unique
symbols _REQUIRED_ENV_VARS and env_vars so the preflight reports true missing
env vars.
In `@backend/app/routes/health.py`:
- Around line 56-68: _check_uploads_dir currently uses a fixed ".health_check"
filename which causes races on os.remove; change the probe to create a unique
temp file inside the uploads folder (use tempfile.NamedTemporaryFile or
tempfile.mkstemp with dir=folder) write/read as needed, close and unlink that
unique path in a finally block to ensure cleanup, and keep returning True/False
on success/exception; also add import tempfile at the top of the file.
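The race-free probe this prompt suggests can be sketched as below: a unique temp file per check, always unlinked in a finally block. The function name and prefix are illustrative, not the actual health.py code:

```python
import os
import tempfile

def check_uploads_dir(folder: str) -> bool:
    """Return True if the uploads folder is writable, cleaning up the probe."""
    try:
        fd, path = tempfile.mkstemp(prefix=".health_", dir=folder)
        try:
            os.write(fd, b"ok")       # verify we can actually write
        finally:
            os.close(fd)
            os.unlink(path)           # unique name, so concurrent checks can't race
        return True
    except OSError:
        return False
```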
- Around line 119-128: The route currently attaches the raw preflight payload
from current_app.extensions.get('preflight') to the health response (variable
preflight and body) which leaks internal paths/URLs/model lists; instead,
replace that with a sanitized summary or gate detailed data behind authenticated
ops access: when preflight is present set body["preflight"] to a coarse object
(e.g. {"ok": bool(preflight.passed or all checks), "checks": {name: bool}}) or
simply {"ok": True/False}, and only include the full preflight payload if the
requester is authenticated/authorized for ops (check your auth helper in the
health route) — update the logic around current_app.extensions.get('preflight')
and the body assignment to implement this sanitization/gating.
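The coarse-summary sanitization this prompt proposes can be sketched as below; the preflight payload shape and the `authorized` flag are assumptions about how the health route might gate detail:

```python
def sanitize_preflight(preflight: dict, authorized: bool = False) -> dict:
    """Expose only coarse booleans unless the caller is authorized for ops."""
    if authorized:
        return preflight  # ops users may see the full payload
    checks = preflight.get("checks", {})
    return {
        "ok": all(bool(c.get("ok")) for c in checks.values()),
        "checks": {name: bool(c.get("ok")) for name, c in checks.items()},
    }
```

Internal URIs, paths, and model lists never reach unauthenticated callers; only per-check pass/fail booleans do.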
In `@backend/app/services/oasis_profile_generator.py`:
- Around line 1233-1244: record_rest(double-counts fatigue decay) — modify
record_rest in oasis_profile_generator.py to track the last round when decay was
applied (e.g., add a new attribute like last_decay_round or last_rest_round
initialized in the profile constructor) and compute rounds_idle using
current_round minus that last_decay_round (or the later of last_activity_round
and last_decay_round) so repeated calls only apply decay for new rounds; after
applying decay set last_decay_round = current_round (and also ensure
record_activity sets last_decay_round = last_activity_round when activity
happens).
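The idempotent-decay fix described above can be sketched as a minimal FatigueState: decay applies only for rounds since the last decay, so repeated record_rest() calls in the same round are no-ops. Field names mirror the prompt, but the fatigue increments and decay rate are assumed:

```python
from dataclasses import dataclass

@dataclass
class FatigueState:
    fatigue: float = 0.0
    last_activity_round: int = 0
    last_decay_round: int = 0
    decay_per_round: float = 0.1   # assumed rate

    def record_activity(self, current_round: int) -> None:
        self.fatigue = min(1.0, self.fatigue + 0.2)  # assumed increment
        self.last_activity_round = current_round
        self.last_decay_round = current_round        # activity resets the decay clock

    def record_rest(self, current_round: int) -> None:
        rounds_idle = current_round - max(self.last_activity_round,
                                          self.last_decay_round)
        if rounds_idle > 0:  # only decay for rounds not yet counted
            self.fatigue = max(0.0, self.fatigue - rounds_idle * self.decay_per_round)
            self.last_decay_round = current_round
```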
In `@backend/app/services/report_agent.py`:
- Around line 622-626: The top-level English-only enforcement conflicts with a
later instruction inside SECTION_SYSTEM_PROMPT_TEMPLATE in report_agent.py that
tells the model to switch to Chinese for Chinese source material; remove the
stale Chinese-output rule from SECTION_SYSTEM_PROMPT_TEMPLATE (or make it
conditional on a dedicated locale/config flag) so the section prompt no longer
overrides the global English-only requirement and the model output remains
consistent with the earlier enforcement.
In `@backend/app/services/simulation_config_generator.py`:
- Around line 1154-1167: The generation paths currently can return fewer frames
than self.num_frames (LLM path accepts any non-empty frames_data and the
rule-based path is limited by template_pool); update both generation branches so
the final list of NarrativeFrame instances has exactly self.num_frames: in the
LLM path (where frames_data is consumed into frames) validate len(frames) ==
self.num_frames and raise a ValueError if fewer, and in the rule-based fallback
(where template_pool is used) pad or repeat templates deterministically to
produce exactly self.num_frames (ensuring unique frame_id/label generation)
before returning; reference the NarrativeFrame construction loop (frames_data ->
NarrativeFrame) and the template_pool usage to locate where to apply
truncation/padding or the strict validation.
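The deterministic padding this prompt describes for the rule-based fallback can be sketched as below; the suffixing scheme and function name are assumptions:

```python
def pad_frame_templates(templates: list[str], num_frames: int) -> list[str]:
    """Cycle templates deterministically until exactly num_frames unique labels exist."""
    if not templates:
        raise ValueError("template_pool must not be empty")
    frames = []
    for i in range(num_frames):
        base = templates[i % len(templates)]                        # cycle in order
        suffix = f"_{i // len(templates) + 1}" if i >= len(templates) else ""
        frames.append(base + suffix)                                # keep labels unique
    return frames
```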
- Around line 1081-1094: The mapping uses frame.frame_id as a key but frame_id
can be blank or duplicated from the LLM; ensure frame_id is normalized and
unique before using it as a dictionary key (either during NarrativeFrame
construction or immediately before building frame_assignments). Update the code
that creates NarrativeFrame instances (and any place that sets frame.frame_id)
to trim/normalize the id, generate a fallback unique id when blank, and
de-duplicate collisions (e.g., append a counter or uuid). Then use those
guaranteed-unique frame.frame_id values in sorted_frames, frame_assignments, and
when calling _find_best_frame so no frames collapse into the same bucket.
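The frame-id normalization this prompt asks for (trim, fallback for blanks, de-duplicate collisions) can be sketched as below; the function name and counter-suffix scheme are assumptions:

```python
def normalize_frame_ids(raw_ids: list[str]) -> list[str]:
    """Trim ids, generate fallbacks for blanks, and suffix duplicates until unique."""
    used: set[str] = set()
    result = []
    for i, raw in enumerate(raw_ids):
        fid = (raw or "").strip() or f"frame_{i}"   # fallback for blank ids
        candidate, n = fid, 1
        while candidate in used:                    # de-duplicate collisions
            n += 1
            candidate = f"{fid}_{n}"
        used.add(candidate)
        result.append(candidate)
    return result
```

Guaranteed-unique ids keep `frame_assignments` buckets from silently collapsing.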
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b16abba3-aee8-4607-8e62-70363a9e30db
📒 Files selected for processing (9)
backend/app/__init__.py
backend/app/preflight.py
backend/app/routes/__init__.py
backend/app/routes/health.py
backend/app/routes/ops.py
backend/app/services/executive_summary.py
backend/app/services/oasis_profile_generator.py
backend/app/services/report_agent.py
backend/app/services/simulation_config_generator.py
# Required environment variable names (as strings, checked against os.environ)
_REQUIRED_ENV_VARS = [
    'NEO4J_URI',
    'NEO4J_PASSWORD',
    'LLM_BASE_URL',
    'LLM_MODEL_NAME',
]
Make env_vars check the environment, not the defaulted config.
This currently falls back to backend/app/config.py::Config, so NEO4J_PASSWORD, LLM_BASE_URL, and LLM_MODEL_NAME read as “present” even when nothing is set in os.environ. LLM_API_KEY is also missing from _REQUIRED_ENV_VARS, so the preflight can report success while backend/app/config.py::Config.validate() would still fail.
🛠️ Suggested fix
_REQUIRED_ENV_VARS = [
+ 'LLM_API_KEY',
'NEO4J_URI',
'NEO4J_PASSWORD',
'LLM_BASE_URL',
'LLM_MODEL_NAME',
]
@@
def _check_required_env_vars() -> dict:
"""Check that required environment variables are set."""
- missing = []
- for var in _REQUIRED_ENV_VARS:
- # Accept values set either via os.environ or via Config class attributes
- env_val = os.environ.get(var)
- config_attr = var.replace('LLM_BASE_URL', 'LLM_BASE_URL') \
- .replace('LLM_MODEL_NAME', 'LLM_MODEL_NAME')
- config_val = getattr(Config, config_attr, None)
- if not env_val and not config_val:
- missing.append(var)
+ missing = [var for var in _REQUIRED_ENV_VARS if not os.environ.get(var)]
    return {"ok": len(missing) == 0, "missing": missing}

Also applies to: 91-102
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/preflight.py` around lines 16 - 22, The preflight currently
treats config defaults as "present" so update the check to read from the actual
environment: change logic that builds env_vars to inspect os.environ (e.g., use
k in os.environ or os.environ.get(k)) instead of importing values from
backend.app.config.Config; add 'LLM_API_KEY' to _REQUIRED_ENV_VARS so it is
validated too; apply the same fix to the later duplicate check around lines
91-102 (ensure both places reference os.environ directly and not Config) and
keep using the same unique symbols _REQUIRED_ENV_VARS and env_vars so the
preflight reports true missing env vars.
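The environment-only check the review asks for can be sketched as a self-contained helper. The `environ` parameter is an illustrative injection point for testing; the real module keeps its own `_REQUIRED_ENV_VARS` list and reads `os.environ` directly:

```python
import os

# Names mirror the review's suggested list, including the missing LLM_API_KEY.
REQUIRED_ENV_VARS = [
    'LLM_API_KEY',
    'NEO4J_URI',
    'NEO4J_PASSWORD',
    'LLM_BASE_URL',
    'LLM_MODEL_NAME',
]


def check_required_env_vars(environ=None) -> dict:
    """Report which required variables are absent from the environment.

    Consults os.environ (or an injected mapping) only, so config-class
    defaults can never mask a missing variable.
    """
    env = os.environ if environ is None else environ
    missing = [var for var in REQUIRED_ENV_VARS if not env.get(var)]
    return {"ok": not missing, "missing": missing}
```

Because defaults never enter the lookup, a variable set only as a `Config` attribute now correctly reports as missing.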
def _check_uploads_dir() -> bool:
    """Verify the uploads directory exists and is writable."""
    try:
        folder = os.path.abspath(Config.UPLOAD_FOLDER)
        os.makedirs(folder, exist_ok=True)
        test_path = os.path.join(folder, '.health_check')
        with open(test_path, 'w') as f:
            f.write('ok')
        os.remove(test_path)
        return True
    except Exception as e:
        logger.debug(f"Uploads dir check failed: {e}")
        return False
Use a unique temp file for the writability probe.
Every request touches the same .health_check path. Two concurrent /api/health or /api/health/ready calls can race on os.remove() and flap the endpoint to false-negative failures.
🛠️ Suggested fix
- test_path = os.path.join(folder, '.health_check')
- with open(test_path, 'w') as f:
- f.write('ok')
- os.remove(test_path)
+ with tempfile.NamedTemporaryFile(
+ mode='w',
+ dir=folder,
+ prefix='.health_check_',
+ delete=True,
+ ) as f:
+ f.write('ok')
        return True

Also add import tempfile at the top of the file.
🧰 Tools
🪛 Ruff (0.15.6)
[warning] 66-66: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/routes/health.py` around lines 56 - 68, _check_uploads_dir
currently uses a fixed ".health_check" filename which causes races on os.remove;
change the probe to create a unique temp file inside the uploads folder (use
tempfile.NamedTemporaryFile or tempfile.mkstemp with dir=folder) write/read as
needed, close and unlink that unique path in a finally block to ensure cleanup,
and keep returning True/False on success/exception; also add import tempfile at
the top of the file.
    # Surface any stored preflight results
    preflight = current_app.extensions.get('preflight')

    body = {
        "status": overall,
        "timestamp": datetime.now().isoformat(),
        "components": components,
    }
    if preflight is not None:
        body["preflight"] = preflight
Avoid returning raw preflight diagnostics from /api/health.
preflight includes internal URLs, filesystem paths, model inventory, and missing env var names. Exposing that verbatim on an unauthenticated health endpoint is unnecessary information leakage; return coarse ok flags only, or gate the detailed payload behind authenticated ops access.
🛠️ Suggested fix
- if preflight is not None:
- body["preflight"] = preflight
+ if preflight is not None:
+ body["preflight"] = {
+ name: {"ok": detail.get("ok", False)}
+ for name, detail in preflight.items()
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    # Surface any stored preflight results
    preflight = current_app.extensions.get('preflight')

    body = {
        "status": overall,
        "timestamp": datetime.now().isoformat(),
        "components": components,
    }
    if preflight is not None:
        body["preflight"] = {
            name: {"ok": detail.get("ok", False)}
            for name, detail in preflight.items()
        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/routes/health.py` around lines 119 - 128, The route currently
attaches the raw preflight payload from current_app.extensions.get('preflight')
to the health response (variable preflight and body) which leaks internal
paths/URLs/model lists; instead, replace that with a sanitized summary or gate
detailed data behind authenticated ops access: when preflight is present set
body["preflight"] to a coarse object (e.g. {"ok": bool(preflight.passed or all
checks), "checks": {name: bool}}) or simply {"ok": True/False}, and only include
the full preflight payload if the requester is authenticated/authorized for ops
(check your auth helper in the health route) — update the logic around
current_app.extensions.get('preflight') and the body assignment to implement
this sanitization/gating.
def record_rest(self, current_round: int) -> None:
    """
    Passive fatigue decay when the agent skips a round.

    Args:
        current_round: The simulation round being skipped.
    """
    if self.last_activity_round >= 0:
        rounds_idle = max(0, current_round - self.last_activity_round)
        self.fatigue_accumulation = max(
            0.0, self.fatigue_accumulation - rounds_idle * self.fatigue_decay_rate
        )
Bug: record_rest double-counts decay when called consecutively.
The method calculates rounds_idle from last_activity_round, but doesn't update any state to track when rest was last recorded. Calling record_rest multiple times causes cumulative over-decay.
Example:
- Agent acts at round 0 → fatigue_accumulation=0.5, last_activity_round=0
- record_rest(5) → rounds_idle=5, decays by 0.25 → fatigue=0.25
- record_rest(6) → rounds_idle=6 (still measured from round 0!), decays by 0.30 → fatigue≈0 (over-decayed)
The second call should only decay for 1 round (round 5→6), not 6 rounds.
🔧 Proposed fix: track last decay point
+ # Round number when fatigue was last updated (via rest or activity)
+ last_decay_round: int = -1
+
def record_rest(self, current_round: int) -> None:
"""
Passive fatigue decay when the agent skips a round.
Args:
current_round: The simulation round being skipped.
"""
- if self.last_activity_round >= 0:
- rounds_idle = max(0, current_round - self.last_activity_round)
+ reference_round = max(self.last_activity_round, self.last_decay_round)
+ if reference_round >= 0:
+ rounds_idle = max(0, current_round - reference_round)
self.fatigue_accumulation = max(
0.0, self.fatigue_accumulation - rounds_idle * self.fatigue_decay_rate
)
+ self.last_decay_round = current_roundAlternatively, update record_activity to also set last_decay_round = current_round after it finishes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/oasis_profile_generator.py` around lines 1233 - 1244,
record_rest double-counts fatigue decay — modify record_rest in
oasis_profile_generator.py to track the last round when decay was applied (e.g.,
add a new attribute like last_decay_round or last_rest_round initialized in the
profile constructor) and compute rounds_idle using current_round minus that
last_decay_round (or the later of last_activity_round and last_decay_round) so
repeated calls only apply decay for new rounds; after applying decay set
last_decay_round = current_round (and also ensure record_activity sets
last_decay_round = last_activity_round when activity happens).
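A minimal sketch of the proposed fix, reproducing the review's numbers. The real FatigueState carries more fields; `record_activity` and its `cost` parameter are assumptions here, included only so the scenario is runnable end to end:

```python
from dataclasses import dataclass


@dataclass
class FatigueState:
    fatigue_accumulation: float = 0.0
    fatigue_decay_rate: float = 0.05
    last_activity_round: int = -1
    # New field from the fix: the last round at which decay was applied.
    last_decay_round: int = -1

    def record_activity(self, current_round: int, cost: float = 0.1) -> None:
        """Hypothetical activity hook; also resets the decay reference."""
        self.fatigue_accumulation = min(1.0, self.fatigue_accumulation + cost)
        self.last_activity_round = current_round
        self.last_decay_round = current_round

    def record_rest(self, current_round: int) -> None:
        """Decay only the rounds idle since the last activity OR last rest,
        so consecutive calls no longer double-count the same idle span."""
        reference = max(self.last_activity_round, self.last_decay_round)
        if reference >= 0:
            rounds_idle = max(0, current_round - reference)
            self.fatigue_accumulation = max(
                0.0,
                self.fatigue_accumulation - rounds_idle * self.fatigue_decay_rate,
            )
        self.last_decay_round = current_round
```

With the fix, the review's scenario decays 0.5 → 0.25 at round 5, then only one further round's worth (0.05) at round 6 instead of six.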
CRITICAL LANGUAGE REQUIREMENT: You MUST write ENTIRELY in English.
Do NOT use Chinese, Japanese, Korean, or any non-English language anywhere in your output.
Even if the source documents or simulation data contain non-English text, you must translate everything to English.
All section content, analysis, quotes (translated), and conclusions must be in English.
Remove the stale Chinese-output rule from the section prompt.
Line 622 now mandates English-only output, but the same backend/app/services/report_agent.py::SECTION_SYSTEM_PROMPT_TEMPLATE later still tells the model to switch to Chinese when the source material is Chinese. That contradiction means the English-only hardening here is not actually enforced for section generation.
🛠️ Suggested fix
-3. [Language Consistency - Quoted Content Must Be Translated to Report Language]
- - Tool returned content may contain English or mixed Chinese-English expressions
- - If the simulation requirement and source material are in Chinese, the report must be entirely in Chinese
- - When you quote English or mixed Chinese-English content from tools, you must translate it to fluent Chinese before including it in the report
- - When translating, preserve the original meaning and ensure natural expression
+3. [Language Consistency - Output Must Remain in English]
+ - Tool returned content may contain non-English expressions
+ - Translate all quoted or paraphrased material into fluent English before including it in the report
+ - When translating, preserve the original meaning and ensure natural English expression🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/report_agent.py` around lines 622 - 626, The top-level
English-only enforcement conflicts with a later instruction inside
SECTION_SYSTEM_PROMPT_TEMPLATE in report_agent.py that tells the model to switch
to Chinese for Chinese source material; remove the stale Chinese-output rule
from SECTION_SYSTEM_PROMPT_TEMPLATE (or make it conditional on a dedicated
locale/config flag) so the section prompt no longer overrides the global
English-only requirement and the model output remains consistent with the
earlier enforcement.
        sorted_frames = sorted(self.frames, key=lambda f: f.initial_sentiment)
        frame_assignments: Dict[str, List[int]] = {f.frame_id: [] for f in sorted_frames}

        # Clear any prior assignments on frame objects
        for frame in sorted_frames:
            frame.agent_ids = []

        num_frames = len(sorted_frames)

        for agent in agent_configs:
            bias = getattr(agent, "sentiment_bias", 0.0)
            frame = self._find_best_frame(bias, sorted_frames)
            frame_assignments[frame.frame_id].append(agent.agent_id)
            frame.agent_ids.append(agent.agent_id)
Make frame_id unique before using it as the assignment key.
Line 1082 keys the returned mapping by frame_id, so duplicate or blank ids from the LLM will silently collapse multiple frames into one bucket. Please normalize and de-duplicate ids when constructing NarrativeFrames.
Suggested fix
+ seen_frame_ids: Dict[str, int] = {}
frames: List[NarrativeFrame] = []
for item in frames_data[: self.num_frames]:
+ raw_frame_id = str(item.get("frame_id") or "").strip() or f"frame_{len(frames)}"
+ collision_count = seen_frame_ids.get(raw_frame_id, 0)
+ seen_frame_ids[raw_frame_id] = collision_count + 1
+ frame_id = raw_frame_id if collision_count == 0 else f"{raw_frame_id}_{collision_count}"
frames.append(
NarrativeFrame(
- frame_id=item.get("frame_id", f"frame_{len(frames)}"),
+ frame_id=frame_id,
label=item.get("label", item.get("frame_id", "Frame")),
description=item.get("description", ""),
initial_sentiment=float(item.get("initial_sentiment", 0.0)),
talking_points=item.get("talking_points", []),
            )

Also applies to: 1154-1163
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/simulation_config_generator.py` around lines 1081 -
1094, The mapping uses frame.frame_id as a key but frame_id can be blank or
duplicated from the LLM; ensure frame_id is normalized and unique before using
it as a dictionary key (either during NarrativeFrame construction or immediately
before building frame_assignments). Update the code that creates NarrativeFrame
instances (and any place that sets frame.frame_id) to trim/normalize the id,
generate a fallback unique id when blank, and de-duplicate collisions (e.g.,
append a counter or uuid). Then use those guaranteed-unique frame.frame_id
values in sorted_frames, frame_assignments, and when calling _find_best_frame so
no frames collapse into the same bucket.
        frames: List[NarrativeFrame] = []
        for item in frames_data[: self.num_frames]:
            frames.append(
                NarrativeFrame(
                    frame_id=item.get("frame_id", f"frame_{len(frames)}"),
                    label=item.get("label", item.get("frame_id", "Frame")),
                    description=item.get("description", ""),
                    initial_sentiment=float(item.get("initial_sentiment", 0.0)),
                    talking_points=item.get("talking_points", []),
                )
            )

        if not frames:
            raise ValueError("LLM returned no valid frames")
Preserve the num_frames contract on both generation paths.
The class exposes num_frames, but the current helpers can return fewer: the LLM path accepts any non-empty result, and the rule-based fallback tops out at the six templates in template_pool. A caller asking for 7 or 8 frames can therefore silently get less after an LLM failure.
Suggested fix
- if not frames:
+ if not frames:
raise ValueError("LLM returned no valid frames")
+ if len(frames) != self.num_frames:
+ raise ValueError(
+ f"LLM returned {len(frames)} frames for requested {self.num_frames}"
+ )
logger.info(f"MultiFrameInitializer: generated {len(frames)} frames via LLM")
return frames
@@
- logger.info(f"MultiFrameInitializer: generated {len(frames)} frames via rule-based fallback")
+ if len(frames) != self.num_frames:
+ raise ValueError(
+ f"Rule-based fallback produced {len(frames)} frames for requested {self.num_frames}"
+ )
+ logger.info(f"MultiFrameInitializer: generated {len(frames)} frames via rule-based fallback")
        return frames

Also applies to: 1266-1288
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/simulation_config_generator.py` around lines 1154 -
1167, The generation paths currently can return fewer frames than
self.num_frames (LLM path accepts any non-empty frames_data and the rule-based
path is limited by template_pool); update both generation branches so the final
list of NarrativeFrame instances has exactly self.num_frames: in the LLM path
(where frames_data is consumed into frames) validate len(frames) ==
self.num_frames and raise a ValueError if fewer, and in the rule-based fallback
(where template_pool is used) pad or repeat templates deterministically to
produce exactly self.num_frames (ensuring unique frame_id/label generation)
before returning; reference the NarrativeFrame construction loop (frames_data ->
NarrativeFrame) and the template_pool usage to locate where to apply
truncation/padding or the strict validation.
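The deterministic-padding option for the rule-based fallback can be sketched as follows. The `(frame_id, label)` tuple shape is a stand-in for the fuller NarrativeFrame, and the helper name is hypothetical:

```python
def pad_frames_to_count(template_pool, num_frames):
    """Cycle through the template pool deterministically until exactly
    `num_frames` frames exist, suffixing repeated templates so every
    frame_id stays unique.
    """
    if not template_pool:
        raise ValueError("template_pool must not be empty")
    frames = []
    for i in range(num_frames):
        base_id, label = template_pool[i % len(template_pool)]
        repeat = i // len(template_pool)
        # First pass keeps the original id; later passes get a suffix.
        frame_id = base_id if repeat == 0 else f"{base_id}_{repeat}"
        frames.append((frame_id, label))
    return frames
```

This keeps the num_frames contract even when a caller asks for more frames than the six templates in template_pool, while the LLM path would instead enforce the count with the strict `ValueError` shown in the suggested fix.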
Summary
Four commits of improvements across simulation quality, ops reliability, and report quality. All changes are additive - zero existing functionality broken.
Perseus - Phase 1 Sim Quality (commit ade8ab8)
- NER_MODEL config option (default: minimax-m2.7:cloud)
- _deduplicate_entities() in entity_reader.py - fuzzy dedup via SequenceMatcher (threshold 0.85)
- PlatformSimulator ABC + TwitterSimulator + RedditSimulator (platform_simulators.py)
- SimulationOutcome dataclass + OutcomeScorer class (outcome_scorer.py)

Prometheus - Phase 1 Ops & Reliability (commit b509ecb)
- /api/health - Neo4j + Ollama + uploads checks, healthy/degraded/unhealthy
- /api/health/ready - strict readiness gate for Prometheus before starting sims
- preflight.py - 5 startup checks, logs warnings, never blocks startup
- /api/ops/simulation/<id>/poll - machine-readable: terminal, success, stalled, percent_complete

Calypso - Phase 1 Report Quality (commit becc6e5)
- _validate_section_output() - strips corruption, warns on high non-ASCII ratio
- _run_report_sanity_check() - post-generation integrity check
- executive_summary.py - ExecutiveSummary dataclass + LLM extraction → JSON + Markdown card

Perseus - Phase 2 Sim Quality (commit 345acae)
- FatigueState - per-agent fatigue tracking with decay
- ArchetypePersonaGenerator - 5 archetypes with Gaussian jitter + LLM bio generation
- NarrativeFrame + MultiFrameInitializer - competing narrative frames at t=0 with agent distribution
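The fuzzy entity dedup listed under Perseus Phase 1 can be sketched with difflib.SequenceMatcher at the stated 0.85 threshold. This is a simplified stand-in for `_deduplicate_entities`; the real method in entity_reader.py operates on richer entity records, not bare strings:

```python
from difflib import SequenceMatcher


def deduplicate_entities(names, threshold=0.85):
    """Keep a name only if it is not >= `threshold` similar
    (case-insensitive) to a name already kept; earlier entries win."""
    kept = []
    for name in names:
        is_duplicate = any(
            SequenceMatcher(None, name.lower(), existing.lower()).ratio() >= threshold
            for existing in kept
        )
        if not is_duplicate:
            kept.append(name)
    return kept
```

Note the O(n²) pairwise comparison: fine for per-document entity lists, but a blocking or normalization pre-pass would be worth considering if entity counts grow large.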