
Add climate news seed and ListClimateNews RPC #2532

Open
FayezBast wants to merge 3 commits into main from feat/climate-add-climate-news-intelligence

@FayezBast
Collaborator

@FayezBast FayezBast commented Mar 30, 2026

Summary

Adds seed-climate-news.mjs to aggregate 9 authoritative climate/environment RSS feeds into climate:news-intelligence:v1, wires a 30-minute relay seed loop, and exposes the data through a new ListClimateNews climate proto RPC and server handler.

Not included in this PR:

  • MCP get_climate_data expansion to include climate:news-intelligence:v1
  • Bootstrap hydration registration for the new climate news key

Fixes #2469
Fixes #2560

Type of change

  • New feature
  • New data source / feed
  • Documentation

Affected areas

  • News panels / RSS feeds
  • API endpoints (/api/*)
  • Other: Climate proto/service seed pipeline

@mintlify

mintlify bot commented Mar 30, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

Project Status Preview Updated (UTC)
WorldMonitor 🟢 Ready View Preview Mar 30, 2026, 1:36 AM

@vercel

vercel bot commented Mar 30, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
worldmonitor Ready Ready Preview, Comment Mar 30, 2026 9:37pm

@greptile-apps

greptile-apps bot commented Mar 30, 2026

Greptile Summary

This PR wires a full climate news intelligence pipeline: 9 RSS feeds are aggregated every 30 minutes by a new seed-climate-news.mjs script (spawned from the ais-relay.cjs relay loop), stored in Redis under climate:news-intelligence:v1, and exposed via a new ListClimateNews gRPC-over-HTTP endpoint backed by proto definitions and a generated TypeScript server/client. The change follows the established seed → Redis → RPC handler pattern used throughout the codebase.

Key findings:

  • P1 — intervalMin mismatch in seed-health.js: The health monitoring entry records intervalMin: 45 while the relay loop fires every 30 minutes. This will cause the seed-health endpoint to report incorrect staleness status. health.js already uses the correct 3× 30-minute window (maxStaleMin: 90), making the two health files inconsistent.
  • P2 — Missing bootstrap hydration: Per AGENTS.md, new data sources must be wired into api/bootstrap.js. The PR author explicitly defers this, but the gap means the first cold-cache page load will not benefit from pre-fetched data, and the convention is left untracked.
  • P2 — Atom fallback scope: The <entry> fallback in parseRssItems triggers only when zero <item> elements were found across all 9 feeds combined, not per-feed. Future Atom-only feeds added to FEEDS would silently return no items.
  • P2 — CDATA regex edge case: extractTag uses optional CDATA markers, so a tag whose content includes a literal ]]> substring would be silently truncated. Rare in well-formed RSS, but worth hardening.

Confidence Score: 4/5

Safe to merge after fixing the intervalMin mismatch in seed-health.js; everything else is minor.

One P1 issue: seed-health.js records intervalMin: 45 while the relay fires every 30 minutes, causing incorrect staleness monitoring. All remaining findings are P2 and do not block correctness of the happy path.

api/seed-health.js: intervalMin should be 30 to match the relay loop.

Important Files Changed

| Filename | Overview |
| --- | --- |
| scripts/seed-climate-news.mjs | New seed script fetching 9 RSS feeds into climate:news-intelligence:v1; Atom fallback is per-batch (not per-feed) and the CDATA regex has a truncation edge-case. |
| api/seed-health.js | Registers climate:news-intelligence health entry with intervalMin: 45, but the relay loop fires every 30 minutes, which creates incorrect staleness monitoring. |
| server/worldmonitor/climate/v1/list-climate-news.ts | Clean read-through RPC handler using getCachedJson; correctly falls back to empty response on cache miss or error. |
| scripts/ais-relay.cjs | Adds 30-minute climate news seed loop delegating to the standalone .mjs script via execFile; in-flight guard prevents concurrent runs. |
| api/health.js | Adds climateNews to both STANDALONE_KEYS and SEED_META with maxStaleMin: 90 (3× the 30-min interval), which is correct. |
| server/gateway.ts | Registers the new route at cache tier slow (30 min), consistent with the 30-minute seed interval. |
| proto/worldmonitor/climate/v1/climate_news_item.proto | New proto message for ClimateNewsItem; int64 fields correctly annotated with INT64_ENCODING_NUMBER. |
| proto/worldmonitor/climate/v1/service.proto | Adds ListClimateNews RPC to ClimateService with correct HTTP GET annotation. |
| src/generated/server/worldmonitor/climate/v1/service_server.ts | Generated server stub routes GET /api/climate/v1/list-climate-news to handler.listClimateNews, consistent with other routes. |
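
The in-flight guard described for scripts/ais-relay.cjs can be illustrated with a small sketch. This is not the code from the PR: `createGuardedRunner` and the `task` callback are hypothetical names, and the real relay reportedly spawns the seed script via execFile rather than calling an in-process function.

```javascript
// Hypothetical sketch of an in-flight guard for a periodic seed loop.
// `task` stands in for "spawn seed-climate-news.mjs and wait for it to exit".
function createGuardedRunner(task) {
  let inFlight = false;

  return async function run() {
    // Skip this tick if the previous run has not finished yet.
    if (inFlight) return 'skipped';
    inFlight = true;
    try {
      await task();
      return 'ran';
    } catch (err) {
      // A failed run must not wedge the loop; log and let the next tick retry.
      console.warn('[climate-news] seed run failed:', err && err.message);
      return 'failed';
    } finally {
      inFlight = false;
    }
  };
}

// 30 minutes, matching the interval stated in this PR.
const SEED_INTERVAL_MS = 30 * 60 * 1000;

// Possible wiring (left commented out so the sketch does not start a timer):
// const runClimateNewsSeed = createGuardedRunner(spawnSeedScript);
// setInterval(runClimateNewsSeed, SEED_INTERVAL_MS);
```

Returning a status string from each tick is a choice made here only to keep the sketch easy to test; a relay loop could equally just log and return nothing.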

Sequence Diagram

sequenceDiagram
    participant Relay as ais-relay.cjs (Railway)
    participant Seed as seed-climate-news.mjs
    participant RSS as RSS Feeds (x9)
    participant Redis as Upstash Redis
    participant Edge as Vercel Edge Function
    participant Client as Browser Client

    loop Every 30 minutes
        Relay->>Seed: execFile(node, seed-climate-news.mjs)
        Seed->>RSS: fetch() x9 feeds (15s timeout each)
        RSS-->>Seed: XML responses
        Seed->>Seed: parseRssItems() / dedup / sort
        Seed->>Redis: SET climate:news-intelligence:v1 (TTL 1800s)
        Seed->>Redis: SET seed-meta:climate:news-intelligence
        Seed-->>Relay: exit 0
    end

    Client->>Edge: GET /api/climate/v1/list-climate-news
    Edge->>Redis: getCachedJson(climate:news-intelligence:v1)
    Redis-->>Edge: { items[], fetchedAt }
    Edge-->>Client: ListClimateNewsResponse (JSON)
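
The read path at the end of the diagram can be sketched as follows. This is an illustration only: the actual handler is TypeScript, and the signature assumed for `getCachedJson` (takes a key, resolves to parsed JSON or null) as well as the `makeListClimateNews` factory are assumptions, not taken from the PR. The response shape follows the `{ items[], fetchedAt }` payload shown in the diagram.

```javascript
// Redis key named in this PR.
const CLIMATE_NEWS_KEY = 'climate:news-intelligence:v1';

// Hypothetical factory: the cache reader is injected so the sketch is self-contained.
function makeListClimateNews(getCachedJson) {
  return async function listClimateNews() {
    try {
      const cached = await getCachedJson(CLIMATE_NEWS_KEY);
      // Cache miss or malformed payload: return an empty response.
      if (!cached || !Array.isArray(cached.items)) {
        return { items: [], fetchedAt: 0 };
      }
      return { items: cached.items, fetchedAt: Number(cached.fetchedAt) || 0 };
    } catch {
      // Cache errors also degrade to an empty response instead of failing the request.
      return { items: [], fetchedAt: 0 };
    }
  };
}
```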

Comments Outside Diff (2)

  1. api/seed-health.js, line 29 (link)

    P1 intervalMin mismatch with actual relay loop

    The entry records intervalMin: 45, but ais-relay.cjs sets the climate news relay to fire every 30 minutes. The mismatch means the seed-health endpoint computes staleness on a 45-minute cadence while updates actually arrive on a 30-minute cadence — potentially producing incorrect "overdue" status in health monitoring.

    For reference, health.js correctly uses maxStaleMin: 90 (3× the 30-minute interval), making the two health files internally inconsistent with each other.

  2. api/bootstrap.js, line 9-80 (link)

    P2 Missing bootstrap hydration — violates AGENTS.md convention

    AGENTS.md documents an explicit rule:

    New data sources MUST have bootstrap hydration wired in api/bootstrap.js

    The climate:news-intelligence:v1 key is not registered in BOOTSTRAP_CACHE_KEYS. The PR description acknowledges this is intentionally deferred, but it means the first cold-cache page load will block on the RPC call rather than using pre-fetched data, and the convention is left untracked without a follow-up ticket reference.

    Context Used: AGENTS.md (source)

Reviews (1): Last reviewed commit: "Add climate news seed and ListClimateNew..." | Re-trigger Greptile

Comment on lines +92 to +110
if (items.length === 0) {
  const entryRe = /<entry\b[^>]*>([\s\S]*?)<\/entry>/gi;
  while ((match = entryRe.exec(bounded)) !== null) {
    const block = match[1];
    const title = decodeHtmlEntities(extractTag(block, 'title'));
    const url = extractLink(block);
    const publishedAt = parseDateMs(block);
    const rawSummary = extractTag(block, 'summary') || extractTag(block, 'content');
    if (!title || !url || !publishedAt) continue;
    items.push({
      id: `${stableHash(url)}-${publishedAt}`,
      title,
      url,
      sourceName,
      publishedAt,
      summary: cleanSummary(rawSummary),
    });
  }
}
P2 Atom fallback only fires when all RSS <item> blocks return zero items across the entire feed

The Atom <entry> fallback is gated on if (items.length === 0). This means it is only attempted when the entire feed contained no <item> elements. For feeds that publish a mix of both (unusual but valid), Atom entries would be silently dropped. More practically, all 9 feeds in FEEDS are standard RSS feeds, so this isn't an immediate bug — but any future feed addition that uses Atom will silently return 0 items unless this gate is removed.

Consider restructuring to always attempt both parsers and merge results (deduplicating on id), or at minimum add a comment explaining that the fallback is per-feed, not per-batch.
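
A minimal sketch of the "always attempt both parsers and merge" option follows. It is not the PR's parser: `firstTag`, `atomLink`, `parseBlocks`, and `parseFeed` are simplified, hypothetical stand-ins for the helpers in seed-climate-news.mjs, the id uses the raw URL instead of `stableHash(url)`, and the date fields (`pubDate` for RSS, `updated` for Atom) are assumptions about what the real `parseDateMs` reads.

```javascript
// Simplified tag reader (no CDATA or entity handling; illustration only).
function firstTag(block, tag) {
  const m = block.match(new RegExp(`<${tag}\\b[^>]*>([\\s\\S]*?)<\\/${tag}>`, 'i'));
  return m ? m[1].trim() : '';
}

// Atom links carry the URL in an href attribute rather than in element text.
function atomLink(block) {
  const m = block.match(/<link\b[^>]*\bhref="([^"]+)"/i);
  return m ? m[1] : '';
}

// Parse every <blockTag> element of one feed into item objects.
function parseBlocks(xml, blockTag, getUrl, dateTag, sourceName) {
  const re = new RegExp(`<${blockTag}\\b[^>]*>([\\s\\S]*?)<\\/${blockTag}>`, 'gi');
  const out = [];
  for (const m of xml.matchAll(re)) {
    const block = m[1];
    const title = firstTag(block, 'title');
    const url = getUrl(block);
    const publishedAt = Date.parse(firstTag(block, dateTag));
    if (!title || !url || Number.isNaN(publishedAt)) continue;
    out.push({ id: `${url}-${publishedAt}`, title, url, sourceName, publishedAt });
  }
  return out;
}

// Run both parsers unconditionally, dedupe on id, newest first.
function parseFeed(xml, sourceName) {
  const rss = parseBlocks(xml, 'item', (b) => firstTag(b, 'link'), 'pubDate', sourceName);
  const atom = parseBlocks(xml, 'entry', atomLink, 'updated', sourceName);
  const byId = new Map();
  for (const item of [...rss, ...atom]) {
    if (!byId.has(item.id)) byId.set(item.id, item);
  }
  return [...byId.values()].sort((a, b) => b.publishedAt - a.publishedAt);
}
```

With this shape there is no gate to reason about: an RSS-only feed, an Atom-only feed, and a mixed feed all go through the same path, at the cost of one extra regex pass per feed.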

Comment on lines +40 to +43
function extractTag(block, tagName) {
  const re = new RegExp(`<${tagName}[^>]*>(?:<!\\[CDATA\\[)?([\\s\\S]*?)(?:\\]\\]>)?<\\/${tagName}>`, 'i');
  return (block.match(re) || [])[1]?.trim() || '';
}
P2 extractTag regex mishandles CDATA sections with embedded ]]> content

The current regex makes the CDATA start/end markers optional via (?:..)?:

<${tagName}[^>]*>(?:<!\[CDATA\[)?([\s\S]*?)(?:\]\]>)?<\/${tagName}>

Because both the opening <![CDATA[ and closing ]]> markers are optional and independent, a tag such as <title><![CDATA[Breaking news ]]> & more]]></title> will capture only Breaking news (stopping at the first ]]>), silently truncating the title. The fix is to use two separate branches — one for CDATA content and one for plain text — rather than making both markers optional.
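
A minimal sketch of the two-branch approach suggested above, offered as one possible shape rather than the fix that was applied. `escapeRegExp` is a hypothetical helper added for safety, and tolerating whitespace around the CDATA section is an extra assumption that the review did not ask for.

```javascript
// Hypothetical helper: escape regex metacharacters in a tag name.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Two explicit alternatives instead of two independently optional markers:
//   group 1: content of a tag whose body is a single CDATA section
//   group 2: content of a plain-text tag
function extractTag(block, tagName) {
  const name = escapeRegExp(tagName);
  const re = new RegExp(
    `<${name}\\b[^>]*>\\s*(?:<!\\[CDATA\\[([\\s\\S]*?)\\]\\]>\\s*|([\\s\\S]*?))<\\/${name}>`,
    'i'
  );
  const m = block.match(re);
  if (!m) return '';
  return (m[1] ?? m[2] ?? '').trim();
}

console.log(extractTag('<title><![CDATA[Hello & world]]></title>', 'title'));
console.log(extractTag('<title>Plain title</title>', 'title'));
```

Because the CDATA branch requires its closing `]]>` to be followed directly by the closing tag, the markers are either both consumed or the plain-text branch is used; they can no longer match independently of each other.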

@FayezBast
Collaborator Author

FayezBast commented Mar 30, 2026

P1 overclaims the impact. The configured intervalMin in seed-health.js is wrong, but the effective stale threshold still matches health.js at 90 minutes (45 × 2).
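
The arithmetic in this reply can be checked with a short sketch. It assumes, based only on the ×2 factor mentioned in the reply, that seed-health.js derives its stale threshold as intervalMin multiplied by 2, while health.js stores maxStaleMin directly; `staleThresholdMin` and `isStale` are hypothetical names, not functions from the codebase.

```javascript
// Assumption: multiplier inferred from the reply above, not read from seed-health.js.
const ASSUMED_MULTIPLIER = 2;

// Hypothetical helper: derive a stale threshold (minutes) from the seed interval.
function staleThresholdMin(intervalMin, multiplier = ASSUMED_MULTIPLIER) {
  return intervalMin * multiplier;
}

// Hypothetical helper: a seed is stale once its age exceeds the threshold.
function isStale(lastSeedMs, nowMs, thresholdMin) {
  return nowMs - lastSeedMs > thresholdMin * 60_000;
}

const HEALTH_JS_MAX_STALE_MIN = 90; // maxStaleMin reported for api/health.js

console.log({
  currentConfig: staleThresholdMin(45),    // intervalMin: 45 as merged in this PR
  afterSuggestedFix: staleThresholdMin(30), // intervalMin: 30 as requested in review
  healthJs: HEALTH_JS_MAX_STALE_MIN,
});
```

Under this assumed model, intervalMin: 45 yields the same 90-minute threshold as health.js, while changing it to 30 would yield 60 minutes, so whether the two files agree after the fix depends on the multiplier seed-health.js actually uses.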

@SebastienMelki
Collaborator

@FayezBast — good pipeline addition. A few things before merging:

  1. seed-health.js intervalMin: Change from 45 to 30 to match the actual relay loop interval. The two health files should tell the same story.
  2. Bootstrap deferral: Please open a follow-up issue for wiring climate:news-intelligence:v1 into api/bootstrap.js and link it here.
  3. Atom fallback scope: The entry fallback triggers only when zero item elements are found across ALL feeds, not per-feed. Any future Atom-only feed would silently return no items. Not a blocker but worth noting.

Otherwise the pipeline design is solid.

Development

Successfully merging this pull request may close these issues.

  • Follow-up: wire climate:news-intelligence:v1 into api/bootstrap.js
  • feat(climate): add climate news intelligence seeder (9 RSS feeds)

2 participants