
perf(cache): enable coordinator for service statuses and risk scores#20

Open
lspassos1 wants to merge 3 commits into feat/cache-fill-coordinator-foundation from perf/cache-fill-first-rollout


@lspassos1
Owner

Summary

This enables the cache-fill coordinator for the first two shared hot keys in the fork and documents the phase-two rollout. The allowlist stays registry-driven, limited to serviceStatuses and riskScoresLive, and both handlers keep their existing stale/local fallback behavior when coordination times out.

Root cause

The coordinator foundation does not change runtime behavior until concrete keys opt in through generated policy. The first rollout needs an intentionally narrow allowlist, handler regressions that lock the existing fallback semantics, and documentation that ties the runtime rollout back to the registry-first workflow.
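
As a rough illustration of the opt-in model described above, the sketch below shows how a registry-driven allowlist could gate coordination. The type and function names (`CacheFillPolicy`, `getCacheFillPolicy`) are assumptions for illustration, not the generated module's actual exports. The serviceStatuses values are taken from the review thread and sequence diagram in this PR; the riskScoresLive values other than `waitMs` are placeholders.

```typescript
// Hypothetical policy shape; field names follow the PR discussion.
interface CacheFillPolicy {
  leaseMs: number;
  waitMs: number;
  pollMinMs: number;
  pollMaxMs: number;
  fallback: "return_null";
}

// Only keys listed here are coordinated; every other key keeps its old path.
const CACHE_FILL_REGISTRY = new Map<string, CacheFillPolicy>([
  [
    "infra:service-statuses:v1",
    { leaseMs: 12_000, waitMs: 3_000, pollMinMs: 75, pollMaxMs: 175, fallback: "return_null" },
  ],
  [
    "risk:scores:sebuf:v1",
    // waitMs reflects the 4s budget mentioned in the thread;
    // the remaining values are placeholders, not taken from the PR.
    { leaseMs: 12_000, waitMs: 4_000, pollMinMs: 75, pollMaxMs: 175, fallback: "return_null" },
  ],
]);

// Returns undefined for keys that have not opted in.
function getCacheFillPolicy(key: string): CacheFillPolicy | undefined {
  return CACHE_FILL_REGISTRY.get(key);
}
```

A caller would treat `undefined` as "run the existing uncoordinated fetch", which is why adding the foundation alone changes nothing at runtime.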

Changes

  • add generated cache-fill policies for infra:service-statuses:v1 and risk:scores:sebuf:v1
  • regenerate server/_shared/_generated/cache-fill-registry.ts with only those two keys enabled
  • add handler regressions for list-service-statuses and get-risk-scores coordinator timeout paths
  • document the phase-two rollout in docs/architecture/cache-fill-coordinator.md
  • link the dataset-registry workflow doc to the phase-two issue stack
  • widen the coordinator unit-test timing harness so the cross-instance test stays stable under full validation load

Validation

  • npm run registry:generate
  • npm run registry:check
  • node --test tests/redis-caching.test.mjs
  • npm run typecheck
  • npm run typecheck:api
  • npm exec tsx -- --test tests/stock-backtest.test.mts tests/stock-analysis-history.test.mts
  • npm run test:data

Risk

Low to moderate. This is the first runtime enablement for real keys in the fork, but it is intentionally limited to two shared cache entries that already have existing stale/local fallback behavior. Known unrelated failures in npm run test:sidecar and npm run test:e2e:runtime were kept out of scope and not mixed into this rollout.

Type of change

  • Bug fix
  • New feature
  • New data source / feed
  • New map layer
  • Refactor / code cleanup
  • Documentation
  • CI / Build / Infrastructure

Affected areas

  • Map / Globe
  • News panels / RSS feeds
  • AI Insights / World Brief
  • Market Radar / Crypto
  • Desktop app (Tauri)
  • API endpoints (/api/*)
  • Config / Settings
  • Other: generated registry policy, cache rollout docs, server handler regressions

Checklist

  • Tested on worldmonitor.app variant
  • Tested on tech.worldmonitor.app variant (if applicable)
  • New RSS feed domains added to api/rss-proxy.js allowlist (if adding feeds)
  • No API keys or secrets committed
  • TypeScript compiles without errors (npm run typecheck)

Screenshots

Not applicable.

Refs #16
Depends on #15, #14

Root cause: the coordinator foundation alone does not change runtime behavior until specific shared keys opt in through the generated registry.

Changes:
- add generated cache-fill policies for infra:service-statuses:v1 and risk:scores:sebuf:v1
- document the phase-two rollout and issue lineage in the architecture docs
- add handler regressions that lock the existing stale/local fallback behavior for both enabled keys
- widen the coordinator unit-test timing harness to remove cross-instance flake under full validation load

Validation:
- npm run registry:generate
- npm run registry:check
- node --test tests/redis-caching.test.mjs
- npm run typecheck
- npm run typecheck:api
- npm exec tsx -- --test tests/stock-backtest.test.mts tests/stock-analysis-history.test.mts
- npm run test:data

Known unrelated failures kept out of scope:
- npm run test:sidecar
- npm run test:e2e:runtime

Refs #16
Depends on #15
@vercel

vercel bot commented Apr 12, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| worldmonitor | Ready | Preview, Comment | Apr 12, 2026 4:07pm |

@greptile-apps

greptile-apps bot commented Apr 12, 2026

Greptile Summary

This PR enables the distributed cache-fill coordinator for the first two opt-in keys (infra:service-statuses:v1 and risk:scores:sebuf:v1), adds handler-level regression tests that verify stale/local fallback semantics survive coordinator timeouts, and documents the phase-two rollout. It also widens the cross-instance coordinator unit-test timing window to stay stable under full CI load.

Changes at a glance:

  • registry/datasets.ts — adds CACHE_FILL_POLICIES block with serviceStatuses and riskScoresLive entries; buildDatasets() now projects those into DatasetContract.cacheFill
  • server/_shared/_generated/cache-fill-registry.ts — regenerated with only those two enabled keys; their policy values satisfy all coordinator invariants (waitMs < leaseMs, pollMaxMs < waitMs)
  • tests/redis-caching.test.mjs — adds importModuleWithCacheFillRegistry helper that injects a test-scoped accelerated registry (waitMs: 20, leaseMs: 80) into the transitive dependency chain, and adds two coordinator-enabled handler fallbacks tests exercising the module-cache and stale-cache fallback paths
  • The cross-instance coordinator test timing was widened to waitMs: 200 / leaseMs: 300, a deliberate change for CI stability mentioned in the PR description
  • docs/architecture/cache-fill-coordinator.md and dataset-registry.md — updated to record the first-rollout allowlist and link the phase-two tracking issues
  • tests/stock-backtest.test.mts / tests/stock-analysis-history.test.mts — minor changes aligning with the updated fake-upstash-redis.mts helper interface
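
The two coordinator invariants named above (waitMs < leaseMs, pollMaxMs < waitMs) can be checked mechanically. The following is a minimal sketch of such a check; `checkPolicyInvariants` is a hypothetical helper, not a function from this repository, and the real generator may enforce more rules than these two.

```typescript
interface CacheFillTiming {
  leaseMs: number;
  waitMs: number;
  pollMaxMs: number;
}

// Returns a list of violated invariants; an empty list means the policy is consistent.
function checkPolicyInvariants(policy: CacheFillTiming): string[] {
  const violations: string[] = [];
  // Followers must stop waiting before the leader's lease can expire.
  if (!(policy.waitMs < policy.leaseMs)) {
    violations.push("waitMs must be less than leaseMs");
  }
  // A single poll interval must fit inside the wait budget.
  if (!(policy.pollMaxMs < policy.waitMs)) {
    violations.push("pollMaxMs must be less than waitMs");
  }
  return violations;
}

// serviceStatuses values as quoted in this PR.
const production = checkPolicyInvariants({ leaseMs: 12_000, waitMs: 3_000, pollMaxMs: 175 });
// Deliberately broken policy to show both checks firing.
const broken = checkPolicyInvariants({ leaseMs: 80, waitMs: 100, pollMaxMs: 150 });
```

Here `production` is empty and `broken` lists both violations.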

Confidence Score: 4/5

Safe to merge — first coordinator rollout is narrow and all fallback paths are covered by regression tests with accelerated timing.

The primary concern from the prior review (handler regression tests running with production 3000ms/4000ms waitMs) is fully resolved: both tests now use importModuleWithCacheFillRegistry with an accelerated handlerFallbackRegistry (waitMs: 20, leaseMs: 80). All generated policy invariants are satisfied. The cross-instance coordinator test intentionally widens waitMs to 200ms for stability under CI load, which is a documented design decision. No blocking issues remain.

tests/redis-caching.test.mjs — the cross-instance test uses waitMs: 200 (intentionally widened); worth monitoring for CI flakiness under high CPU load.

Important Files Changed

| Filename | Overview |
| --- | --- |
| tests/redis-caching.test.mjs | Adds importModuleWithCacheFillRegistry helper and two handler fallback tests with properly accelerated timing; cross-instance coordinator test uses waitMs: 200 (widened intentionally) |
| registry/datasets.ts | Adds CACHE_FILL_POLICIES constant and projects it into DatasetContract.cacheFill inside buildDatasets(); both entries have correct policies |
| server/_shared/_generated/cache-fill-registry.ts | Auto-generated from registry/datasets.ts; emits exactly the two enabled policies with correct parameter values satisfying all coordinator invariants |
| server/_shared/redis.ts | No logic changes; reads the registry at module load — the new generated policies are picked up automatically |
| tests/helpers/fake-upstash-redis.mts | Minor updates to pipeline/command handling shared by stock tests and coordinator harness |
| docs/architecture/cache-fill-coordinator.md | New section documenting first-rollout allowlist with correct policy values and issue links |
| docs/architecture/dataset-registry.md | Adds phase-two tracking link for the coordinator rollout issue stack |
| tests/stock-backtest.test.mts | Updated to use the revised parseRedisCommand interface from the shared fake-upstash-redis helper |
| tests/stock-analysis-history.test.mts | Updated to use the revised parseRedisCommand interface from the shared fake-upstash-redis helper |

Sequence Diagram

sequenceDiagram
    participant R as registry/datasets.ts
    participant G as generate-dataset-registry.ts
    participant CR as cache-fill-registry.ts (generated)
    participant L as Leader instance
    participant F as Follower instance
    participant Redis as Upstash Redis

    R->>G: CACHE_FILL_POLICIES (serviceStatuses, riskScoresLive)
    G->>CR: emit only enabled entries

    Note over L,F: Cold miss for infra:service-statuses:v1
    L->>Redis: GET key → null (cache miss)
    F->>Redis: GET key → null (cache miss)
    L->>Redis: SET lock:fill:v1:… token NX PX 12000ms → OK (leader)
    F->>Redis: SET lock:fill:v1:… token NX PX 12000ms → null (follower)
    L->>Redis: GET key (recheck) → null
    L->>L: run fetcher
    L->>Redis: SET key data EX ttl
    L->>Redis: EVAL unlock script (token-safe)
    F->>Redis: GET key (poll) → data ✓
    F-->>F: return_null after waitMs=3000ms if timeout
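
The lock steps in the diagram (SET with NX and PX, then a token-safe unlock) can be modelled without a Redis connection. The sketch below is an in-memory stand-in with an explicit clock, intended only to illustrate the semantics; `FakeLockStore` and the key `lock:example` are hypothetical and do not reflect the repository's actual lock key format or helper names.

```typescript
// In-memory model of the two lock operations shown in the diagram.
class FakeLockStore {
  private locks = new Map<string, { token: string; expiresAt: number }>();

  // Mirrors SET key token NX PX leaseMs: succeeds only if no unexpired lock exists.
  tryAcquire(key: string, token: string, leaseMs: number, now: number): boolean {
    const held = this.locks.get(key);
    if (held && held.expiresAt > now) return false;
    this.locks.set(key, { token, expiresAt: now + leaseMs });
    return true;
  }

  // Mirrors a token-safe unlock script: delete only when the stored token matches.
  release(key: string, token: string, now: number): boolean {
    const held = this.locks.get(key);
    if (!held || held.expiresAt <= now || held.token !== token) return false;
    this.locks.delete(key);
    return true;
  }
}

const store = new FakeLockStore();
const lockKey = "lock:example"; // hypothetical key, for illustration only

const leaderWon = store.tryAcquire(lockKey, "leader-token", 12_000, 0);
const followerWon = store.tryAcquire(lockKey, "follower-token", 12_000, 1);
// A follower cannot release a lock it does not own.
const wrongRelease = store.release(lockKey, "follower-token", 2);
const leaderRelease = store.release(lockKey, "leader-token", 3);
// Once released, the next caller can become leader.
const nextLeader = store.tryAcquire(lockKey, "follower-token", 12_000, 4);
```

The token comparison is what keeps a slow leader, whose lease has already expired and been re-acquired by another instance, from deleting the new leader's lock.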

Reviews (2): Last reviewed commit: "test(cache): speed up rollout coordinato..."

Comment thread tests/redis-caching.test.mjs
The foundation PR introduced TypeScript extension imports in server/_shared/redis.ts, which broke Vercel's Edge bundling and diverged from the import pattern already used by the generated registry rollout. This change restores the repo-standard imports, keeps lock commands raw to avoid double-prefixing, propagates the true cache source for local joiners, and finishes the single-command Redis transport migration for deleteRedisKey.

It also centralizes the Redis command parser used by the Redis-aware tests so the body-based transport and lock-release paths stay in sync across suites.

Inject a short-lived cache-fill registry into the handler fallback regressions so they do not inherit the 3s/4s production wait budgets. The rollout suite now exercises the same fallback paths in tens of milliseconds and widens the cross-instance coordinator window to reduce CI timing flake.
Owner Author

Addressed the rollout-specific bot feedback on the test layer.

What changed:

  • injected a short-lived cache-fill registry into the list-service-statuses / get-risk-scores fallback regressions so they no longer inherit the 3s / 4s production wait budgets
  • widened the cross-instance coordinator timing window used by the rollout test harness to reduce CI timing flake
  • cherry-picked the foundation fix that restored Edge-safe imports / raw lock operations / source propagation
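
To show why injecting a short-lived policy makes the fallback regressions fast, here is a minimal sketch of a follower wait loop with the policy passed in as a parameter. It is not the repository's `importModuleWithCacheFillRegistry` helper, which injects the registry through the module graph; `waitForFill` and `listStatuses` are hypothetical names, and the poll values used below are assumptions (the thread only states waitMs: 20 and leaseMs: 80).

```typescript
interface WaitPolicy {
  waitMs: number;
  pollMinMs: number;
  pollMaxMs: number;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls `read` until it yields a value or the wait budget is spent, then returns null.
async function waitForFill<T>(policy: WaitPolicy, read: () => Promise<T | null>): Promise<T | null> {
  const deadline = Date.now() + policy.waitMs;
  while (Date.now() < deadline) {
    const value = await read();
    if (value !== null) return value;
    // Randomized poll interval, clamped so the loop never sleeps past the deadline.
    const interval = policy.pollMinMs + Math.random() * (policy.pollMaxMs - policy.pollMinMs);
    await sleep(Math.min(interval, Math.max(0, deadline - Date.now())));
  }
  return null;
}

// Hypothetical handler: falls back to stale data, then to an empty list.
async function listStatuses(
  policy: WaitPolicy,
  read: () => Promise<string[] | null>,
  stale: string[] | null,
): Promise<string[]> {
  const fresh = await waitForFill(policy, read);
  return fresh ?? stale ?? [];
}
```

With a production-sized `waitMs` of 3000, a test that exercises the timeout path would block for three seconds; with `waitMs: 20` the same path completes in tens of milliseconds.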

Validation on this branch:

  • npm run registry:generate
  • npm run registry:check
  • node --test tests/bootstrap.test.mjs tests/edge-functions.test.mjs tests/redis-caching.test.mjs
  • npm exec tsx -- --test tests/stock-backtest.test.mts tests/stock-analysis-history.test.mts
  • npm run typecheck
  • npm run typecheck:api
  • npm run test:data

Unrelated branch failures remain unchanged:

  • npm run test:sidecar -> 1 failure in src-tauri/sidecar/local-api-server.test.mjs around cloud-fallback Origin
  • npm run test:e2e:runtime -> 4 failures in e2e/runtime-fetch.spec.ts (download URL, MapContainer fallback cleanup, loadMarkets, fetchHapiSummary)

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Keep it up!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@lspassos1 lspassos1 marked this pull request as ready for review April 13, 2026 00:20

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9e406ca177


Comment thread registry/datasets.ts
Comment on lines +561 to +564
waitMs: 3_000,
pollMinMs: 75,
pollMaxMs: 175,
fallback: 'return_null',

P2: Increase waitMs for serviceStatuses coordinator policy

The new serviceStatuses cache-fill policy uses waitMs: 3000 with fallback: 'return_null', but the handler’s upstream checks can legitimately run much longer (UPSTREAM_TIMEOUT_MS is 10,000ms in server/worldmonitor/infrastructure/v1/_shared.ts). In cross-instance cold-miss contention, followers will time out after 3s, receive null, and listServiceStatuses then falls through to results || fallbackStatusesCache?.data || [], which returns an empty status list on cold instances even while the leader is still computing valid data. This is a user-visible regression from enabling coordination for this key; the wait window or fallback mode should be aligned with the handler’s worst-case fetch latency.
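
The failure mode described in this comment can be reproduced in isolation. The sketch below mirrors the quoted fallthrough expression and adds a simple timing check; `ServiceStatus`, `resolveStatuses`, and `followerMayGiveUpEarly` are hypothetical names used for illustration, and the 10,000ms figure is the UPSTREAM_TIMEOUT_MS value cited in the comment.

```typescript
// Hypothetical status shape, for illustration only.
interface ServiceStatus {
  id: string;
  status: string;
}

// Mirrors the fallthrough quoted in the comment: results || fallbackStatusesCache?.data || [].
function resolveStatuses(
  results: ServiceStatus[] | null,
  fallbackStatusesCache: { data: ServiceStatus[] } | null,
): ServiceStatus[] {
  return results || fallbackStatusesCache?.data || [];
}

// True when a follower can time out while the leader's fetch is still legitimately running.
function followerMayGiveUpEarly(waitMs: number, upstreamTimeoutMs: number): boolean {
  return waitMs < upstreamTimeoutMs;
}

// Cold instance: the follower got null from the coordinator and has no local fallback.
const coldResult = resolveStatuses(null, null);
// Warm instance: the same null is masked by the local fallback cache.
const warmResult = resolveStatuses(null, { data: [{ id: "example", status: "operational" }] });
const atRisk = followerMayGiveUpEarly(3_000, 10_000);
```

On a cold instance `coldResult` is an empty list, which is the user-visible regression the comment flags; `atRisk` is true for the current 3s wait against a 10s upstream timeout.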

