perf(cache): enable coordinator for service statuses and risk scores#20
perf(cache): enable coordinator for service statuses and risk scores#20lspassos1 wants to merge 3 commits intofeat/cache-fill-coordinator-foundationfrom
Conversation
Root cause: the coordinator foundation alone does not change runtime behavior until specific shared keys opt in through the generated registry. Changes: - add generated cache-fill policies for infra:service-statuses:v1 and risk:scores:sebuf:v1 - document the phase-two rollout and issue lineage in the architecture docs - add handler regressions that lock the existing stale/local fallback behavior for both enabled keys - widen the coordinator unit-test timing harness to remove cross-instance flake under full validation load Validation: - npm run registry:generate - npm run registry:check - node --test tests/redis-caching.test.mjs - npm run typecheck - npm run typecheck:api - npm exec tsx -- --test tests/stock-backtest.test.mts tests/stock-analysis-history.test.mts - npm run test:data Known unrelated failures kept out of scope: - npm run test:sidecar - npm run test:e2e:runtime Refs #16 Depends on #15
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Greptile SummaryThis PR enables the distributed cache-fill coordinator for the first two opt-in keys ( Changes at a glance:
Confidence Score: 4/5Safe to merge — first coordinator rollout is narrow and all fallback paths are covered by regression tests with accelerated timing. The primary concern from the prior review (handler regression tests running with production 3000ms/4000ms tests/redis-caching.test.mjs — the cross-instance test uses Important Files Changed
Sequence DiagramsequenceDiagram
participant R as registry/datasets.ts
participant G as generate-dataset-registry.ts
participant CR as cache-fill-registry.ts (generated)
participant L as Leader instance
participant F as Follower instance
participant Redis as Upstash Redis
R->>G: CACHE_FILL_POLICIES (serviceStatuses, riskScoresLive)
G->>CR: emit only enabled entries
Note over L,F: Cold miss for infra:service-statuses:v1
L->>Redis: GET key → null (cache miss)
F->>Redis: GET key → null (cache miss)
L->>Redis: SET lock:fill:v1:… token NX PX 12000ms → OK (leader)
F->>Redis: SET lock:fill:v1:… token NX PX 12000ms → null (follower)
L->>Redis: GET key (recheck) → null
L->>L: run fetcher
L->>Redis: SET key data EX ttl
L->>Redis: EVAL unlock script (token-safe)
F->>Redis: GET key (poll) → data ✓
F-->>F: return_null after waitMs=3000ms if timeout
Reviews (2): Last reviewed commit: "test(cache): speed up rollout coordinato..." | Re-trigger Greptile |
The foundation PR introduced TypeScript extension imports in server/_shared/redis.ts, which broke Vercel's Edge bundling and diverged from the import pattern already used by the generated registry rollout. This change restores the repo-standard imports, keeps lock commands raw to avoid double-prefixing, propagates the true cache source for local joiners, and finishes the single-command Redis transport migration for deleteRedisKey. It also centralizes the Redis command parser used by the Redis-aware tests so the body-based transport and lock-release paths stay in sync across suites.
Inject a short-lived cache-fill registry into the handler fallback regressions so they do not inherit the 3s/4s production wait budgets. The rollout suite now exercises the same fallback paths in tens of milliseconds and widens the cross-instance coordinator window to reduce CI timing flake.
|
Addressed the rollout-specific bot feedback on the test layer. What changed:
Validation on this branch:
Unrelated branch failures remain unchanged:
|
|
Codex Review: Didn't find any major issues. Keep it up! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9e406ca177
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| waitMs: 3_000, | ||
| pollMinMs: 75, | ||
| pollMaxMs: 175, | ||
| fallback: 'return_null', |
There was a problem hiding this comment.
Increase waitMs for serviceStatuses coordinator policy
The new serviceStatuses cache-fill policy uses waitMs: 3000 with fallback: 'return_null', but the handler’s upstream checks can legitimately run much longer (UPSTREAM_TIMEOUT_MS is 10,000ms in server/worldmonitor/infrastructure/v1/_shared.ts). In cross-instance cold-miss contention, followers will time out after 3s, receive null, and listServiceStatuses then falls through to results || fallbackStatusesCache?.data || [], which returns an empty status list on cold instances even while the leader is still computing valid data. This is a user-visible regression from enabling coordination for this key; the wait window or fallback mode should be aligned with the handler’s worst-case fetch latency.
Useful? React with 👍 / 👎.
Summary
This enables the cache-fill coordinator for the first two shared hot keys in the fork and documents the phase-two rollout. The allowlist stays registry-driven, limited to
serviceStatusesandriskScoresLive, and both handlers keep their existing stale/local fallback behavior when coordination times out.Root cause
The coordinator foundation does not change runtime behavior until concrete keys opt in through generated policy. The first rollout needs an intentionally narrow allowlist, handler regressions that lock the existing fallback semantics, and documentation that ties the runtime rollout back to the registry-first workflow.
Changes
infra:service-statuses:v1andrisk:scores:sebuf:v1server/_shared/_generated/cache-fill-registry.tswith only those two keys enabledlist-service-statusesandget-risk-scorescoordinator timeout pathsdocs/architecture/cache-fill-coordinator.mdValidation
npm run registry:generatenpm run registry:checknode --test tests/redis-caching.test.mjsnpm run typechecknpm run typecheck:apinpm exec tsx -- --test tests/stock-backtest.test.mts tests/stock-analysis-history.test.mtsnpm run test:dataRisk
Low to moderate. This is the first runtime enablement for real keys in the fork, but it is intentionally limited to two shared cache entries that already have existing stale/local fallback behavior. Known unrelated failures in
npm run test:sidecarandnpm run test:e2e:runtimewere kept out of scope and not mixed into this rollout.Type of change
Affected areas
/api/*)Checklist
api/rss-proxy.jsallowlist (if adding feeds)npm run typecheck)Screenshots
Not applicable.
Refs #16
Depends on #15, #14