perf(edge): collapse CDN cache fragmentation on public GET routes#2566
perf(edge): collapse CDN cache fragmentation on public GET routes#2566JohnPhamous wants to merge 1 commit intokoala73:mainfrom
Conversation
- ACAO: * on CDN-cached non-premium GET 200s to eliminate per-origin cache variants - Premium routes (analyze-stock, backtest-stock, etc.) keep per-origin CORS - Cache middleware bot 403 responses (CF/browser cache benefit) - Add getPublicCorsHeaders() to server/cors.ts
|
@JohnPhamous is attempting to deploy a commit to the Elie Team on Vercel. A member of the Team first needs to authorize it. |
Greptile SummaryThis PR collapses CDN cache fragmentation on public GET routes by introducing
Confidence Score: 4/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant C as Browser (any origin)
participant CF as Cloudflare
participant VCDN as Vercel CDN
participant MW as Edge Middleware
participant GW as Edge Function (gateway)
Note over C,GW: First request — allowed origin, non-premium route (cache cold)
C->>CF: GET /api/market/v1/list-market-quotes (Origin: worldmonitor.app)
CF->>VCDN: forward (no CF cache)
VCDN->>MW: cache miss → run middleware
MW->>GW: UA passes bot check → forward
GW->>GW: isDisallowedOrigin() → false\nisAllowedOrigin() → true\ncdnCache set
GW-->>VCDN: 200 + CDN-Cache-Control + ACAO:* + no Vary
VCDN->>VCDN: store response (keyed by URL only)
VCDN-->>C: 200 ACAO:*
Note over C,GW: Subsequent request — disallowed origin (cache warm)
C->>CF: GET /api/market/v1/list-market-quotes (Origin: evil.example.com)
CF->>VCDN: forward
VCDN->>VCDN: cache HIT (no Vary, URL matches)
VCDN-->>C: 200 ACAO:* (isDisallowedOrigin never runs)
Note over C: Browser allows cross-origin read\nbecause ACAO: *
Note over C,GW: Bot UA request — middleware blocks
C->>CF: GET /api/... (Bot UA)
CF->>VCDN: forward
VCDN->>MW: run middleware
MW-->>C: 403 Cache-Control: public, max-age=86400
Note over CF: ⚠️ If CF has a custom cache rule\nfor 4xx: caches 403 by URL\nwithout Vary: User-Agent
Reviews (1): Last reviewed commit: "perf(edge): collapse CDN cache fragmenta..." | Re-trigger Greptile |
| if (BOT_UA.test(ua)) { | ||
| return new Response('{"error":"Forbidden"}', { | ||
| status: 403, | ||
| headers: { 'Content-Type': 'application/json' }, | ||
| headers: { 'Content-Type': 'application/json', 'Cache-Control': 'public, max-age=86400' }, | ||
| }); |
There was a problem hiding this comment.
public on bot-block 403 can be served as a CDN cache hit to legitimate users
The middleware returns Cache-Control: public, max-age=86400 on both bot-UA 403 responses (line 124) and short-UA 403 responses (line 132). Per Vercel's execution model, CDN cache checks happen before middleware runs — so once any CDN layer in the request path caches a 403 response keyed only on the URL (no Vary: User-Agent), legitimate users requesting that same URL will receive the cached 403 without middleware ever executing.
Vercel CDN itself only stores responses with s-maxage or CDN-Cache-Control, so Vercel's own cache is safe here. However, the code comments confirm Cloudflare sits in front of api.worldmonitor.app — Cloudflare respects Cache-Control: public for its cache rules, and while it does not cache 4xx by default, a custom Page Rule or Cache Rule (present in many production CF setups) could enable it. Any such CF config would cause all requests to a URL hit by a bot to return 403 to real users for 24 hours.
public is the wrong directive for a security-gate error response; the intent (telling bots to back off) is better served by no-store or omitting Cache-Control entirely:
| if (BOT_UA.test(ua)) { | |
| return new Response('{"error":"Forbidden"}', { | |
| status: 403, | |
| headers: { 'Content-Type': 'application/json' }, | |
| headers: { 'Content-Type': 'application/json', 'Cache-Control': 'public, max-age=86400' }, | |
| }); | |
| headers: { 'Content-Type': 'application/json', 'Cache-Control': 'no-store' }, |
The same fix applies at line 132–134.
| if (!PREMIUM_RPC_PATHS.has(pathname)) { | ||
| const pub = getPublicCorsHeaders(); | ||
| for (const [k, v] of Object.entries(pub)) mergedHeaders.set(k, v); | ||
| mergedHeaders.delete('Vary'); | ||
| } |
There was a problem hiding this comment.
ACAO: * on CDN-cached responses bypasses isDisallowedOrigin() for browser clients
The code removed this warning from the previous version:
ACAO: * with no Vary would collapse all origins into one cache entry, bypassing
isDisallowedOrigin()for cache hits — Vercel CDN serves s-maxage responses without re-invoking the function, so a disallowed origin could read a cached ACAO: * response.
That observation is still true after this PR. The flow is now:
- Request from allowed origin hits a non-premium route → edge function runs →
isDisallowedOrigin()passes → response is cached by Vercel CDN withACAO: *,Varydeleted. - Request from a disallowed origin (e.g.
https://evil.example.com) for the same URL → Vercel CDN cache hit → edge function never runs →isDisallowedOrigin()never executes → browser receivesACAO: *response and the cross-origin read succeeds.
This is intentional for "public" data routes, and the premium guard is in place, so this isn't a catastrophic auth bypass. But it is a meaningful change in the access-control surface: previously, disallowed origins could not read any route via the browser (they got a 403 from the function). Now they can read all non-premium CDN-cached routes.
If the intent is that non-premium routes should be globally readable, this is fine — but it's worth explicitly documenting that isDisallowedOrigin() is no longer enforced at the CDN cache layer for public routes, so future maintainers don't add sensitive-but-non-premium data under the assumption that the origin guard protects it.
| const __dirname = dirname(fileURLToPath(import.meta.url)); | ||
| const corsSrc = readFileSync(resolve(__dirname, '..', 'server', 'cors.ts'), 'utf-8'); | ||
|
|
||
| describe('getPublicCorsHeaders', () => { | ||
| it('returns ACAO: * without Vary header', () => { | ||
| // Verify the function body contains ACAO: * and does NOT include Vary | ||
| const fnMatch = corsSrc.match(/export function getPublicCorsHeaders[\s\S]*?^}/m); | ||
| assert.ok(fnMatch, 'getPublicCorsHeaders function not found in server/cors.ts'); | ||
| const fnBody = fnMatch![0]; | ||
| assert.match(fnBody, /'Access-Control-Allow-Origin':\s*'\*'/, 'Should set ACAO: *'); | ||
| assert.doesNotMatch(fnBody, /Vary/, 'Should NOT include Vary header'); | ||
| }); | ||
|
|
||
| it('includes same Allow-Methods as getCorsHeaders', () => { | ||
| const pubMethods = corsSrc.match(/getPublicCorsHeaders[\s\S]*?Allow-Methods':\s*'([^']+)'/); | ||
| const perOriginMethods = corsSrc.match(/getCorsHeaders[\s\S]*?Allow-Methods':\s*'([^']+)'/); | ||
| assert.ok(pubMethods && perOriginMethods, 'Could not extract Allow-Methods from both functions'); | ||
| assert.equal(pubMethods![1], perOriginMethods![1], 'Allow-Methods should match between public and per-origin'); | ||
| }); | ||
|
|
||
| it('includes same Allow-Headers as getCorsHeaders', () => { | ||
| const pubHeaders = corsSrc.match(/getPublicCorsHeaders[\s\S]*?Allow-Headers':\s*'([^']+)'/); | ||
| const perOriginHeaders = corsSrc.match(/getCorsHeaders[\s\S]*?Allow-Headers':\s*'([^']+)'/); |
There was a problem hiding this comment.
Tests inspect source text rather than exercising runtime behavior
Both tests/cors-public-headers.test.mts and tests/gateway-cdn-cors.test.mts use readFileSync + regex over the TypeScript source files instead of importing and calling the actual functions. This approach:
- Will false-pass if the source text matches but the compiled output behaves differently (e.g., an environment variable override, dead branch, or build-time transformation changes the effective value).
- Will false-fail if anyone switches quote style (
'→"), reformats, or renames a local variable, even though the runtime behavior is unchanged. - Doesn't catch mismatches between
cors.tsandgateway.tsat integration level — only surface-level text patterns.
The api/_cors.test.mjs (also changed in this PR) shows the correct pattern: import the function, call it, and assert on the returned value. Bringing the new tests in line with that approach would give much stronger correctness guarantees:
// tests/cors-public-headers.test.mts — preferred approach
import { getPublicCorsHeaders, getCorsHeaders } from '../server/cors.ts';
import { describe, it } from 'node:test';
import assert from 'node:assert/strict';
describe('getPublicCorsHeaders', () => {
it('returns ACAO: * without Vary header', () => {
const h = getPublicCorsHeaders();
assert.equal(h['Access-Control-Allow-Origin'], '*');
assert.equal(h['Vary'], undefined);
});
// ...
});The same applies to tests/gateway-cdn-cors.test.mts — the assertions there are checking for specific source text patterns (e.g. !PREMIUM_RPC_PATHS\.has\(pathname\)) that would be better covered by an integration test that exercises createDomainGateway end-to-end.
|
Thanks @JohnPhamous much appreciated |
Cache-Control: public, max-age=86400to bot/short-UA 403s. Vercel CDN does not cache 403, but Cloudflare and browsers do.server/cors.ts(mirrors existingapi/_cors.jsprecedent).Expected impact
Cost
Performance