chore(registry): stabilize dataset contract foundation and bootstrap parity#10
chore(registry): stabilize dataset contract foundation and bootstrap parity#10
Conversation
Add the dataset contract source, deterministic generator, and bootstrap-only generated artifacts needed to replace the hand-maintained bootstrap registries without touching api/health.js yet. Validation: npm run registry:check; node --test tests/bootstrap.test.mjs tests/market-breadth.test.mjs; node --test tests/edge-functions.test.mjs; npm run typecheck; npm run typecheck:api
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Greptile SummaryThis PR establishes
Confidence Score: 4/5Safe to merge; the regex bug disables a validation guard but does not affect the generated output or any runtime path. The architectural refactor is solid and the generated artifacts are verified correct and complete. The one concrete bug (\d regex in extractVersionTag) disables the version-mismatch validator without impacting the currently emitted registry or any runtime behavior — it's a latent defect that should be fixed before new datasets with version tags are added. Everything else is non-blocking style. registry/datasets.ts (extractVersionTag regex); scripts/generate-dataset-registry.ts (duplicate formatStringMap/formatTierMap) Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["registry/datasets.ts\n(BOOTSTRAP_ALIASES + BOOTSTRAP_TIERS)"]
B["scripts/generate-dataset-registry.ts\n(validate + emit)"]
C["api/_generated/dataset-registry.js\n(BOOTSTRAP_CACHE_KEYS, BOOTSTRAP_TIERS)"]
D["server/_shared/_generated/bootstrap-registry.ts\n(BOOTSTRAP_CACHE_KEYS, BOOTSTRAP_TIERS)"]
E["api/bootstrap.js\n(edge function)"]
F["server/_shared/cache-keys.ts\n(re-export)"]
G["scripts/check-dataset-registry.mjs\n(git diff --exit-code)"]
A -->|"npm run registry:generate"| B
B -->|"writeFileSync"| C
B -->|"writeFileSync"| D
C -->|"import"| E
D -->|"re-export"| F
A -->|"npm run registry:check"| G
G -->|"re-runs generator then"| B
G -->|"verifies no diff"| C
G -->|"verifies no diff"| D
Prompt To Fix All With AIThis is a comment left during a code review.
Path: registry/datasets.ts
Line: 224
Comment:
**Regex uses `\\d` (literal backslash-d) instead of `\d` (digit)**
In a JavaScript/TypeScript regex literal, `\\d` means a literal backslash followed by the character `d`, not a digit. This means `extractVersionTag` will always return `undefined` for every Redis key (e.g. `market:sectors:v2`, `forecast:predictions:v2`), since none of them contain the literal string `\d`.
The practical consequence is that `redis.versionTag` is `undefined` for all datasets built by `buildDatasets()`, which silently bypasses the validation guard in `generate-dataset-registry.ts`:
```typescript
if (dataset.redis.versionTag && !hasMatchingVersionTag(dataset.redis.key, dataset.redis.versionTag)) {
fail(`Dataset ${dataset.id} has key/version mismatch`);
}
```
Since `versionTag` is never set, this check is never reached. The generated registry output is unaffected today (because `versionTag` isn't emitted), but the guard against key/version mismatches in future dataset additions is completely inert.
```suggestion
const match = key.match(/(?:^|[:\\-])(v\d+)(?:$|[:])/);
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: scripts/generate-dataset-registry.ts
Line: 85-103
Comment:
**`formatStringMap` and `formatTierMap` are identical**
Both functions have the exact same implementation. One of them (or both, renamed to `formatMap`) can be removed.
```suggestion
function formatMap(entries: Array<[string, string]>): string {
if (entries.length === 0) {
return '{}';
}
return `{\n${entries
.map(([name, value]) => ` ${JSON.stringify(name)}: ${JSON.stringify(value)},`)
.join('\n')}\n}`;
}
```
Then use `formatMap` in both `writeFileSync` calls.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: tests/bootstrap.test.mjs
Line: 22-41
Comment:
**Test name and variable name reference the old `cache-keys.ts` location**
After this PR, `cacheKeysSrc` (line 12) is loaded from `server/_shared/_generated/bootstrap-registry.ts` — not from `server/_shared/cache-keys.ts`. The test description "generated keys match server/_shared/cache-keys.ts exports" is therefore misleading; it's actually comparing two generated files (`_generated/bootstrap-registry.ts` vs `api/_generated/dataset-registry.js`) to verify they're in sync.
While the test logic is correct, the test description and the `canonical` / `cacheKeysSrc` variable names will confuse future readers. Consider renaming to:
- `cacheKeysSrc` → `bootstrapRegistryTsSrc`
- Test name → "generated TS and JS registry files are in parity"
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: scripts/check-dataset-registry.mjs
Line: 5-12
Comment:
**`git diff --exit-code` output is piped away, making check failures harder to diagnose**
With `stdio: 'pipe'`, the diff between the stale and regenerated artifacts is silently swallowed. The developer only sees the custom error message and has to run `git diff` manually to find out what changed.
Switching to `stdio: 'inherit'` prints the diff before the process exits, making it immediately obvious what drifted:
```suggestion
execSync(
'git diff --exit-code -- api/_generated/dataset-registry.js server/_shared/_generated/bootstrap-registry.ts',
{ stdio: 'inherit' },
);
```
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "chore(registry): introduce generated boo..." | Re-trigger Greptile |
| } | ||
|
|
||
| function extractVersionTag(key: string): `v${number}` | undefined { | ||
| const match = key.match(/(?:^|[:\\-])(v\\d+)(?:$|[:])/); |
There was a problem hiding this comment.
Regex uses
\\d (literal backslash-d) instead of \d (digit)
In a JavaScript/TypeScript regex literal, \\d means a literal backslash followed by the character d, not a digit. This means extractVersionTag will always return undefined for every Redis key (e.g. market:sectors:v2, forecast:predictions:v2), since none of them contain the literal string \d.
The practical consequence is that redis.versionTag is undefined for all datasets built by buildDatasets(), which silently bypasses the validation guard in generate-dataset-registry.ts:
if (dataset.redis.versionTag && !hasMatchingVersionTag(dataset.redis.key, dataset.redis.versionTag)) {
fail(`Dataset ${dataset.id} has key/version mismatch`);
}Since versionTag is never set, this check is never reached. The generated registry output is unaffected today (because versionTag isn't emitted), but the guard against key/version mismatches in future dataset additions is completely inert.
| const match = key.match(/(?:^|[:\\-])(v\\d+)(?:$|[:])/); | |
| const match = key.match(/(?:^|[:\\-])(v\d+)(?:$|[:])/); |
Prompt To Fix With AI
This is a comment left during a code review.
Path: registry/datasets.ts
Line: 224
Comment:
**Regex uses `\\d` (literal backslash-d) instead of `\d` (digit)**
In a JavaScript/TypeScript regex literal, `\\d` means a literal backslash followed by the character `d`, not a digit. This means `extractVersionTag` will always return `undefined` for every Redis key (e.g. `market:sectors:v2`, `forecast:predictions:v2`), since none of them contain the literal string `\d`.
The practical consequence is that `redis.versionTag` is `undefined` for all datasets built by `buildDatasets()`, which silently bypasses the validation guard in `generate-dataset-registry.ts`:
```typescript
if (dataset.redis.versionTag && !hasMatchingVersionTag(dataset.redis.key, dataset.redis.versionTag)) {
fail(`Dataset ${dataset.id} has key/version mismatch`);
}
```
Since `versionTag` is never set, this check is never reached. The generated registry output is unaffected today (because `versionTag` isn't emitted), but the guard against key/version mismatches in future dataset additions is completely inert.
```suggestion
const match = key.match(/(?:^|[:\\-])(v\d+)(?:$|[:])/);
```
How can I resolve this? If you propose a fix, please make it concise.| function formatStringMap(entries: Array<[string, string]>): string { | ||
| if (entries.length === 0) { | ||
| return '{}'; | ||
| } | ||
|
|
||
| return `{\n${entries | ||
| .map(([name, value]) => ` ${JSON.stringify(name)}: ${JSON.stringify(value)},`) | ||
| .join('\n')}\n}`; | ||
| } | ||
|
|
||
| function formatTierMap(entries: Array<[string, string]>): string { | ||
| if (entries.length === 0) { | ||
| return '{}'; | ||
| } | ||
|
|
||
| return `{\n${entries | ||
| .map(([name, value]) => ` ${JSON.stringify(name)}: ${JSON.stringify(value)},`) | ||
| .join('\n')}\n}`; | ||
| } |
There was a problem hiding this comment.
formatStringMap and formatTierMap are identical
Both functions have the exact same implementation. One of them (or both, renamed to formatMap) can be removed.
| function formatStringMap(entries: Array<[string, string]>): string { | |
| if (entries.length === 0) { | |
| return '{}'; | |
| } | |
| return `{\n${entries | |
| .map(([name, value]) => ` ${JSON.stringify(name)}: ${JSON.stringify(value)},`) | |
| .join('\n')}\n}`; | |
| } | |
| function formatTierMap(entries: Array<[string, string]>): string { | |
| if (entries.length === 0) { | |
| return '{}'; | |
| } | |
| return `{\n${entries | |
| .map(([name, value]) => ` ${JSON.stringify(name)}: ${JSON.stringify(value)},`) | |
| .join('\n')}\n}`; | |
| } | |
| function formatMap(entries: Array<[string, string]>): string { | |
| if (entries.length === 0) { | |
| return '{}'; | |
| } | |
| return `{\n${entries | |
| .map(([name, value]) => ` ${JSON.stringify(name)}: ${JSON.stringify(value)},`) | |
| .join('\n')}\n}`; | |
| } |
Then use formatMap in both writeFileSync calls.
Prompt To Fix With AI
This is a comment left during a code review.
Path: scripts/generate-dataset-registry.ts
Line: 85-103
Comment:
**`formatStringMap` and `formatTierMap` are identical**
Both functions have the exact same implementation. One of them (or both, renamed to `formatMap`) can be removed.
```suggestion
function formatMap(entries: Array<[string, string]>): string {
if (entries.length === 0) {
return '{}';
}
return `{\n${entries
.map(([name, value]) => ` ${JSON.stringify(name)}: ${JSON.stringify(value)},`)
.join('\n')}\n}`;
}
```
Then use `formatMap` in both `writeFileSync` calls.
How can I resolve this? If you propose a fix, please make it concise.| it('generated keys match server/_shared/cache-keys.ts exports', () => { | ||
| const extractKeys = (src) => { | ||
| const block = src.match(/BOOTSTRAP_CACHE_KEYS[^=]*=\s*\{([^}]+)\}/); | ||
| if (!block) return {}; | ||
| const re = /(\w+):\s+'([a-z0-9_-]+(?::[a-z0-9_-]+)+:v\d+)'/g; | ||
| const re = /["']([^"']+)["']:\s*["']([^"']+)["']/g; | ||
| const keys = {}; | ||
| let m; | ||
| while ((m = re.exec(block[1])) !== null) keys[m[1]] = m[2]; | ||
| return keys; | ||
| }; | ||
| const canonical = extractKeys(cacheKeysSrc); | ||
| const inlined = extractKeys(bootstrapSrc); | ||
| const generated = extractKeys(generatedRegistrySrc); | ||
| assert.ok(Object.keys(canonical).length >= 10, 'Canonical registry too small'); | ||
| for (const [name, key] of Object.entries(canonical)) { | ||
| assert.equal(inlined[name], key, `Key '${name}' mismatch: canonical='${key}', inlined='${inlined[name]}'`); | ||
| assert.equal(generated[name], key, `Key '${name}' mismatch: canonical='${key}', generated='${generated[name]}'`); | ||
| } | ||
| for (const [name, key] of Object.entries(inlined)) { | ||
| for (const [name, key] of Object.entries(generated)) { | ||
| assert.equal(canonical[name], key, `Extra inlined key '${name}' not in canonical registry`); | ||
| } | ||
| }); |
There was a problem hiding this comment.
Test name and variable name reference the old
cache-keys.ts location
After this PR, cacheKeysSrc (line 12) is loaded from server/_shared/_generated/bootstrap-registry.ts — not from server/_shared/cache-keys.ts. The test description "generated keys match server/_shared/cache-keys.ts exports" is therefore misleading; it's actually comparing two generated files (_generated/bootstrap-registry.ts vs api/_generated/dataset-registry.js) to verify they're in sync.
While the test logic is correct, the test description and the canonical / cacheKeysSrc variable names will confuse future readers. Consider renaming to:
cacheKeysSrc→bootstrapRegistryTsSrc- Test name → "generated TS and JS registry files are in parity"
Prompt To Fix With AI
This is a comment left during a code review.
Path: tests/bootstrap.test.mjs
Line: 22-41
Comment:
**Test name and variable name reference the old `cache-keys.ts` location**
After this PR, `cacheKeysSrc` (line 12) is loaded from `server/_shared/_generated/bootstrap-registry.ts` — not from `server/_shared/cache-keys.ts`. The test description "generated keys match server/_shared/cache-keys.ts exports" is therefore misleading; it's actually comparing two generated files (`_generated/bootstrap-registry.ts` vs `api/_generated/dataset-registry.js`) to verify they're in sync.
While the test logic is correct, the test description and the `canonical` / `cacheKeysSrc` variable names will confuse future readers. Consider renaming to:
- `cacheKeysSrc` → `bootstrapRegistryTsSrc`
- Test name → "generated TS and JS registry files are in parity"
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| try { | ||
| execSync( | ||
| 'git diff --exit-code -- api/_generated/dataset-registry.js server/_shared/_generated/bootstrap-registry.ts', | ||
| { stdio: 'pipe' }, | ||
| ); | ||
| } catch { | ||
| console.error('[dataset-registry] generated artifacts are out of date. Run: npm run registry:generate'); | ||
| process.exit(1); |
There was a problem hiding this comment.
git diff --exit-code output is piped away, making check failures harder to diagnose
With stdio: 'pipe', the diff between the stale and regenerated artifacts is silently swallowed. The developer only sees the custom error message and has to run git diff manually to find out what changed.
Switching to stdio: 'inherit' prints the diff before the process exits, making it immediately obvious what drifted:
| try { | |
| execSync( | |
| 'git diff --exit-code -- api/_generated/dataset-registry.js server/_shared/_generated/bootstrap-registry.ts', | |
| { stdio: 'pipe' }, | |
| ); | |
| } catch { | |
| console.error('[dataset-registry] generated artifacts are out of date. Run: npm run registry:generate'); | |
| process.exit(1); | |
| execSync( | |
| 'git diff --exit-code -- api/_generated/dataset-registry.js server/_shared/_generated/bootstrap-registry.ts', | |
| { stdio: 'inherit' }, | |
| ); |
Prompt To Fix With AI
This is a comment left during a code review.
Path: scripts/check-dataset-registry.mjs
Line: 5-12
Comment:
**`git diff --exit-code` output is piped away, making check failures harder to diagnose**
With `stdio: 'pipe'`, the diff between the stale and regenerated artifacts is silently swallowed. The developer only sees the custom error message and has to run `git diff` manually to find out what changed.
Switching to `stdio: 'inherit'` prints the diff before the process exits, making it immediately obvious what drifted:
```suggestion
execSync(
'git diff --exit-code -- api/_generated/dataset-registry.js server/_shared/_generated/bootstrap-registry.ts',
{ stdio: 'inherit' },
);
```
How can I resolve this? If you propose a fix, please make it concise.|
@codex review |
|
Codex Review: Didn't find any major issues. Hooray! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Correct the dataset version-tag regex so generator validation is active again. Also collapse duplicate generator formatting helpers, surface registry diffs in registry:check, and clarify the bootstrap parity test wording.
|
PR author is not in the allowed authors list. |
Summary
This introduces the fork-first dataset contract foundation for bootstrap parity. The bootstrap alias map and tier map now come from generated artifacts derived from
registry/datasets.ts, whileapi/health.jsremains unchanged for the follow-up slice.Root cause
Bootstrap cache registration was duplicated across
api/bootstrap.js,server/_shared/cache-keys.ts, and tests. That duplication already drifted once when the generated registry rollout dropped the fourconsumerPrices*aliases from the pre-registry bootstrap set.Changes
registry/datasets.tsas the authored bootstrap contract sourceregistry:generateandregistry:checkapi/_generated/dataset-registry.jsandserver/_shared/_generated/bootstrap-registry.tsapi/bootstrap.jsandserver/_shared/cache-keys.tsto the generated bootstrap registryconsumerPricesOverview,consumerPricesCategories,consumerPricesMovers, andconsumerPricesSpreadbootstrap aliasesValidation
npm run registry:checknode --test tests/bootstrap.test.mjs tests/market-breadth.test.mjsnode --test tests/edge-functions.test.mjsnpm run typechecknpm run typecheck:apiRisk
Low. This is a compile-time and test-time refactor for bootstrap registration only;
api/health.jsstill uses the existing inline health registry in this PR.Refs #5