
chore(registry): stabilize dataset contract foundation and bootstrap parity #10

Merged
lspassos1 merged 2 commits into main from chore/registry-bootstrap-parity
Apr 11, 2026

Conversation

@lspassos1
Owner

Summary

This introduces the fork-first dataset contract foundation for bootstrap parity. The bootstrap alias map and tier map now come from generated artifacts derived from registry/datasets.ts, while api/health.js remains unchanged for the follow-up slice.

Root cause

Bootstrap cache registration was duplicated across api/bootstrap.js, server/_shared/cache-keys.ts, and tests. That duplication already drifted once when the generated registry rollout dropped the four consumerPrices* aliases from the pre-registry bootstrap set.

Changes

  • add registry/datasets.ts as the authored bootstrap contract source
  • add a deterministic generator plus registry:generate and registry:check
  • generate api/_generated/dataset-registry.js and server/_shared/_generated/bootstrap-registry.ts
  • switch api/bootstrap.js and server/_shared/cache-keys.ts to the generated bootstrap registry
  • restore the missing consumerPricesOverview, consumerPricesCategories, consumerPricesMovers, and consumerPricesSpread bootstrap aliases
  • update bootstrap-focused tests to assert generated parity instead of inline registries
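
A minimal sketch of the single-source idea behind these changes, assuming a hypothetical `BootstrapDataset` shape. The field names, the tier values, and the sample entries are illustrative assumptions, not the actual contents of registry/datasets.ts or of the generator. Sorting before emitting is one way to make the output deterministic.

```typescript
// Hypothetical contract shape; field names and tier values are assumptions.
type Tier = "fast" | "slow";

interface BootstrapDataset {
  alias: string;    // bootstrap alias exposed to clients
  redisKey: string; // Redis key the alias resolves to
  tier: Tier;
}

// Illustrative entries only; aliases, keys, and tiers are made up for this sketch.
const datasets: BootstrapDataset[] = [
  { alias: "marketSectors", redisKey: "market:sectors:v2", tier: "slow" },
  { alias: "consumerPricesOverview", redisKey: "consumer-prices:overview:v1", tier: "fast" },
];

// Emit a module's source text from the contract. Sorting by alias makes the
// output independent of authoring order, so regenerating yields identical text.
function emitRegistry(entries: BootstrapDataset[]): string {
  const sorted = entries
    .slice()
    .sort((a, b) => (a.alias < b.alias ? -1 : a.alias > b.alias ? 1 : 0));
  const keys = sorted.map((d) => `  ${JSON.stringify(d.alias)}: ${JSON.stringify(d.redisKey)},`);
  const tiers = sorted.map((d) => `  ${JSON.stringify(d.alias)}: ${JSON.stringify(d.tier)},`);
  return [
    "// Generated file. Do not edit by hand.",
    `export const BOOTSTRAP_CACHE_KEYS = {\n${keys.join("\n")}\n};`,
    `export const BOOTSTRAP_TIERS = {\n${tiers.join("\n")}\n};`,
    "",
  ].join("\n");
}

const generatedSource = emitRegistry(datasets);
```

Because both maps come from the same array, an alias cannot appear in one map and be missing from the other, which is the kind of drift described under Root cause.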

Validation

  • npm run registry:check
  • node --test tests/bootstrap.test.mjs tests/market-breadth.test.mjs
  • node --test tests/edge-functions.test.mjs
  • npm run typecheck
  • npm run typecheck:api

Risk

Low. This is a compile-time and test-time refactor for bootstrap registration only; api/health.js still uses the existing inline health registry in this PR.

Refs #5

Add the dataset contract source, deterministic generator, and bootstrap-only generated artifacts needed to replace the hand-maintained bootstrap registries without touching api/health.js yet.

Validation: npm run registry:check; node --test tests/bootstrap.test.mjs tests/market-breadth.test.mjs; node --test tests/edge-functions.test.mjs; npm run typecheck; npm run typecheck:api
@vercel

vercel bot commented Apr 11, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| worldmonitor | Ready | Preview, Comment | Apr 11, 2026 9:33pm |

@lspassos1
Owner Author

@greptileai

@greptile-apps

greptile-apps bot commented Apr 11, 2026

Greptile Summary

This PR establishes registry/datasets.ts as the single source of truth for the bootstrap dataset contract, replacing three previously-drifted inline copies (api/bootstrap.js, server/_shared/cache-keys.ts, and tests). A generator script (scripts/generate-dataset-registry.ts) deterministically produces api/_generated/dataset-registry.js and server/_shared/_generated/bootstrap-registry.ts, and a CI check (registry:check) enforces that committed artifacts stay in sync. The four consumerPrices* bootstrap aliases that were dropped in a prior refactor are restored.

  • Core architecture: registry/datasets.ts → generate-dataset-registry.ts → two generated artifacts consumed by api/bootstrap.js and server/_shared/cache-keys.ts; clean separation between the authored contract and the emitted artifacts
  • Tier logic simplified: SLOW_KEYS / FAST_KEYS Set membership replaced with a direct BOOTSTRAP_TIERS[alias] === tier map lookup in api/bootstrap.js
  • extractVersionTag regex bug: the regex uses \\d (literal backslash-d) instead of \d (digit), so the function always returns undefined; the key/version mismatch validation guard in the generator is therefore silently disabled for all 88 datasets
  • Identical generator helpers: formatStringMap and formatTierMap in generate-dataset-registry.ts have identical bodies and can be merged
  • Removed health test: The 'Health key registries' describe block (which verified no Redis-key overlap between BOOTSTRAP_KEYS and STANDALONE_KEYS in health.js) was dropped without a replacement
  • Misleading test variable: In bootstrap.test.mjs, cacheKeysSrc now points to the generated _generated/bootstrap-registry.ts but retains its old name and the test description still references cache-keys.ts
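
The tier simplification in the list above can be illustrated with a small sketch. Only the `BOOTSTRAP_TIERS[alias] === tier` comparison comes from the summary; the `aliasesForTier` helper, the tier values, and the tier assigned to each alias here are hypothetical.

```typescript
type Tier = "fast" | "slow";

// Illustrative tier map; the real one is generated from registry/datasets.ts,
// and the tier assignments below are assumptions made for this sketch.
const BOOTSTRAP_TIERS: Record<string, Tier> = {
  breadthHistory: "slow",
  consumerPricesMovers: "fast",
  consumerPricesSpread: "fast",
};

// One map lookup replaces membership checks against separate SLOW_KEYS and
// FAST_KEYS sets, so an alias cannot end up in both tiers at once.
function aliasesForTier(tier: Tier): string[] {
  return Object.keys(BOOTSTRAP_TIERS).filter((alias) => BOOTSTRAP_TIERS[alias] === tier);
}

const fastAliases = aliasesForTier("fast");
const slowAliases = aliasesForTier("slow");
```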

Confidence Score: 4/5

Safe to merge; the regex bug disables a validation guard but does not affect the generated output or any runtime path.

The architectural refactor is solid and the generated artifacts are verified correct and complete. The one concrete bug (the \\d regex in extractVersionTag) disables the version-mismatch validator without impacting the currently emitted registry or any runtime behavior; it is a latent defect that should be fixed before new datasets with version tags are added. Everything else is non-blocking style.

registry/datasets.ts (extractVersionTag regex); scripts/generate-dataset-registry.ts (duplicate formatStringMap/formatTierMap)

Important Files Changed

| Filename | Overview |
| --- | --- |
| registry/datasets.ts | New source-of-truth contract file; contains a regex bug in extractVersionTag that uses `\\d` (literal backslash-d) instead of `\d` (digit), silently disabling the version-mismatch validation guard. |
| scripts/generate-dataset-registry.ts | Generator script is well-structured with good validation logic; formatStringMap and formatTierMap are identical and could be merged. |
| scripts/check-dataset-registry.mjs | CI check script is correct; uses stdio: 'pipe' for the git diff which swallows diff output and makes failures harder to diagnose. |
| api/_generated/dataset-registry.js | Generated JS artifact for the edge function; correctly exposes BOOTSTRAP_CACHE_KEYS and BOOTSTRAP_TIERS, includes the four previously-missing consumerPrices* aliases. |
| server/_shared/_generated/bootstrap-registry.ts | Generated TS artifact for server-side consumers; mirrors dataset-registry.js content with proper TypeScript type annotations. |
| api/bootstrap.js | Cleanly swaps the inline BOOTSTRAP_CACHE_KEYS / SLOW_KEYS / FAST_KEYS definitions for a generated import; tier filtering logic simplified from Set membership to a direct map lookup. |
| server/_shared/cache-keys.ts | Replaces ~150 lines of inline BOOTSTRAP_CACHE_KEYS / BOOTSTRAP_TIERS with a one-line re-export from the generated registry; correct and clean. |
| tests/bootstrap.test.mjs | Tests updated to validate generated artifacts; variable name cacheKeysSrc and test title still reference cache-keys.ts but now point to the generated bootstrap-registry.ts; health.js duplicate-key test was removed without replacement. |
| tests/market-breadth.test.mjs | Tests adapted to check generated registry files for breadthHistory presence; assertions are correct. |
| tests/supply-chain-v2.test.mjs | Tests updated to verify chokepoints/minerals version tags in generated registry; logic is correct and equivalent to previous assertions. |
| package.json | Adds registry:generate and registry:check npm scripts; straightforward. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["registry/datasets.ts\n(BOOTSTRAP_ALIASES + BOOTSTRAP_TIERS)"]
    B["scripts/generate-dataset-registry.ts\n(validate + emit)"]
    C["api/_generated/dataset-registry.js\n(BOOTSTRAP_CACHE_KEYS, BOOTSTRAP_TIERS)"]
    D["server/_shared/_generated/bootstrap-registry.ts\n(BOOTSTRAP_CACHE_KEYS, BOOTSTRAP_TIERS)"]
    E["api/bootstrap.js\n(edge function)"]
    F["server/_shared/cache-keys.ts\n(re-export)"]
    G["scripts/check-dataset-registry.mjs\n(git diff --exit-code)"]

    A -->|"npm run registry:generate"| B
    B -->|"writeFileSync"| C
    B -->|"writeFileSync"| D
    C -->|"import"| E
    D -->|"re-export"| F
    A -->|"npm run registry:check"| G
    G -->|"re-runs generator then"| B
    G -->|"verifies no diff"| C
    G -->|"verifies no diff"| D


Prompt To Fix All With AI
This is a comment left during a code review.
Path: registry/datasets.ts
Line: 224

Comment:
**Regex uses `\\d` (literal backslash-d) instead of `\d` (digit)**

In a JavaScript/TypeScript regex literal, `\\d` means a literal backslash followed by the character `d`, not a digit. This means `extractVersionTag` will always return `undefined` for every Redis key (e.g. `market:sectors:v2`, `forecast:predictions:v2`), since none of them contain the literal string `\d`.

The practical consequence is that `redis.versionTag` is `undefined` for all datasets built by `buildDatasets()`, which silently bypasses the validation guard in `generate-dataset-registry.ts`:

```typescript
if (dataset.redis.versionTag && !hasMatchingVersionTag(dataset.redis.key, dataset.redis.versionTag)) {
  fail(`Dataset ${dataset.id} has key/version mismatch`);
}
```

Since `versionTag` is never set, this check is never reached. The generated registry output is unaffected today (because `versionTag` isn't emitted), but the guard against key/version mismatches in future dataset additions is completely inert.

```suggestion
  const match = key.match(/(?:^|[:\\-])(v\d+)(?:$|[:])/);
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: scripts/generate-dataset-registry.ts
Line: 85-103

Comment:
**`formatStringMap` and `formatTierMap` are identical**

Both functions have the exact same implementation. One of them (or both, renamed to `formatMap`) can be removed.

```suggestion
function formatMap(entries: Array<[string, string]>): string {
  if (entries.length === 0) {
    return '{}';
  }

  return `{\n${entries
    .map(([name, value]) => `  ${JSON.stringify(name)}: ${JSON.stringify(value)},`)
    .join('\n')}\n}`;
}
```

Then use `formatMap` in both `writeFileSync` calls.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: tests/bootstrap.test.mjs
Line: 22-41

Comment:
**Test name and variable name reference the old `cache-keys.ts` location**

After this PR, `cacheKeysSrc` (line 12) is loaded from `server/_shared/_generated/bootstrap-registry.ts` — not from `server/_shared/cache-keys.ts`. The test description "generated keys match server/_shared/cache-keys.ts exports" is therefore misleading; it's actually comparing two generated files (`_generated/bootstrap-registry.ts` vs `api/_generated/dataset-registry.js`) to verify they're in sync.

While the test logic is correct, the test description and the `canonical` / `cacheKeysSrc` variable names will confuse future readers. Consider renaming to:
- `cacheKeysSrc` → `bootstrapRegistryTsSrc`
- Test name → "generated TS and JS registry files are in parity"

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: scripts/check-dataset-registry.mjs
Line: 5-12

Comment:
**`git diff --exit-code` output is piped away, making check failures harder to diagnose**

With `stdio: 'pipe'`, the diff between the stale and regenerated artifacts is silently swallowed. The developer only sees the custom error message and has to run `git diff` manually to find out what changed.

Switching to `stdio: 'inherit'` prints the diff before the process exits, making it immediately obvious what drifted:

```suggestion
  execSync(
    'git diff --exit-code -- api/_generated/dataset-registry.js server/_shared/_generated/bootstrap-registry.ts',
    { stdio: 'inherit' },
  );
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "chore(registry): introduce generated boo..."

Comment thread registry/datasets.ts Outdated
}

function extractVersionTag(key: string): `v${number}` | undefined {
const match = key.match(/(?:^|[:\\-])(v\\d+)(?:$|[:])/);

P1 Regex uses \\d (literal backslash-d) instead of \d (digit)

In a JavaScript/TypeScript regex literal, \\d means a literal backslash followed by the character d, not a digit. This means extractVersionTag will always return undefined for every Redis key (e.g. market:sectors:v2, forecast:predictions:v2), since none of them contain the literal string \d.

The practical consequence is that redis.versionTag is undefined for all datasets built by buildDatasets(), which silently bypasses the validation guard in generate-dataset-registry.ts:

if (dataset.redis.versionTag && !hasMatchingVersionTag(dataset.redis.key, dataset.redis.versionTag)) {
  fail(`Dataset ${dataset.id} has key/version mismatch`);
}

Since versionTag is never set, this check is never reached. The generated registry output is unaffected today (because versionTag isn't emitted), but the guard against key/version mismatches in future dataset additions is completely inert.

Suggested change
- const match = key.match(/(?:^|[:\\-])(v\\d+)(?:$|[:])/);
+ const match = key.match(/(?:^|[:\\-])(v\d+)(?:$|[:])/);
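
The difference is easy to check in isolation. This sketch runs the committed pattern and the suggested pattern against one of the Redis keys mentioned above; the `extractWith` helper is hypothetical and exists only for the comparison.

```typescript
// Pattern as originally committed: `\\d` matches a backslash followed by "d".
const broken = /(?:^|[:\\-])(v\\d+)(?:$|[:])/;
// Pattern from the suggested change: `\d` matches a digit.
const fixed = /(?:^|[:\\-])(v\d+)(?:$|[:])/;

// Hypothetical helper: returns the first capture group, or undefined on no match.
function extractWith(pattern: RegExp, key: string): string | undefined {
  const match = key.match(pattern);
  return match ? match[1] : undefined;
}

const key = "market:sectors:v2";
const fromBroken = extractWith(broken, key); // undefined: the key has no literal "\d"
const fromFixed = extractWith(fixed, key);   // "v2"
```

A generator-side test that asserts a non-undefined tag for at least one known versioned key would have caught the inert guard.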

Comment thread scripts/generate-dataset-registry.ts Outdated
Comment on lines +85 to +103
function formatStringMap(entries: Array<[string, string]>): string {
  if (entries.length === 0) {
    return '{}';
  }

  return `{\n${entries
    .map(([name, value]) => `  ${JSON.stringify(name)}: ${JSON.stringify(value)},`)
    .join('\n')}\n}`;
}

function formatTierMap(entries: Array<[string, string]>): string {
  if (entries.length === 0) {
    return '{}';
  }

  return `{\n${entries
    .map(([name, value]) => `  ${JSON.stringify(name)}: ${JSON.stringify(value)},`)
    .join('\n')}\n}`;
}

P2 formatStringMap and formatTierMap are identical

Both functions have the exact same implementation. One of them (or both, renamed to formatMap) can be removed.

Suggested change
- function formatStringMap(entries: Array<[string, string]>): string {
-   if (entries.length === 0) {
-     return '{}';
-   }
-   return `{\n${entries
-     .map(([name, value]) => `  ${JSON.stringify(name)}: ${JSON.stringify(value)},`)
-     .join('\n')}\n}`;
- }
- function formatTierMap(entries: Array<[string, string]>): string {
-   if (entries.length === 0) {
-     return '{}';
-   }
-   return `{\n${entries
-     .map(([name, value]) => `  ${JSON.stringify(name)}: ${JSON.stringify(value)},`)
-     .join('\n')}\n}`;
- }
+ function formatMap(entries: Array<[string, string]>): string {
+   if (entries.length === 0) {
+     return '{}';
+   }
+   return `{\n${entries
+     .map(([name, value]) => `  ${JSON.stringify(name)}: ${JSON.stringify(value)},`)
+     .join('\n')}\n}`;
+ }

Then use formatMap in both writeFileSync calls.
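
To show what the merged helper emits, here is the suggested `formatMap` applied to a key map, a tier map, and an empty input. The entries are made up for illustration and are not taken from the real registry.

```typescript
function formatMap(entries: Array<[string, string]>): string {
  if (entries.length === 0) {
    return "{}";
  }
  // One `"name": "value",` line per entry, indented by two spaces.
  return `{\n${entries
    .map(([name, value]) => `  ${JSON.stringify(name)}: ${JSON.stringify(value)},`)
    .join("\n")}\n}`;
}

// The same helper serves both call sites because both maps are string-to-string.
const keyMap = formatMap([["marketSectors", "market:sectors:v2"]]);
const tierMap = formatMap([["marketSectors", "slow"]]);
const emptyMap = formatMap([]);
```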


Comment thread tests/bootstrap.test.mjs Outdated
Comment on lines 22 to 41
  it('generated keys match server/_shared/cache-keys.ts exports', () => {
    const extractKeys = (src) => {
      const block = src.match(/BOOTSTRAP_CACHE_KEYS[^=]*=\s*\{([^}]+)\}/);
      if (!block) return {};
-     const re = /(\w+):\s+'([a-z0-9_-]+(?::[a-z0-9_-]+)+:v\d+)'/g;
+     const re = /["']([^"']+)["']:\s*["']([^"']+)["']/g;
      const keys = {};
      let m;
      while ((m = re.exec(block[1])) !== null) keys[m[1]] = m[2];
      return keys;
    };
    const canonical = extractKeys(cacheKeysSrc);
-   const inlined = extractKeys(bootstrapSrc);
+   const generated = extractKeys(generatedRegistrySrc);
    assert.ok(Object.keys(canonical).length >= 10, 'Canonical registry too small');
    for (const [name, key] of Object.entries(canonical)) {
-     assert.equal(inlined[name], key, `Key '${name}' mismatch: canonical='${key}', inlined='${inlined[name]}'`);
+     assert.equal(generated[name], key, `Key '${name}' mismatch: canonical='${key}', generated='${generated[name]}'`);
    }
-   for (const [name, key] of Object.entries(inlined)) {
+   for (const [name, key] of Object.entries(generated)) {
      assert.equal(canonical[name], key, `Extra inlined key '${name}' not in canonical registry`);
    }
  });

P2 Test name and variable name reference the old cache-keys.ts location

After this PR, cacheKeysSrc (line 12) is loaded from server/_shared/_generated/bootstrap-registry.ts — not from server/_shared/cache-keys.ts. The test description "generated keys match server/_shared/cache-keys.ts exports" is therefore misleading; it's actually comparing two generated files (_generated/bootstrap-registry.ts vs api/_generated/dataset-registry.js) to verify they're in sync.

While the test logic is correct, the test description and the canonical / cacheKeysSrc variable names will confuse future readers. Consider renaming to:

  • cacheKeysSrc → bootstrapRegistryTsSrc
  • Test name → "generated TS and JS registry files are in parity"
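
A sketch of what a two-way parity check between the generated files might look like, reusing the extraction regexes from the test under review. The `findDrift` helper and both inline source strings are hypothetical stand-ins; the real test reads the generated files from disk.

```typescript
// Same extraction approach as the test: locate the BOOTSTRAP_CACHE_KEYS object
// literal, then read its quoted "alias": "key" pairs.
function extractKeys(src: string): Record<string, string> {
  const keys: Record<string, string> = {};
  const block = src.match(/BOOTSTRAP_CACHE_KEYS[^=]*=\s*\{([^}]+)\}/);
  if (!block) return keys;
  const body = block[1] ?? "";
  const re = /["']([^"']+)["']:\s*["']([^"']+)["']/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(body)) !== null) {
    keys[m[1]] = m[2];
  }
  return keys;
}

// Returns every alias whose key differs or is missing on either side.
function findDrift(a: Record<string, string>, b: Record<string, string>): string[] {
  const names = Object.keys(a).concat(Object.keys(b).filter((n) => !(n in a)));
  return names.filter((name) => a[name] !== b[name]).sort();
}

// Made-up file contents: the JS artifact is missing one alias.
const tsSource =
  'export const BOOTSTRAP_CACHE_KEYS: Record<string, string> = {\n' +
  '  "consumerPricesOverview": "consumer-prices:overview:v1",\n' +
  '  "marketSectors": "market:sectors:v2",\n};';
const jsSource =
  'export const BOOTSTRAP_CACHE_KEYS = {\n  "marketSectors": "market:sectors:v2",\n};';

const drift = findDrift(extractKeys(tsSource), extractKeys(jsSource));
```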

Comment on lines +5 to +12
try {
  execSync(
    'git diff --exit-code -- api/_generated/dataset-registry.js server/_shared/_generated/bootstrap-registry.ts',
    { stdio: 'pipe' },
  );
} catch {
  console.error('[dataset-registry] generated artifacts are out of date. Run: npm run registry:generate');
  process.exit(1);

P2 git diff --exit-code output is piped away, making check failures harder to diagnose

With stdio: 'pipe', the diff between the stale and regenerated artifacts is silently swallowed. The developer only sees the custom error message and has to run git diff manually to find out what changed.

Switching to stdio: 'inherit' prints the diff before the process exits, making it immediately obvious what drifted:

Suggested change
- try {
-   execSync(
-     'git diff --exit-code -- api/_generated/dataset-registry.js server/_shared/_generated/bootstrap-registry.ts',
-     { stdio: 'pipe' },
-   );
- } catch {
-   console.error('[dataset-registry] generated artifacts are out of date. Run: npm run registry:generate');
-   process.exit(1);
+ execSync(
+   'git diff --exit-code -- api/_generated/dataset-registry.js server/_shared/_generated/bootstrap-registry.ts',
+   { stdio: 'inherit' },
+ );
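
For comparison, a sketch of another option that keeps `stdio: 'pipe'` but prints the captured output on failure, so the custom error message and the diff can both be shown. It relies on `execSync` throwing on a non-zero exit code and attaching the child's captured `stdout` to the error. The `runCheck` helper and the stand-in command are hypothetical, and the sketch assumes `node` is on the PATH.

```typescript
import { execSync } from "node:child_process";

interface CheckResult {
  ok: boolean;
  output: string; // whatever the command wrote to stdout
}

// Hypothetical helper: runs a command with piped stdio and keeps its stdout
// whether the command succeeds or fails.
function runCheck(command: string): CheckResult {
  try {
    const stdout = execSync(command, { stdio: "pipe", encoding: "utf8" });
    return { ok: true, output: stdout };
  } catch (error) {
    // On a non-zero exit, the thrown error carries the captured stdout.
    const captured = (error as { stdout?: string | Buffer }).stdout;
    return { ok: false, output: captured ? captured.toString() : "" };
  }
}

// Stand-in for `git diff --exit-code ...`: writes some output, then exits with 1.
const result = runCheck(`node -e "process.stdout.write('drift'); process.exit(1)"`);
if (!result.ok) {
  console.error(result.output);
}
```

The `stdio: 'inherit'` suggestion is shorter; capturing is mainly useful when the output needs to be combined with a custom message or trimmed before printing.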

@lspassos1
Owner Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Hooray!


Correct the dataset version-tag regex so generator validation is active again.
Also collapse duplicate generator formatting helpers, surface registry diffs in registry:check, and clarify the bootstrap parity test wording.
@greptile-apps

greptile-apps bot commented Apr 11, 2026

PR author is not in the allowed authors list.

@lspassos1 lspassos1 marked this pull request as ready for review April 11, 2026 21:31
@lspassos1 lspassos1 merged commit 9505af6 into main Apr 11, 2026
7 of 9 checks passed