
feat(regulatory): add regulatory RSS fetch seeder #2564

Open
lspassos1 wants to merge 2 commits into koala73:main from lspassos1:feat/regulatory-rss-fetch-parse

@lspassos1
Collaborator

Summary

This PR adds the first step of the regulatory RSS pipeline, scripts/seed-regulatory-actions.mjs. It fetches the live SEC, CFTC, Federal Reserve, FDIC, and FINRA feeds concurrently, parses RSS/Atom natively, normalizes the entries, and emits JSON. This phase covers fetching and parsing only.

Root cause

The repository did not have a regulatory-actions seeder yet, and several URLs from the initial issue description are no longer the live official feed endpoints. Without a dedicated fetch/parse layer, the rest of the regulatory pipeline cannot be built on stable input.

Changes

  • add scripts/seed-regulatory-actions.mjs as an import-safe standalone seeder with concurrent fetch, partial-failure tolerance, native RSS/Atom parsing, deterministic IDs, deduplication, and sorted normalized output
  • use current working official feed endpoints for SEC, CFTC, Federal Reserve, FDIC, and FINRA
  • use a declared WorldMonitor User-Agent for SEC only, because the current SEC endpoint rejects requests that spoof a generic browser User-Agent
  • add tests/regulatory-seed-unit.test.mjs covering RSS/Atom parsing, href extraction, HTML cleanup, deduplication, ordering, partial failure, and all-feeds-fail behavior
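A minimal sketch of the concurrent, partial-failure-tolerant fetch listed above, assuming each feed is a `{ agency, url }` object and that `fetchImpl` follows the standard Fetch API shape. The function names `fetchFeed` and `fetchAllFeeds` appear in the review, but the signatures here, the `parse` callback, and both User-Agent strings are placeholders rather than the script's real values.

```javascript
// Placeholder UA strings: only SEC gets a declared, non-browser User-Agent.
const DECLARED_UA = 'WorldMonitor/1.0 (placeholder contact)';
const BROWSER_UA = 'Mozilla/5.0 (placeholder browser UA)';

// Fetch one feed and hand its body to the parser; non-2xx responses reject.
async function fetchFeed(feed, fetchImpl, parse) {
  const res = await fetchImpl(feed.url, {
    headers: { 'User-Agent': feed.agency === 'SEC' ? DECLARED_UA : BROWSER_UA },
  });
  if (!res.ok) throw new Error(`${feed.agency} feed returned HTTP ${res.status}`);
  return parse(await res.text(), feed.agency);
}

// Fetch every feed concurrently; a failing feed is logged and skipped.
async function fetchAllFeeds(feeds, fetchImpl, parse) {
  const results = await Promise.allSettled(feeds.map((f) => fetchFeed(f, fetchImpl, parse)));
  const actions = [];
  let successCount = 0;
  results.forEach((r, i) => {
    if (r.status === 'fulfilled') {
      successCount += 1;
      actions.push(...r.value);
    } else {
      console.warn(`[regulatory] ${feeds[i].agency} failed: ${r.reason.message}`);
    }
  });
  // Only a total outage is fatal.
  if (successCount === 0) throw new Error('All regulatory feeds failed');
  return actions;
}
```

`Promise.allSettled` never rejects, so one agency's outage only drops that agency's items. The error surfaces only when `successCount` is zero, which corresponds to the all-feeds-fail case the tests cover.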

Validation

  • node --test tests/regulatory-seed-unit.test.mjs
  • node scripts/seed-regulatory-actions.mjs | head -n 40
  • node -e "import('./scripts/seed-regulatory-actions.mjs').then(() => process.stdout.write('import-ok\\n'))"

Risk

Low risk. This PR only adds a new standalone script and a focused unit test; it does not write to Redis or change runtime application behavior yet.

Closes #2492
Refs #2493
Refs #2494
Refs #2495

Add a standalone seeder that fetches and normalizes SEC, CFTC, Federal Reserve, FDIC, and FINRA regulatory feeds without introducing new dependencies.

The script stays import-safe, tolerates partial feed failure, and emits JSON for the fetch/parse-only phase of the pipeline. Unit tests cover RSS/Atom parsing, deduplication, ordering, and degraded-feed behavior.

Refs koala73#2492
Refs koala73#2493
Refs koala73#2494
Refs koala73#2495
@vercel

vercel bot commented Mar 30, 2026

@lspassos1 is attempting to deploy a commit to the Elie Team on Vercel.

A member of the Team first needs to authorize it.

@greptile-apps

greptile-apps bot commented Mar 30, 2026

Greptile Summary

This PR adds the first step of the regulatory-actions pipeline: scripts/seed-regulatory-actions.mjs fetches live RSS/Atom feeds from SEC, CFTC, Federal Reserve, FDIC, and FINRA concurrently, normalises entries into a common schema with deterministic IDs, deduplicates by canonical URL, and emits sorted JSON to stdout. A companion unit-test file covers parsing, deduplication, partial-failure tolerance, and the all-feeds-fail error path. The script is self-contained, import-safe, and explicitly does not yet write to Redis — making it a low-risk staging step before the full pipeline is wired in follow-up PRs.
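The deduplicate-by-canonical-URL and newest-first steps summarized above can be sketched as follows. The canonicalization rules (drop the fragment, rely on `URL` to lowercase the host, strip a trailing slash) are assumptions for illustration; the PR does not spell out what "canonical" means in the script.

```javascript
// Assumed canonical form: no fragment, lowercase host (URL does this), no trailing slash.
function canonicalUrl(raw) {
  try {
    const u = new URL(raw);
    u.hash = '';
    if (u.pathname.length > 1 && u.pathname.endsWith('/')) u.pathname = u.pathname.slice(0, -1);
    return u.toString();
  } catch {
    return String(raw).trim(); // leave unparseable links as-is
  }
}

// Unparseable or missing dates sort as epoch 0, i.e. last.
const timestamp = (action) => Date.parse(action.publishedAt) || 0;

// Keep the first action seen for each canonical URL, then sort newest first.
function dedupeAndSortActions(actions) {
  const byUrl = new Map();
  for (const action of actions) {
    const key = canonicalUrl(action.url);
    if (!byUrl.has(key)) byUrl.set(key, action);
  }
  return [...byUrl.values()].sort((a, b) => timestamp(b) - timestamp(a));
}
```

Keying a `Map` on the canonical URL keeps the first occurrence, so input order decides which duplicate survives.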

Key findings:

  • FINRA feed uses http:// (http://feeds.finra.org/FINRANotices) while all other agency URLs use HTTPS. In a regulatory-data pipeline an unencrypted fetch opens the door to MITM injection of false compliance entries; this should be https:// before the script is deployed.
  • Passing globalThis.fetch directly as the default parameter of fetchFeed, fetchAllFeeds, and main violates the explicit AGENTS.md convention: project rules require the wrapper (...args) => globalThis.fetch(...args) to avoid the bound-reference anti-pattern.
  • Per AGENTS.md, once this script is wired to write to Redis the following must be added: a seed-meta:<key> freshness metadata write for health monitoring, and bootstrap hydration in api/bootstrap.js. The PR description correctly defers both to the next phase.
  • The vm.runInContext test strategy is functional and covers the key pure-function paths well; process is intentionally absent from the VM context, so main() cannot be exercised through the existing test harness without extending the context.
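A minimal sketch of the fetch convention from the second finding, assuming a hypothetical helper `fetchFeedText` (not a function in the PR). The wrapper always calls `fetch` as `globalThis.fetch(...)`, so the receiver is never some other object. In some runtimes, invoking `fetch` with a `this` other than the global object throws. The wrapper also picks up a `globalThis.fetch` that a test has replaced.

```javascript
// Convention cited in the review: wrap fetch rather than passing the bare reference.
const defaultFetch = (...args) => globalThis.fetch(...args);

// Hypothetical helper: fetchImpl is injectable for tests; the default defers to globalThis.fetch.
async function fetchFeedText(url, fetchImpl = defaultFetch) {
  const res = await fetchImpl(url, {
    headers: { Accept: 'application/rss+xml, application/atom+xml' },
  });
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return res.text();
}
```

Tests can either pass a stub as `fetchImpl` or patch `globalThis.fetch`, and both paths go through the same code.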

Confidence Score: 4/5

  • Safe to merge after fixing the FINRA HTTP URL; the globalThis.fetch convention violation should also be addressed but is non-blocking in isolation.
  • One P1 finding (FINRA plain-HTTP fetch) should be resolved before the script is deployed — it is a present security defect on the changed code path. The P2 convention violation is real but non-blocking. All other aspects of the implementation (concurrent fetch, partial-failure tolerance, deduplication, deterministic IDs, test coverage) are solid.
  • scripts/seed-regulatory-actions.mjs — FINRA HTTP URL (line 15) and globalThis.fetch default params (lines 163, 169, 205)

Important Files Changed

  • scripts/seed-regulatory-actions.mjs: New standalone seeder with concurrent fetch and native RSS/Atom regex parsing for SEC, CFTC, Fed, FDIC, and FINRA. Two issues: the FINRA URL uses plain HTTP (P1 security), and the globalThis.fetch default params violate the project fetch convention from AGENTS.md (P2).
  • tests/regulatory-seed-unit.test.mjs: Focused unit tests using vm.runInContext to isolate pure functions. Covers RSS/Atom parsing, entity decoding, deduplication, sort order, partial-failure tolerance, and the all-feeds-fail path. The vm + regex-strip approach is unusual but functional; process is absent from the context, so main() cannot be tested without expanding the context object.
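The "native RSS/Atom regex parsing" and entity decoding noted for the seeder can be sketched without dependencies as below. The helpers `decodeEntities`, `tagText`, and `atomHref` are hypothetical names. The sketch handles only common cases: a small entity table, CDATA unwrapping, tag stripping, and picking the `rel="alternate"` link in Atom. It is not the script's actual parser.

```javascript
// Decode the few entities feeds commonly use; &amp; goes last so "&amp;lt;" stays "&lt;".
function decodeEntities(s) {
  return s
    .replace(/&#(\d+);/g, (_, d) => String.fromCodePoint(Number(d)))
    .replace(/&#x([0-9a-f]+);/gi, (_, h) => String.fromCodePoint(parseInt(h, 16)))
    .replace(/&quot;/g, '"').replace(/&apos;/g, "'")
    .replace(/&lt;/g, '<').replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&');
}

// Inner text of the first <tag>...</tag>: unwrap CDATA, decode, strip HTML, collapse spaces.
// Simplification: decoding before stripping means a literal "a < b > c" loses "< b >".
function tagText(block, tag) {
  const m = block.match(new RegExp(`<${tag}\\b[^>]*>([\\s\\S]*?)</${tag}>`, 'i'));
  if (!m) return '';
  const raw = m[1].replace(/<!\[CDATA\[([\s\S]*?)\]\]>/g, '$1');
  return decodeEntities(raw).replace(/<[^>]+>/g, ' ').replace(/\s+/g, ' ').trim();
}

// Atom links are attributes: prefer rel="alternate", then a link without rel, then the first.
function atomHref(block) {
  const links = block.match(/<link\b[^>]*>/gi) || [];
  const pick = links.find((l) => /rel=["']alternate["']/i.test(l))
    || links.find((l) => !/rel=/i.test(l)) || links[0];
  const m = pick && pick.match(/href=["']([^"']+)["']/i);
  return m ? decodeEntities(m[1]) : '';
}

// RSS 2.0 <item> blocks if present, otherwise Atom <entry> blocks.
function parseFeed(xml) {
  const items = xml.match(/<item\b[\s\S]*?<\/item>/gi);
  if (items) {
    return items.map((b) => ({
      title: tagText(b, 'title'),
      url: tagText(b, 'link'),
      publishedAt: tagText(b, 'pubDate') || tagText(b, 'dc:date'),
    }));
  }
  const entries = xml.match(/<entry\b[\s\S]*?<\/entry>/gi) || [];
  return entries.map((b) => ({
    title: tagText(b, 'title'),
    url: atomHref(b),
    publishedAt: tagText(b, 'published') || tagText(b, 'updated'),
  }));
}
```

Regex parsing keeps the seeder dependency-free, at the cost of not handling every valid XML form. Nested same-name tags and unusual entity use would need a real XML parser.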

Sequence Diagram

sequenceDiagram
    participant CLI as CLI / Importer
    participant main
    participant fetchAllFeeds
    participant fetchFeed
    participant Agency as Agency Feed (SEC/CFTC/Fed/FDIC/FINRA)

    CLI->>main: node seed-regulatory-actions.mjs
    main->>fetchAllFeeds: fetchAllFeeds(globalThis.fetch)
    fetchAllFeeds->>fetchAllFeeds: Promise.allSettled(feeds.map(fetchFeed))

    par Concurrent fetch
        fetchAllFeeds->>fetchFeed: fetchFeed(SEC, fetch)
        fetchFeed->>Agency: GET pressreleases.rss (WorldMonitor UA)
        Agency-->>fetchFeed: RSS XML
        fetchFeed->>fetchFeed: parseFeed → normalizeFeedItems
        fetchFeed-->>fetchAllFeeds: RegulatoryAction[]
    and
        fetchAllFeeds->>fetchFeed: fetchFeed(CFTC, fetch)
        fetchFeed->>Agency: GET rssenf.xml (Chrome UA)
        Agency-->>fetchFeed: RSS XML
        fetchFeed-->>fetchAllFeeds: RegulatoryAction[]
    and
        fetchAllFeeds->>fetchFeed: fetchFeed(FINRA, fetch)
        fetchFeed->>Agency: GET FINRANotices (http⚠️)
        Agency-->>fetchFeed: RSS XML
        fetchFeed-->>fetchAllFeeds: RegulatoryAction[]
    end

    fetchAllFeeds->>fetchAllFeeds: dedupeAndSortActions (by URL, newest first)
    alt successCount === 0
        fetchAllFeeds-->>main: throw "All regulatory feeds failed"
        main-->>CLI: process.exit(1)
    else at least one succeeded
        fetchAllFeeds-->>main: RegulatoryAction[] (sorted)
        main-->>CLI: stdout JSON
    end
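The diagram's two exits (JSON on stdout, or `process.exit(1)` when every feed fails) can be sketched as a `main` with injected dependencies. The parameter names `loadActions`, `write`, and `exit` are hypothetical. They illustrate one way to exercise `main()` without the `process` object the review notes is missing from the vm test context.

```javascript
// Hypothetical entry point mirroring the diagram; real I/O is the default, tests inject stubs.
async function main({
  loadActions,
  write = (s) => process.stdout.write(s),
  exit = (code) => process.exit(code),
}) {
  try {
    const actions = await loadActions(); // e.g. fetchAllFeeds(...) in the real script
    write(`${JSON.stringify(actions, null, 2)}\n`);
    return 0;
  } catch (err) {
    console.error(`[regulatory] ${err.message}`);
    exit(1);
    return 1; // only reached when exit is stubbed
  }
}
```

Because `process` appears only inside default parameters, it is never touched when a test supplies `write` and `exit`.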

Reviews (1): Last reviewed commit: "feat(regulatory): add regulatory RSS fet..."

lspassos1 added a commit to lspassos1/worldmonitor that referenced this pull request Mar 30, 2026
Build on the standalone RSS fetcher by adding keyword-based tier classification, aggregate payload counts, and runSeed integration for regulatory:actions:v1.

The updated tests cover matched keywords, payload stats, and the runSeed wiring needed for Redis publication.

Refs koala73#2493
Depends on koala73#2564
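The follow-up commit's "keyword-based tier classification" and "aggregate payload counts" might look roughly like this. The tier names, keyword lists, and function names are all hypothetical, because the taxonomy is not shown in this PR.

```javascript
// Hypothetical tiers and keywords; checked in list order, first tier with a match wins.
const TIER_KEYWORDS = [
  { tier: 'high', keywords: ['enforcement', 'charges', 'cease and desist', 'penalty'] },
  { tier: 'medium', keywords: ['proposed rule', 'final rule', 'guidance'] },
];

// Plain substring matching on lowercased text (so "charges" also matches "surcharges").
function classifyAction(action) {
  const text = `${action.title} ${action.description ?? ''}`.toLowerCase();
  for (const { tier, keywords } of TIER_KEYWORDS) {
    const matchedKeywords = keywords.filter((k) => text.includes(k));
    if (matchedKeywords.length > 0) return { ...action, tier, matchedKeywords };
  }
  return { ...action, tier: 'low', matchedKeywords: [] };
}

// Payload shape: { actions, counts: { total, byTier: { high, medium, low } } }.
function buildPayload(actions) {
  const classified = actions.map(classifyAction);
  const byTier = { high: 0, medium: 0, low: 0 };
  for (const a of classified) byTier[a.tier] += 1;
  return { actions: classified, counts: { total: classified.length, byTier } };
}
```

Returning `matchedKeywords` with each action makes every tier decision auditable, which is what the commit's keyword tests would assert against.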

Use the repository-standard fetch wrapper in the seeder defaults, keep the documented FINRA HTTP exception in place, and include publish time in generated action ids to avoid same-day collisions.

Validated with: node --test tests/regulatory-seed-unit.test.mjs; node scripts/seed-regulatory-actions.mjs | head -n 20
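Deterministic ids that include the publish time, as this commit describes, can be sketched as below. The id format and the FNV-1a hash are assumptions chosen to keep the example dependency-free. The script may well use a different hash (for example SHA-256 from `node:crypto`) and a different layout.

```javascript
// FNV-1a, 32-bit, over UTF-16 code units: small, deterministic, no imports.
function fnv1a(str) {
  let h = 0x811c9dc5; // offset basis
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned 32 bits
  }
  return h.toString(16).padStart(8, '0');
}

// Hypothetical id: agency, full UTC publish timestamp, and a hash of agency|url|publishedAt.
// Because the time of day is part of both the prefix and the hash input, two items
// published on the same day at different times get different ids.
function makeActionId({ agency, url, publishedAt }) {
  const ms = Date.parse(publishedAt);
  const stamp = Number.isNaN(ms) ? 'undated' : new Date(ms).toISOString().replace(/[-:.]/g, '');
  return `${agency.toLowerCase()}-${stamp}-${fnv1a(`${agency}|${url}|${publishedAt}`)}`;
}
```

The same inputs always yield the same id across runs, which deduplication and later Redis writes can rely on.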


Development

Successfully merging this pull request may close these issues.

feat(regulatory): seed-regulatory-actions.mjs — RSS fetch + parse

1 participant