Skip to content

feat(climate+health):add shared air quality seed and mirrored health#2634

Merged
koala73 merged 6 commits intomainfrom
feat/add-air-quality-seeder
Apr 3, 2026
Merged

feat(climate+health):add shared air quality seed and mirrored health#2634
koala73 merged 6 commits intomainfrom
feat/add-air-quality-seeder

Conversation

@FayezBast
Copy link
Copy Markdown
Collaborator

Summary

Adds a shared air-quality ingestion path backed by OpenAQ v3 with optional WAQI supplementation. The new seed writes the same station payload to both health:air-quality:v1 and climate:air-quality:v1, adds mirrored health/climate RPCs for that data, updates generated API clients/server stubs and OpenAPI docs, and registers the new cache/bootstrap keys plus lightweight frontend hydration consumers.

NOTE :
u mentioned that no api key req for OpenAQ v3 but it req one,
Hourly execution is an external infra dependency, not repo-managed. This PR assumes an hourly cron for node scripts/seed-health-air-quality.mjs; repo config does not enforce it (see vercel.json (line 3), crons: []). Ensure Railway cron is set to run every 1 hour.

Fixes #2471

Type of change

  • Bug fix
  • New feature
  • New data source / feed
  • Documentation

Affected areas

  • API endpoints (/api/*)
  • Other: Seed scripts, Redis cache/bootstrap, generated RPC clients/server stubs

Checklist

  • No API keys or secrets committed
  • TypeScript compiles without errors (npm run typecheck)

@mintlify
Copy link
Copy Markdown

mintlify bot commented Apr 3, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

Project Status Preview Updated (UTC)
WorldMonitor 🟢 Ready View Preview Apr 3, 2026, 1:21 AM

@vercel
Copy link
Copy Markdown

vercel bot commented Apr 3, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
worldmonitor Ignored Ignored Preview Apr 3, 2026 5:42am

Request Review

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 3, 2026

Greptile Summary

This PR introduces a shared air-quality ingestion pipeline backed by OpenAQ v3 (with optional WAQI supplementation). A single hourly seed script writes the same station payload to both health:air-quality:v1 and climate:air-quality:v1 in Redis, wiring mirrored RPCs for both the health and climate domains, updating generated proto/client/server stubs, registering new bootstrap hydration keys, and adding frontend service consumers. The implementation correctly follows the project's gateway, caching, and proto-contract patterns, and includes solid unit test coverage for all pure functions.

Key findings from the review:

  • Misleading waqiEntries parameter in buildAirQualityPayload — the exported function accepts pre-normalized station objects, not raw WAQI API entries, but the parameter name implies the latter. A future caller passing raw WAQI data would silently produce incorrect/empty station lists.
  • Silent exit(0) on missing OPENAQ_API_KEY — a fatal configuration error (absent API key) is swallowed by the graceful-failure path and reported to Railway as a successful cron run, masking misconfiguration.
  • Lock timeout shorter than worst-case execution time — the distributed lock is acquired for 120 s, but sequential pagination of up to 20 pages per OpenAQ endpoint (each with a 30 s timeout and 2 retries) can realistically exceed this, allowing a second Railway instance to start concurrently.
  • OPENAQ_META_KEY naming inconsistency — named after the data source rather than the destination domain, unlike its sibling CLIMATE_META_KEY.

Confidence Score: 3/5

Mergeable with caution — the core data flow is correct but the seed script has observability and concurrency gaps that should be addressed before relying on it in production.

The handler, proto, generated-stub, and frontend service changes are clean and low risk. The seed script is the main concern: the silent exit(0) on a missing API key will make production misconfiguration invisible to Railway monitoring, the lock TTL is shorter than the realistic worst-case runtime allowing concurrent writes, and the misleading parameter name creates a future maintainability hazard.

scripts/seed-health-air-quality.mjs — lock timeout, silent misconfiguration exit, and parameter naming issues.

Important Files Changed

Filename Overview
scripts/seed-health-air-quality.mjs Core seed script — fetches OpenAQ v3 + optional WAQI data, merges and writes to two Redis keys; has a misleading waqiEntries parameter name, silent exit(0) on missing API key, and a lock timeout shorter than worst-case execution time.
server/_shared/air-quality-stations.ts Shared TypeScript normalizer for air quality station payloads; correctly handles camelCase/snake_case field aliasing and defensive null checking.
server/_shared/cache-keys.ts Adds CLIMATE_AIR_QUALITY_KEY and HEALTH_AIR_QUALITY_KEY constants plus corresponding bootstrap tier entries; correctly mirrors api/bootstrap.js registration.
server/worldmonitor/health/v1/list-air-quality-alerts.ts Health RPC handler reads from health:air-quality:v1; gracefully falls back to payload.alerts for forward-compatibility.
server/worldmonitor/climate/v1/list-air-quality-data.ts Climate RPC handler reads from climate:air-quality:v1; mirrors the health handler structure cleanly.
tests/air-quality-seed.test.mjs Good unit test coverage for AQI helpers, station building, and merge/dedup logic; correctly validates mirrored pipeline commands; uses OPENAQ_META_KEY name that should be updated if constant is renamed.
api/bootstrap.js Adds climateAirQuality and healthAirQuality to both BOOTSTRAP_CACHE_KEYS and FAST_KEYS; correctly duplicates keys from cache-keys.ts as required by Edge Function isolation rules.
server/gateway.ts Registers /api/climate/v1/list-air-quality-data and /api/health/v1/list-air-quality-alerts under the fast cache tier; no issues.

Sequence Diagram

sequenceDiagram
    participant Railway as Railway Cron (hourly)
    participant Seed as seed-health-air-quality.mjs
    participant Lock as Redis Lock
    participant OAQ as OpenAQ v3 API
    participant WAQI as WAQI API (optional)
    participant Redis as Redis (Upstash)
    participant Bootstrap as /api/bootstrap
    participant Health as /api/health/v1/list-air-quality-alerts
    participant Climate as /api/climate/v1/list-air-quality-data

    Railway->>Seed: node seed-health-air-quality.mjs
    Seed->>Lock: acquireLock(health:air-quality, 120s)
    alt Lock acquired
        par Parallel fetches
            Seed->>OAQ: GET /v3/locations?parameters=pm25 (paginated)
            Seed->>OAQ: GET /v3/parameters/2/latest (paginated)
            Seed->>WAQI: GET /map/bounds/ x6 tiles
        end
        Seed->>Seed: buildOpenAqStations() + buildWaqiStations()
        Seed->>Seed: mergeAirQualityStations()
        Seed->>Redis: PIPELINE SET health:air-quality:v1 EX 3600
        Seed->>Redis: PIPELINE SET climate:air-quality:v1 EX 3600
        Seed->>Redis: PIPELINE SET seed-meta:health:air-quality EX 604800
        Seed->>Redis: PIPELINE SET seed-meta:climate:air-quality EX 604800
        Seed->>Lock: releaseLock()
    else Lock skipped
        Seed->>Seed: process.exit(0)
    end

    Bootstrap->>Redis: MGET health:air-quality:v1, climate:air-quality:v1
    Bootstrap-->>Bootstrap: hydrate client

    Health->>Redis: GET health:air-quality:v1
    Health-->>Health: normalizeAirQualityStations() to AirQualityAlert[]

    Climate->>Redis: GET climate:air-quality:v1
    Climate-->>Climate: normalizeAirQualityStations() to AirQualityStation[]
Loading

Reviews (1): Last reviewed commit: "feat(climate+health):add shared air qual..." | Re-trigger Greptile

Comment on lines +373 to +380
export function buildAirQualityPayload({ locations = [], latestMeasurements = [], waqiEntries = [], nowMs = Date.now() }) {
const openAqStations = buildOpenAqStations(locations, latestMeasurements, nowMs);
const mergedStations = mergeAirQualityStations(openAqStations, waqiEntries);
return {
stations: mergedStations.map(toOutputStation),
fetchedAt: nowMs,
};
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Misleading waqiEntries parameter accepts pre-normalized stations, not raw API entries

The parameter is named waqiEntries, implying raw WAQI API payloads (which have fields like lat, lon, aqi as a string, station.name, iaqi.pm25.v). However, the implementation passes the value directly to mergeAirQualityStations, which expects already-normalized station objects with city, countryCode, lng, measuredAt, etc.

The actual caller — fetchAirQualityPayload — correctly passes pre-normalized objects (output of buildWaqiStations), so there is no runtime bug today. However, the exported signature is misleading:

  • The tests use waqiEntries: [], masking the discrepancy.
  • A future caller passing raw WAQI entries would silently produce wrong results: stationIdentity would read station.city (undefined on raw entries), collapsing all stations to the same key or dropping them entirely.

Rename the parameter to waqiStations to make the contract explicit:

Suggested change
export function buildAirQualityPayload({ locations = [], latestMeasurements = [], waqiEntries = [], nowMs = Date.now() }) {
const openAqStations = buildOpenAqStations(locations, latestMeasurements, nowMs);
const mergedStations = mergeAirQualityStations(openAqStations, waqiEntries);
return {
stations: mergedStations.map(toOutputStation),
fetchedAt: nowMs,
};
}
export function buildAirQualityPayload({ locations = [], latestMeasurements = [], waqiStations = [], nowMs = Date.now() }) {
const openAqStations = buildOpenAqStations(locations, latestMeasurements, nowMs);
const mergedStations = mergeAirQualityStations(openAqStations, waqiStations);

Also update the call site in fetchAirQualityPayload (line 392) to pass waqiStations instead of waqiEntries: waqiStations.

Comment on lines +479 to +492
payload = await withRetry(() => fetchAirQualityPayload(), 2, 1_000);
} catch (error) {
await releaseLock(`${domain}:${resource}`, runId);
const durationMs = Date.now() - startMs;
const cause = error?.cause ? ` (cause: ${error.cause.message || error.cause.code || error.cause})` : '';
console.error(` FETCH FAILED: ${error?.message ?? error}${cause}`);
await extendExistingTtl([
HEALTH_AIR_QUALITY_KEY,
CLIMATE_AIR_QUALITY_KEY,
OPENAQ_META_KEY,
CLIMATE_META_KEY,
], CACHE_TTL).catch(() => {});
console.log(`\n=== Failed gracefully (${Math.round(durationMs)}ms) ===`);
process.exit(0);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Missing API key causes silent exit(0) — Railway cron reports success on fatal misconfiguration

When OPENAQ_API_KEY is absent, buildOpenAqHeaders() throws immediately. This propagates through fetchPagedResultsfetchAirQualityPayload → the withRetry wrapper (2 retries, both instantly fail) → the catch block below.

The catch block releases the lock, extends existing TTL with stale data, then calls process.exit(0). Railway sees a successful exit code and reports the cron run as successful. A missing API key will therefore pass silently every hour, serve ever-more-stale data, and never surface as an alert.

Consider distinguishing configuration errors from transient network errors and exiting with a non-zero code for the former, so Railway cron failure alerts fire when the environment is misconfigured.

Comment on lines +326 to +348
async function fetchPagedResults(fetchPage, label) {
const results = [];
let expectedFound = 0;

for (let page = 1; page <= OPENAQ_MAX_PAGES; page++) {
const payload = await fetchPage(page);
const pageResults = Array.isArray(payload?.results) ? payload.results : [];
results.push(...pageResults);

const found = toFiniteNumber(payload?.meta?.found);
const effectiveLimit = toFiniteNumber(payload?.meta?.limit) ?? OPENAQ_PAGE_LIMIT;
if (found != null && found > 0) expectedFound = found;

if (pageResults.length < effectiveLimit) break;
if (expectedFound > 0 && results.length >= expectedFound) break;
}

if (results.length === 0) {
throw new Error(`${label}: no results returned`);
}

return results;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Sequential pagination with a 120 s lock could time out under API pressure

fetchPagedResults fetches pages one-by-one up to OPENAQ_MAX_PAGES = 20. Both fetchOpenAqLocationsPage and fetchOpenAqLatestPage are limited to 20 pages each. While the two calls are parallelized via Promise.all, within each branch pages are sequential.

With AbortSignal.timeout(30_000) per request and 2 retries (1 s delay), a single page can take up to ~63 s in the worst case. With just a couple of slow pages, the total runtime can exceed the 120 s lock lifetime. Once the lock expires, a second Railway cron instance can acquire it and run concurrently — both writing the same payload, but doubling external API traffic and Redis pipeline pressure.

Consider increasing the lock TTL to at least 600_000 (10 minutes) to match a realistic worst-case runtime:

Suggested change
async function fetchPagedResults(fetchPage, label) {
const results = [];
let expectedFound = 0;
for (let page = 1; page <= OPENAQ_MAX_PAGES; page++) {
const payload = await fetchPage(page);
const pageResults = Array.isArray(payload?.results) ? payload.results : [];
results.push(...pageResults);
const found = toFiniteNumber(payload?.meta?.found);
const effectiveLimit = toFiniteNumber(payload?.meta?.limit) ?? OPENAQ_PAGE_LIMIT;
if (found != null && found > 0) expectedFound = found;
if (pageResults.length < effectiveLimit) break;
if (expectedFound > 0 && results.length >= expectedFound) break;
}
if (results.length === 0) {
throw new Error(`${label}: no results returned`);
}
return results;
}
const lockResult = await acquireLockSafely(`${domain}:${resource}`, runId, 600_000, {

Comment on lines +21 to +22
export const OPENAQ_META_KEY = 'seed-meta:health:air-quality';
export const CLIMATE_META_KEY = 'seed-meta:climate:air-quality';
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Inconsistent constant naming: source-based vs destination-based

OPENAQ_META_KEY is named after its data source (OpenAQ), while CLIMATE_META_KEY is named after its destination namespace. Both meta entries originate from OpenAQ, so mixing naming conventions makes the pair harder to reason about. Renaming OPENAQ_META_KEY to HEALTH_META_KEY would make the two constants symmetric and clarify that one covers the health domain and the other the climate domain. Also update the import in tests/air-quality-seed.test.mjs (line 12) and all usages in main.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@FayezBast
Copy link
Copy Markdown
Collaborator Author

did the changes which the bot highlighted

Copy link
Copy Markdown
Owner

@koala73 koala73 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — air quality seeder (shared health + climate)

Good work on this one — the mirrored dual-key pattern is the right approach, the proto and generated stubs look correct, and the isMain guard is properly implemented. A few issues before merge.


🔴 P1 — CACHE_TTL = 3600 violates the 3× gold standard

File: scripts/seed-health-air-quality.mjs:19

export const CACHE_TTL = 3600;  // equals the cron interval exactly

The Railway cron runs every 1h = 3600s. Gold standard for all seeders is TTL ≥ 3× interval, so the minimum is 10800s. With TTL matching the cadence exactly, a single missed run (deploy, Railway restart, lock collision) causes the data to expire before the next run executes. The extendExistingTtl call on failure is also using this same value, meaning repeated failures don't buy extra buffer.

Fix: export const CACHE_TTL = 10800; (3h)


🔴 P1 — api/seed-health.js SEED_DOMAINS not updated

File: api/seed-health.js (unchanged in this PR)

The PR updates api/health.js with SEED_META entries for both keys — good. But api/seed-health.js has its own separate SEED_DOMAINS object (used by the /api/seed-health monitoring endpoint) that is not updated. The health dashboard will never report this seeder's freshness status; it will silently stay absent.

Fix: Add to SEED_DOMAINS in api/seed-health.js:

'climate:air-quality':  { key: 'seed-meta:climate:air-quality',  intervalMin: 60 },
'health:air-quality':   { key: 'seed-meta:health:air-quality',   intervalMin: 60 },

(Or a single shared entry if preferred, since they're the same seeder run.)


🟡 P2 — Both keys in FAST_KEYS — payload size risk

File: api/bootstrap.js:48

'socialVelocity', 'climateAirQuality', 'healthAirQuality',

FAST_KEYS are fetched in the critical-path first batch of the bootstrap response. OpenAQ fetches up to 20 pages × 1000 measurements, filtered to the last 2h. This can easily be 5,000–15,000 stations. Putting both keys in FAST_KEYS adds that payload twice (once per domain) to the initial page load.

Consider either: removing them from FAST_KEYS (they load in the second batch), or explicitly bounding the station count in the seeder before writing to Redis (e.g., top 2000 by AQI).


🟡 P2 — Shared seeder writes two separate SEED_META keys

Files: scripts/seed-health-air-quality.mjs, api/health.js

The seeder writes seed-meta:health:air-quality AND seed-meta:climate:air-quality as separate keys, and api/health.js registers both in SEED_META independently. Since it's a single run, both meta keys always have the same timestamp — but the health dashboard will show two separate staleness entries that can only diverge if a Redis write fails mid-run.

This isn't broken, but consider: a single canonical meta key (e.g., seed-meta:air-quality) read by both SEED_META entries would be simpler and eliminate the divergence risk. Not a blocker, but worth a conscious decision.

koala73 added 2 commits April 3, 2026 09:08
…YS, shared meta

- Raise CACHE_TTL from 3600 to 10800 (3× the 1h cron cadence; gold standard)
- Add health:air-quality to api/seed-health.js SEED_DOMAINS so monitoring dashboard tracks freshness
- Remove climateAirQuality and healthAirQuality from FAST_KEYS (large station payloads; load in slow batch)
- Point climateAirQuality SEED_META to same meta key as healthAirQuality (same seeder run, one source of truth)
…until panel exists

- Drop deprecated first URL attempt (parameters=pm25, order_by=lastUpdated, sort=desc);
  use correct v3 params (parameters_id=2, sort_order=desc) directly — eliminates
  guaranteed 4xx retry cycle per page on 20-page crawl
- Remove climateAirQuality and healthAirQuality from BOOTSTRAP_CACHE_KEYS, SLOW_KEYS,
  and BOOTSTRAP_TIERS — no panel consumes these yet; adding thousands of station records
  to every startup bootstrap is pure payload bloat
- Remove normalizeAirQualityPayload helpers from bootstrap.js (no longer called)
- Update service wrappers to fetch via RPC directly; re-add bootstrap hydration
  when a panel actually needs it
…t case

2 OpenAQ calls × 20 pages × (30s timeout × 3 attempts) = 3600s max runtime.
Previous 600s TTL allowed concurrent cron runs on any degraded upstream.
@koala73 koala73 merged commit 9d94ad3 into main Apr 3, 2026
12 checks passed
@koala73 koala73 deleted the feat/add-air-quality-seeder branch April 3, 2026 06:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(climate+health): add air quality seeder (OpenAQ, shared across both variants)

2 participants