feat(climate+health):add shared air quality seed and mirrored health#2634
Greptile Summary

This PR introduces a shared air-quality ingestion pipeline backed by OpenAQ v3 (with optional WAQI supplementation). A single hourly seed script writes the same station payload to both `health:air-quality:v1` and `climate:air-quality:v1`.

Key findings from the review:
Confidence Score: 3/5

Mergeable with caution — the core data flow is correct, but the seed script has observability and concurrency gaps that should be addressed before relying on it in production. The handler, proto, generated-stub, and frontend service changes are clean and low risk. The seed script is the main concern: the silent `exit(0)` on a missing API key will make production misconfiguration invisible to Railway monitoring, the lock TTL is shorter than the realistic worst-case runtime (allowing concurrent writes), and the misleading parameter name creates a future maintainability hazard.

Important file changed: `scripts/seed-health-air-quality.mjs` — lock timeout, silent misconfiguration exit, and parameter naming issues.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Railway as Railway Cron (hourly)
    participant Seed as seed-health-air-quality.mjs
    participant Lock as Redis Lock
    participant OAQ as OpenAQ v3 API
    participant WAQI as WAQI API (optional)
    participant Redis as Redis (Upstash)
    participant Bootstrap as /api/bootstrap
    participant Health as /api/health/v1/list-air-quality-alerts
    participant Climate as /api/climate/v1/list-air-quality-data
    Railway->>Seed: node seed-health-air-quality.mjs
    Seed->>Lock: acquireLock(health:air-quality, 120s)
    alt Lock acquired
        par Parallel fetches
            Seed->>OAQ: GET /v3/locations?parameters=pm25 (paginated)
            Seed->>OAQ: GET /v3/parameters/2/latest (paginated)
            Seed->>WAQI: GET /map/bounds/ x6 tiles
        end
        Seed->>Seed: buildOpenAqStations() + buildWaqiStations()
        Seed->>Seed: mergeAirQualityStations()
        Seed->>Redis: PIPELINE SET health:air-quality:v1 EX 3600
        Seed->>Redis: PIPELINE SET climate:air-quality:v1 EX 3600
        Seed->>Redis: PIPELINE SET seed-meta:health:air-quality EX 604800
        Seed->>Redis: PIPELINE SET seed-meta:climate:air-quality EX 604800
        Seed->>Lock: releaseLock()
    else Lock skipped
        Seed->>Seed: process.exit(0)
    end
    Bootstrap->>Redis: MGET health:air-quality:v1, climate:air-quality:v1
    Bootstrap-->>Bootstrap: hydrate client
    Health->>Redis: GET health:air-quality:v1
    Health-->>Health: normalizeAirQualityStations() to AirQualityAlert[]
    Climate->>Redis: GET climate:air-quality:v1
    Climate-->>Climate: normalizeAirQualityStations() to AirQualityStation[]
```
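The mirrored write step in the flow above can be sketched as a pure command builder. The key names and TTLs are taken from the diagram; `buildSeedCommands` and the ioredis-style command tuples are illustrative assumptions, not the script's actual API.

```javascript
// Sketch of the mirrored dual-key write (illustrative; command tuples assume
// an ioredis-style pipeline client). Key names and TTLs match the seed flow.
const CACHE_TTL = 3600;    // payload TTL in seconds (1h)
const META_TTL = 604800;   // seed-meta TTL in seconds (7d)

function buildSeedCommands(payload, nowMs = Date.now()) {
  const body = JSON.stringify(payload);
  const meta = JSON.stringify({ seededAt: nowMs, stations: payload.stations.length });
  return [
    // Same serialized body goes to both domains, so they cannot diverge in a run.
    ['set', 'health:air-quality:v1', body, 'EX', CACHE_TTL],
    ['set', 'climate:air-quality:v1', body, 'EX', CACHE_TTL],
    ['set', 'seed-meta:health:air-quality', meta, 'EX', META_TTL],
    ['set', 'seed-meta:climate:air-quality', meta, 'EX', META_TTL],
  ];
}
```

Batching all four SETs into one pipeline keeps the write step a single round trip.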
Reviews (1): Last reviewed commit: "feat(climate+health):add shared air qual..."
scripts/seed-health-air-quality.mjs (outdated)

```js
export function buildAirQualityPayload({ locations = [], latestMeasurements = [], waqiEntries = [], nowMs = Date.now() }) {
  const openAqStations = buildOpenAqStations(locations, latestMeasurements, nowMs);
  const mergedStations = mergeAirQualityStations(openAqStations, waqiEntries);
  return {
    stations: mergedStations.map(toOutputStation),
    fetchedAt: nowMs,
  };
}
```
Misleading: `waqiEntries` parameter accepts pre-normalized stations, not raw API entries
The parameter is named `waqiEntries`, implying raw WAQI API payloads (which have fields like `lat`, `lon`, `aqi` as a string, `station.name`, `iaqi.pm25.v`). However, the implementation passes the value directly to `mergeAirQualityStations`, which expects already-normalized station objects with `city`, `countryCode`, `lng`, `measuredAt`, etc.
The actual caller — `fetchAirQualityPayload` — correctly passes pre-normalized objects (output of `buildWaqiStations`), so there is no runtime bug today. However, the exported signature is misleading:
- The tests use `waqiEntries: []`, masking the discrepancy.
- A future caller passing raw WAQI entries would silently produce wrong results: `stationIdentity` would read `station.city` (undefined on raw entries), collapsing all stations to the same key or dropping them entirely.

Rename the parameter to `waqiStations` to make the contract explicit:
```diff
-export function buildAirQualityPayload({ locations = [], latestMeasurements = [], waqiEntries = [], nowMs = Date.now() }) {
+export function buildAirQualityPayload({ locations = [], latestMeasurements = [], waqiStations = [], nowMs = Date.now() }) {
   const openAqStations = buildOpenAqStations(locations, latestMeasurements, nowMs);
-  const mergedStations = mergeAirQualityStations(openAqStations, waqiEntries);
+  const mergedStations = mergeAirQualityStations(openAqStations, waqiStations);
   return {
     stations: mergedStations.map(toOutputStation),
     fetchedAt: nowMs,
   };
 }
```
Also update the call site in `fetchAirQualityPayload` (line 392) to pass `waqiStations` instead of `waqiEntries: waqiStations`.
scripts/seed-health-air-quality.mjs (outdated)

```js
  payload = await withRetry(() => fetchAirQualityPayload(), 2, 1_000);
} catch (error) {
  await releaseLock(`${domain}:${resource}`, runId);
  const durationMs = Date.now() - startMs;
  const cause = error?.cause ? ` (cause: ${error.cause.message || error.cause.code || error.cause})` : '';
  console.error(`  FETCH FAILED: ${error?.message ?? error}${cause}`);
  await extendExistingTtl([
    HEALTH_AIR_QUALITY_KEY,
    CLIMATE_AIR_QUALITY_KEY,
    OPENAQ_META_KEY,
    CLIMATE_META_KEY,
  ], CACHE_TTL).catch(() => {});
  console.log(`\n=== Failed gracefully (${Math.round(durationMs)}ms) ===`);
  process.exit(0);
```
Missing API key causes silent `exit(0)` — Railway cron reports success on fatal misconfiguration
When OPENAQ_API_KEY is absent, buildOpenAqHeaders() throws immediately. This propagates through fetchPagedResults → fetchAirQualityPayload → the withRetry wrapper (2 retries, both instantly fail) → the catch block below.
The catch block releases the lock, extends existing TTL with stale data, then calls process.exit(0). Railway sees a successful exit code and reports the cron run as successful. A missing API key will therefore pass silently every hour, serve ever-more-stale data, and never surface as an alert.
Consider distinguishing configuration errors from transient network errors and exiting with a non-zero code for the former, so Railway cron failure alerts fire when the environment is misconfigured.
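One way to make that distinction, sketched here with a hypothetical `ConfigError` class (the real script has no such type, and this `buildOpenAqHeaders` is an illustrative reimplementation): tag configuration failures at the throw site and map them to exit codes in the catch block.

```javascript
// Hypothetical sketch: distinguish fatal misconfiguration from transient
// failures so Railway's cron monitoring sees a non-zero exit for the former.
class ConfigError extends Error {}

function buildOpenAqHeaders(env = process.env) {
  if (!env.OPENAQ_API_KEY) {
    // Throw a typed error instead of a generic one.
    throw new ConfigError('OPENAQ_API_KEY is not set');
  }
  return { 'X-API-Key': env.OPENAQ_API_KEY };
}

function exitCodeFor(error) {
  // Config errors: exit 1 so the cron run is reported as failed.
  // Transient errors (network, upstream 5xx): exit 0 after extending TTLs.
  return error instanceof ConfigError ? 1 : 0;
}
```

The catch block would then call `process.exit(exitCodeFor(error))` instead of a hard-coded `process.exit(0)`.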
```js
async function fetchPagedResults(fetchPage, label) {
  const results = [];
  let expectedFound = 0;

  for (let page = 1; page <= OPENAQ_MAX_PAGES; page++) {
    const payload = await fetchPage(page);
    const pageResults = Array.isArray(payload?.results) ? payload.results : [];
    results.push(...pageResults);

    const found = toFiniteNumber(payload?.meta?.found);
    const effectiveLimit = toFiniteNumber(payload?.meta?.limit) ?? OPENAQ_PAGE_LIMIT;
    if (found != null && found > 0) expectedFound = found;

    if (pageResults.length < effectiveLimit) break;
    if (expectedFound > 0 && results.length >= expectedFound) break;
  }

  if (results.length === 0) {
    throw new Error(`${label}: no results returned`);
  }

  return results;
}
```
Sequential pagination with a 120 s lock could time out under API pressure
fetchPagedResults fetches pages one-by-one up to OPENAQ_MAX_PAGES = 20. Both fetchOpenAqLocationsPage and fetchOpenAqLatestPage are limited to 20 pages each. While the two calls are parallelized via Promise.all, within each branch pages are sequential.
With AbortSignal.timeout(30_000) per request and 2 retries (1 s delay), a single page can take up to ~63 s in the worst case. With just a couple of slow pages, the total runtime can exceed the 120 s lock lifetime. Once the lock expires, a second Railway cron instance can acquire it and run concurrently — both writing the same payload, but doubling external API traffic and Redis pipeline pressure.
Consider increasing the lock TTL to at least 600_000 (10 minutes) to match a realistic worst-case runtime:
```js
const lockResult = await acquireLockSafely(`${domain}:${resource}`, runId, 600_000, {
```
```js
export const OPENAQ_META_KEY = 'seed-meta:health:air-quality';
export const CLIMATE_META_KEY = 'seed-meta:climate:air-quality';
```
Inconsistent constant naming: source-based vs destination-based
OPENAQ_META_KEY is named after its data source (OpenAQ), while CLIMATE_META_KEY is named after its destination namespace. Both meta entries originate from OpenAQ, so mixing naming conventions makes the pair harder to reason about. Renaming OPENAQ_META_KEY to HEALTH_META_KEY would make the two constants symmetric and clarify that one covers the health domain and the other the climate domain. Also update the import in tests/air-quality-seed.test.mjs (line 12) and all usages in main.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Did the changes the bot highlighted.
koala73 left a comment
Code Review — air quality seeder (shared health + climate)
Good work on this one — the mirrored dual-key pattern is the right approach, the proto and generated stubs look correct, and the isMain guard is properly implemented. A few issues before merge.
🔴 P1 — CACHE_TTL = 3600 violates the 3× gold standard
File: scripts/seed-health-air-quality.mjs:19
```js
export const CACHE_TTL = 3600; // equals the cron interval exactly
```

The Railway cron runs every 1h = 3600s. The gold standard for all seeders is TTL ≥ 3× interval, so the minimum is 10800s. With TTL matching the cadence exactly, a single missed run (deploy, Railway restart, lock collision) causes the data to expire before the next run executes. The `extendExistingTtl` call on failure also uses this same value, meaning repeated failures don't buy extra buffer.
Fix: `export const CACHE_TTL = 10800;` (3h)
🔴 P1 — api/seed-health.js SEED_DOMAINS not updated
File: api/seed-health.js (unchanged in this PR)
The PR updates api/health.js with SEED_META entries for both keys — good. But api/seed-health.js has its own separate SEED_DOMAINS object (used by the /api/seed-health monitoring endpoint) that is not updated. The health dashboard will never report this seeder's freshness status; it will silently stay absent.
Fix: Add to SEED_DOMAINS in api/seed-health.js:
```js
'climate:air-quality': { key: 'seed-meta:climate:air-quality', intervalMin: 60 },
'health:air-quality': { key: 'seed-meta:health:air-quality', intervalMin: 60 },
```

(Or a single shared entry if preferred, since they're the same seeder run.)
🟡 P2 — Both keys in FAST_KEYS — payload size risk
File: api/bootstrap.js:48
```js
'socialVelocity', 'climateAirQuality', 'healthAirQuality',
```

FAST_KEYS are fetched in the critical-path first batch of the bootstrap response. OpenAQ fetches up to 20 pages × 1000 measurements, filtered to the last 2h. This can easily be 5,000–15,000 stations. Putting both keys in FAST_KEYS adds that payload twice (once per domain) to the initial page load.
Consider either: removing them from FAST_KEYS (they load in the second batch), or explicitly bounding the station count in the seeder before writing to Redis (e.g., top 2000 by AQI).
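The bounding option might look like the sketch below. `MAX_STATIONS`, `boundStations`, and the `aqi` field name are assumptions for illustration, not names from the PR.

```javascript
// Hypothetical sketch: cap the station list by worst AQI before writing to
// Redis, so the bootstrap payload stays bounded regardless of upstream size.
const MAX_STATIONS = 2000;

function boundStations(stations, limit = MAX_STATIONS) {
  return [...stations]
    // Highest AQI (worst air) first; stations missing a reading sort last.
    .sort((a, b) => (b.aqi ?? -Infinity) - (a.aqi ?? -Infinity))
    .slice(0, limit);
}
```

The copy via `[...stations]` keeps the sort from mutating the caller's array.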
🟡 P2 — Shared seeder writes two separate SEED_META keys
Files: scripts/seed-health-air-quality.mjs, api/health.js
The seeder writes seed-meta:health:air-quality AND seed-meta:climate:air-quality as separate keys, and api/health.js registers both in SEED_META independently. Since it's a single run, both meta keys always have the same timestamp — but the health dashboard will show two separate staleness entries that can only diverge if a Redis write fails mid-run.
This isn't broken, but consider: a single canonical meta key (e.g., seed-meta:air-quality) read by both SEED_META entries would be simpler and eliminate the divergence risk. Not a blocker, but worth a conscious decision.
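A minimal sketch of that canonical-key option (names illustrative; the `{ key, intervalMin }` entry shape is assumed to mirror the SEED_DOMAINS pattern quoted above):

```javascript
// Hypothetical sketch: one canonical meta key written by the seeder, read by
// both registry entries, so the two staleness readings can never diverge.
const AIR_QUALITY_META_KEY = 'seed-meta:air-quality';

const SEED_META = {
  'health:air-quality': { key: AIR_QUALITY_META_KEY, intervalMin: 60 },
  'climate:air-quality': { key: AIR_QUALITY_META_KEY, intervalMin: 60 },
};
```

The seeder then writes a single meta SET per run instead of two.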
…YS, shared meta
- Raise CACHE_TTL from 3600 to 10800 (3× the 1h cron cadence; gold standard)
- Add health:air-quality to api/seed-health.js SEED_DOMAINS so monitoring dashboard tracks freshness
- Remove climateAirQuality and healthAirQuality from FAST_KEYS (large station payloads; load in slow batch)
- Point climateAirQuality SEED_META to same meta key as healthAirQuality (same seeder run, one source of truth)
…yloads avoid critical-path batch
…until panel exists
- Drop deprecated first URL attempt (parameters=pm25, order_by=lastUpdated, sort=desc); use correct v3 params (parameters_id=2, sort_order=desc) directly — eliminates guaranteed 4xx retry cycle per page on 20-page crawl
- Remove climateAirQuality and healthAirQuality from BOOTSTRAP_CACHE_KEYS, SLOW_KEYS, and BOOTSTRAP_TIERS — no panel consumes these yet; adding thousands of station records to every startup bootstrap is pure payload bloat
- Remove normalizeAirQualityPayload helpers from bootstrap.js (no longer called)
- Update service wrappers to fetch via RPC directly; re-add bootstrap hydration when a panel actually needs it
…t case 2 OpenAQ calls × 20 pages × (30s timeout × 3 attempts) = 3600s max runtime. Previous 600s TTL allowed concurrent cron runs on any degraded upstream.
Summary
Adds a shared air-quality ingestion path backed by OpenAQ v3 with optional WAQI supplementation. The new seed writes the same station payload to both `health:air-quality:v1` and `climate:air-quality:v1`, adds mirrored health/climate RPCs for that data, updates generated API clients/server stubs and OpenAPI docs, and registers the new cache/bootstrap keys plus lightweight frontend hydration consumers.

NOTE:
You mentioned that no API key is required for OpenAQ v3, but it does require one.
Hourly execution is an external infra dependency, not repo-managed. This PR assumes an hourly cron for node scripts/seed-health-air-quality.mjs; repo config does not enforce it (see vercel.json (line 3), crons: []). Ensure Railway cron is set to run every 1 hour.
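For reference, Railway cron schedules use standard five-field crontab syntax, so the hourly run would be configured along these lines in the service settings (an illustrative fragment; this configuration is not repo-managed):

```
Schedule: 0 * * * *
Command:  node scripts/seed-health-air-quality.mjs
```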
Fixes #2471
Type of change
Affected areas (`/api/*`)
Checklist (`npm run typecheck`)