
refactor(inventory): pure discovery, no sniffing. All profiling moves to source-check #225

Merged: Gabrymi93 merged 4 commits into main from feat/pure-discovery-inventory on May 12, 2026


@Gabrymi93 (Member)

What changes

Inventory: metadata discovery only

  • Removed _sniff_csv_rows and _is_sniffable from build_catalog_inventory.py (~100 lines)
  • Inventory no longer downloads a 10KB sample to sniff the encoding
  • It no longer writes encoding_suggested or delim_suggested to the parquet files
  • Source-check does all the profiling on its own (via the DuckDB toolkit, which is more reliable)

--skip-red-sources for inventory

  • Same as source-check: skips sources marked RED in radar_summary.json
  • Saves ~10 min per run on sources that are down (openbdap, dati_cultura, mur_ustat)
  • Workflow updated: --workers 8 --skip-red-sources
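A minimal sketch of the RED-source filter. It assumes a hypothetical radar_summary.json layout ({"sources": [{"id": ..., "status": ...}]}); the real schema and the inventory's argument parser are not shown in this PR. Only --workers and --skip-red-sources come from the PR; --radar-summary and all helper names are illustrative.

```python
import argparse
import json
from pathlib import Path
from typing import Iterable


def red_sources_from_summary(summary: dict) -> set:
    """Return the IDs of sources whose status is RED.

    Assumed (hypothetical) schema: {"sources": [{"id": "...", "status": "RED"}, ...]}
    """
    return {
        entry["id"]
        for entry in summary.get("sources", [])
        if entry.get("status") == "RED"
    }


def load_red_sources(radar_path: Path) -> set:
    # No radar file yet: skip nothing rather than fail the whole run.
    if not radar_path.exists():
        return set()
    return red_sources_from_summary(json.loads(radar_path.read_text(encoding="utf-8")))


def select_sources(sources: Iterable[str], red: set) -> list:
    """Keep only the sources that are not currently RED."""
    return [s for s in sources if s not in red]


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="catalog inventory (sketch)")
    parser.add_argument("--workers", type=int, default=4)
    parser.add_argument("--skip-red-sources", action="store_true",
                        help="skip sources marked RED in radar_summary.json")
    # Hypothetical flag: where to read the radar summary from.
    parser.add_argument("--radar-summary", type=Path,
                        default=Path("radar_summary.json"))
    return parser.parse_args(argv)
```

With --skip-red-sources set, the inventory would call select_sources(all_source_ids, load_red_sources(args.radar_summary)) before handing work to the workers. A missing radar file leaves the source list unchanged instead of aborting.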

Source-check: 500 -> 1000 items

  • Doubles weekly coverage (14% of the catalog)
  • Negligible cost (~3 extra minutes on the total run)
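The coverage figures can be checked with quick arithmetic. This assumes each weekly batch picks items not yet checked in the current pass (round-robin); the PR does not say how items are selected, and "14%" is rounded, so the derived values are approximate.

```python
# Figures quoted in the PR.
ITEMS_PER_WEEK_OLD = 500
ITEMS_PER_WEEK_NEW = 1000
WEEKLY_COVERAGE_NEW = 0.14  # "1000 items = 14% of the catalog"

# Catalog size implied by the quoted coverage.
catalog_size = ITEMS_PER_WEEK_NEW / WEEKLY_COVERAGE_NEW


def weeks_per_full_pass(items_per_week: int, catalog: float) -> float:
    """Weeks needed to check every item once, assuming no repeats within a pass."""
    return catalog / items_per_week


old_weeks = weeks_per_full_pass(ITEMS_PER_WEEK_OLD, catalog_size)
new_weeks = weeks_per_full_pass(ITEMS_PER_WEEK_NEW, catalog_size)

print(f"implied catalog: ~{catalog_size:.0f} items")
print(f"full pass: ~{old_weeks:.1f} weeks at 500/week, ~{new_weeks:.1f} weeks at 1000/week")
```

Under that assumption the catalog is roughly 7,100 items, and a full pass drops from about 14 weeks to about 7.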

Why

Inventory and source-check were sniffing the same CSV: inventory read 10KB,
source-check read 100KB. The encoding handoff saved 10KB out of 100KB,
which was complexity for nothing. Now inventory discovers and source-check
evaluates. Separate roles, zero overlap.
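Source-check's real profiling goes through the project's DuckDB toolkit, which is not shown here. As a rough stand-in, this sketch uses only the Python standard library (trial decoding plus csv.Sniffer) to illustrate what profiling a byte sample involves: pick an encoding, guess the delimiter, read the header. The function names, the candidate encodings and the comma fallback are assumptions, not the project's code.

```python
import codecs
import csv
import io


def guess_encoding(sample: bytes) -> tuple:
    """Return (encoding, text) for the first candidate codec that decodes the sample."""
    # utf-8-sig also decodes BOM-less UTF-8 and strips a leading BOM if present.
    for encoding in ("utf-8-sig", "cp1252"):
        decoder = codecs.getincrementaldecoder(encoding)()
        try:
            # final=False: tolerate a multi-byte character cut off at the sample's end.
            return encoding, decoder.decode(sample, final=False)
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so it never fails.
    return "latin-1", sample.decode("latin-1")


def profile_sample(sample: bytes) -> dict:
    """Guess encoding, delimiter and header columns from a byte prefix of a CSV."""
    encoding, text = guess_encoding(sample)
    # The sample is a prefix of the file: drop a possibly truncated last line.
    if "\n" in text:
        text = text.rsplit("\n", 1)[0]
    try:
        delimiter = csv.Sniffer().sniff(text, delimiters=",;\t|").delimiter
    except csv.Error:
        delimiter = ","  # assumption: fall back to comma when sniffing is inconclusive
    header = next(csv.reader(io.StringIO(text), delimiter=delimiter), [])
    return {"encoding": encoding, "delimiter": delimiter, "columns": header}
```

Doing this once, in source-check, on the larger sample is the point of the refactor; the PR describes the DuckDB toolkit as more reliable than the inventory-side sniff it replaces.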

Gabrymi93 added 3 commits May 11, 2026 23:23
…to source-check

Inventory: pure metadata discovery. No sniffing, no encoding handoff.
Source-check: sole owner of profiling (encoding, delimiter, columns,
mapping) via the DuckDB toolkit.

Removed _sniff_csv_rows, _is_sniffable, and _SNIFFABLE_FORMATS from
build_catalog_inventory.py (~90 lines). The handoff in bulk_source_check.py
degrades gracefully: if encoding_suggested does not come from the inventory,
source-check sniffs on its own.
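The graceful degradation described above amounts to "use the hint if it is present and valid, otherwise sniff". A minimal sketch with hypothetical names (usable_hint, simple_sniff, resolve_encoding); the real code in bulk_source_check.py is not shown in this PR. Since the inventory no longer writes encoding_suggested, in practice the fallback branch is the one that runs.

```python
import codecs
from typing import Callable, Mapping, Optional


def usable_hint(value: object) -> Optional[str]:
    """Return the encoding hint only if it is a non-empty, known codec name."""
    # Parquet rows can carry None, NaN (a float) or no key at all.
    if not isinstance(value, str) or not value:
        return None
    try:
        codecs.lookup(value)
    except LookupError:
        return None
    return value


def simple_sniff(sample: bytes) -> str:
    """Placeholder sniffer: UTF-8 if the sample decodes, else latin-1."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False: a multi-byte character cut at the end of the sample is not an error.
        decoder.decode(sample, final=False)
        return "utf-8"
    except UnicodeDecodeError:
        return "latin-1"


def resolve_encoding(item: Mapping[str, object], sample: bytes,
                     sniff: Callable[[bytes], str] = simple_sniff) -> str:
    """Prefer the inventory hint when usable; otherwise sniff the sample locally."""
    return usable_hint(item.get("encoding_suggested")) or sniff(sample)
```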

Same approach as source-check: reads radar_summary.json and filters out
sources with status=RED. Saves ~10 min per run by skipping sources
we know are down (openbdap, dati_cultura, mur_ustat...).
Gabrymi93 force-pushed the feat/pure-discovery-inventory branch from 866c844 to 64f7633 on May 11, 2026 22:32
Inventory no longer sniffs, which saves time. Source-check is lightweight
(~0.07s per item). 1000 items cover 14% of the catalog per week.
Gabrymi93 force-pushed the feat/pure-discovery-inventory branch from 64f7633 to d621976 on May 11, 2026 22:34
Gabrymi93 merged commit d88fb51 into main on May 12, 2026
2 checks passed
Gabrymi93 deleted the feat/pure-discovery-inventory branch on May 12, 2026 08:26