refactor(inventory): pure discovery, no sniffing. All profiling moves to source-check #225
Merged
…to source-check
Inventory: pure metadata discovery. No sniffing, no encoding handoff. Source-check: solely responsible for profiling (encoding, delimiter, columns, mapping) via the DuckDB toolkit. Removed _sniff_csv_rows, _is_sniffable, and _SNIFFABLE_FORMATS from build_catalog_inventory.py (~90 lines). The handoff in bulk_source_check.py degrades gracefully: if encoding_suggested does not arrive from the inventory, source-check sniffs on its own.
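The graceful degradation described above can be sketched roughly as follows. This is a minimal illustration, not the code in bulk_source_check.py: the function names resolve_encoding and sniff_encoding are hypothetical, and the UTF-8/latin-1 fallback is only a stand-in for whatever profiling source-check actually performs via its DuckDB toolkit. Only the encoding_suggested key comes from the PR description.

```python
from typing import Optional


def sniff_encoding(sample: bytes) -> str:
    """Hypothetical stand-in for source-check's own sniff.

    Tries strict UTF-8 first and falls back to latin-1, which can
    decode any byte sequence.
    """
    try:
        sample.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "latin-1"


def resolve_encoding(inventory_item: dict, sample: bytes) -> str:
    """Use the inventory's hint if present, otherwise sniff autonomously."""
    suggested: Optional[str] = inventory_item.get("encoding_suggested")
    if suggested:
        # Older inventory records may still carry the hint.
        return suggested
    # New inventory records carry no hint: source-check profiles on its own.
    return sniff_encoding(sample)
```

With this shape, records produced before and after the refactor go through the same code path, so nothing on the source-check side needs to know which inventory version produced an item.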
Same as source-check: reads radar_summary.json and filters out sources with status=RED. Saves ~10 min per run by skipping sources we know are down (openbdap, dati_cultura, mur_ustat...).
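A minimal sketch of the RED-source filter, under stated assumptions: the layout of radar_summary.json is not shown in this PR, so the "sources" mapping with a per-source "status" field is a guess, and red_sources / skip_red are hypothetical helper names. Only the file name, the status=RED value, and the source names in the example come from the description above.

```python
import json
from typing import Dict, List, Set


def red_sources(summary: Dict) -> Set[str]:
    """Return the names of sources whose status is RED.

    Assumes a layout like {"sources": {"<name>": {"status": "RED"}}};
    the real radar_summary.json may be structured differently.
    """
    sources = summary.get("sources", {})
    return {name for name, info in sources.items() if info.get("status") == "RED"}


def skip_red(items: List[Dict], red: Set[str]) -> List[Dict]:
    """Drop inventory items whose (assumed) "source" key is a RED source."""
    return [item for item in items if item.get("source") not in red]


# Usage with an inline summary; in practice it would be loaded with
# json.load() from radar_summary.json.
summary = json.loads(
    '{"sources": {"openbdap": {"status": "RED"}, "dati_cultura": {"status": "RED"},'
    ' "example_source": {"status": "GREEN"}}}'
)
red = red_sources(summary)
kept = skip_red([{"source": "openbdap"}, {"source": "example_source"}], red)
```

Filtering before any network call is what produces the time saving: a source that is down is never contacted, so no timeout is paid for it.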
866c844 to 64f7633
Inventory no longer sniffs, which saves time. Source-check is lightweight (~0.07s per item). 1000 items cover 14% of the catalog per week.
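The figures above imply a few derived numbers, worked out here as a back-of-the-envelope check. The inputs (~0.07s per item, 1000 items, 14% per week) are from the commit message; the catalog size and full-pass duration are estimates derived from them, assuming one 1000-item run per week and no overlap between runs.

```python
SECONDS_PER_ITEM = 0.07   # from the commit message (approximate)
ITEMS_PER_RUN = 1000      # new source-check batch size
WEEKLY_COVERAGE = 0.14    # fraction of the catalog covered per week

# Wall-clock cost of one run, ignoring any parallelism or I/O overhead.
run_seconds = ITEMS_PER_RUN * SECONDS_PER_ITEM

# Implied catalog size, assuming one run per week (an assumption).
implied_catalog_items = ITEMS_PER_RUN / WEEKLY_COVERAGE

# Weeks needed to touch every item once, assuming no repeats between runs.
weeks_for_full_pass = 1 / WEEKLY_COVERAGE

print(f"one run: about {run_seconds:.0f} s")
print(f"implied catalog size: about {implied_catalog_items:.0f} items")
print(f"full pass: about {weeks_for_full_pass:.1f} weeks")
```

At roughly 70 seconds per run, doubling the batch from 500 to 1000 items costs little time, which is consistent with the description of source-check as lightweight.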
64f7633 to d621976
What changes
Inventory: metadata discovery only
_sniff_csv_rows and _is_sniffable removed from build_catalog_inventory.py (~100 lines)
--skip-red-sources for inventory
--workers 8 --skip-red-sources
Source-check: 500 -> 1000 items
Why
Inventory and source-check were both sniffing the same CSV: the first 10KB in
inventory and the first 100KB in source-check. The encoding handoff saved 10KB
out of 100KB, which is complexity for nothing. Now inventory discovers and
source-check evaluates. Separate roles, zero overlap.