
source-check: host-level circuit breaker and preview inference via HEAD #230

Merged
Gabrymi93 merged 4 commits into main from feat/source-check-circuit-head-preview
May 15, 2026

Conversation

@Gabrymi93
Member

@Gabrymi93 Gabrymi93 commented May 14, 2026

Context

This update makes bulk_source_check more robust against unstable hosts and extends preview coverage to URLs without an explicit file extension.

What changes

  • added a host-level circuit breaker, configurable via --circuit-fail-threshold
  • centralized the run's HTTP configuration (timeout/retry) in configure_source_check_http
  • extended preview format inference via HEAD (Content-Type + Content-Disposition)
  • added preview support for tsv, geojson, jsonl/ndjson
  • kept the bulk_source_check integration for csv_preview and csv_preview_circuit
  • removed ZIP handling in preview to keep complexity under control
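
As an illustration of the HEAD-based inference listed above, here is a minimal sketch of how a preview format could be derived from the URL and the headers of a HEAD response. It is not the code from this PR: the function name `infer_preview_format`, the two lookup tables, and the order of precedence (URL extension, then Content-Disposition filename, then Content-Type) are all assumptions, and the tables cover only CSV and TSV for brevity.

```python
from email.message import Message
from pathlib import PurePosixPath
from typing import Mapping, Optional
from urllib.parse import urlparse

# Hypothetical lookup tables; the PR does not list the media types it recognizes.
_EXTENSION_TO_FORMAT = {".csv": "csv", ".tsv": "tsv"}
_CONTENT_TYPE_TO_FORMAT = {
    "text/csv": "csv",
    "text/tab-separated-values": "tsv",
}


def infer_preview_format(url: str, headers: Mapping[str, str]) -> Optional[str]:
    """Guess a preview format from the URL and the headers of a HEAD response."""
    # 1. An explicit extension in the URL path wins (assumed precedence).
    ext = PurePosixPath(urlparse(url).path).suffix.lower()
    if ext in _EXTENSION_TO_FORMAT:
        return _EXTENSION_TO_FORMAT[ext]

    # Reuse the stdlib header parser instead of splitting on ';' by hand.
    msg = Message()
    for name, value in headers.items():
        msg[name] = value

    # 2. A filename advertised in Content-Disposition.
    filename = msg.get_filename()
    if filename:
        ext = PurePosixPath(filename).suffix.lower()
        if ext in _EXTENSION_TO_FORMAT:
            return _EXTENSION_TO_FORMAT[ext]

    # 3. The media type from Content-Type, with parameters such as charset dropped.
    return _CONTENT_TYPE_TO_FORMAT.get(msg.get_content_type())
```

The `headers` mapping would typically come from a HEAD request issued with whatever HTTP client the run is configured to use; returning `None` lets the caller skip the preview instead of guessing.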

Local verification

  • pytest -q tests/test_bulk_source_check.py -> ok
  • reduced run with source-id openbdap and circuit-fail-threshold 2 -> ok

Review notes

  • the circuit breaker logs its opening only when the threshold is reached
  • no changes to flows outside source-check
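
To make the logging note above concrete, here is a minimal sketch of a host-level circuit breaker that logs only on the failure that reaches the threshold. The class name `HostCircuitBreaker`, its methods, counting consecutive failures, and resetting on success are assumptions for illustration; only the threshold idea (as exposed by --circuit-fail-threshold) comes from this PR.

```python
import logging
from urllib.parse import urlparse

logger = logging.getLogger("source_check")


class HostCircuitBreaker:
    """Hypothetical per-host breaker: opens after N consecutive failures."""

    def __init__(self, fail_threshold: int) -> None:
        if fail_threshold < 1:
            raise ValueError("fail_threshold must be >= 1")
        self.fail_threshold = fail_threshold
        self._failures: dict[str, int] = {}

    @staticmethod
    def _host(url: str) -> str:
        return urlparse(url).netloc.lower()

    def is_open(self, url: str) -> bool:
        """True when requests to this URL's host should be skipped."""
        return self._failures.get(self._host(url), 0) >= self.fail_threshold

    def record_failure(self, url: str) -> None:
        host = self._host(url)
        count = self._failures.get(host, 0) + 1
        self._failures[host] = count
        # Log exactly once, on the failure that reaches the threshold.
        if count == self.fail_threshold:
            logger.warning("circuit open for host %s after %d failures", host, count)

    def record_success(self, url: str) -> None:
        # Assumption: a success resets the counter for that host.
        self._failures.pop(self._host(url), None)
```

A caller would check `is_open(url)` before each request and skip the row when it returns `True`, so one unstable host cannot consume the retry budget of the whole run.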

Gabrymi93 added 4 commits May 14, 2026 10:41
Cuts based on verification:
- TSV: the toolkit supports it (same CSV pipeline with delim='\t') → kept
- GeoJSON: the toolkit does not process it → removed
- JSONL/NDJSON: the toolkit does not process it → removed
- csv_preview_circuit: workaround in the flow, reverts to _EMPTY_ENRICH

Net saving of ~110 lines.
@Gabrymi93
Member Author

Review applied

Cut the preview formats that the toolkit does not process end-to-end (raw→clean→mart):

| Removed | Why |
| --- | --- |
| GeoJSON preview | The toolkit's is_supported_input_file rejects .geojson; no reader exists |
| JSONL / NDJSON preview | Not present anywhere in the toolkit; .json is explicitly excluded |
| csv_preview_circuit enrich_method | Replaced by _EMPTY_ENRICH, avoiding a workaround in the _check_row flow |

Kept: TSV, which the toolkit supports (same CSV pipeline with delim: \t).
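
A minimal sketch of why TSV is cheap to keep: the same reader can serve both formats when only the delimiter differs. The helper `preview_rows` and the `_DELIMITERS` table are hypothetical and use the standard library `csv` module, not the toolkit's own pipeline.

```python
import csv
import io
from itertools import islice

# Hypothetical format-to-delimiter table.
_DELIMITERS = {"csv": ",", "tsv": "\t"}


def preview_rows(text: str, fmt: str, max_rows: int = 5) -> list[list[str]]:
    """Return the first rows of a delimited text sample."""
    reader = csv.reader(io.StringIO(text), delimiter=_DELIMITERS[fmt])
    # islice stops after max_rows without reading the rest of the sample.
    return list(islice(reader, max_rows))
```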

Saving: -74 lines. Same robustness (circuit breaker + HEAD inference), no dormant code.

Commit: 7fa16b2

@Gabrymi93 Gabrymi93 marked this pull request as ready for review May 15, 2026 13:17
@Gabrymi93 Gabrymi93 merged commit dc233dc into main May 15, 2026
1 check passed
@Gabrymi93 Gabrymi93 deleted the feat/source-check-circuit-head-preview branch May 15, 2026 13:17