Add API for batch processing #286

thobson88 · 2025-01-10T10:00:48Z

PR open into 276-refactor.

Some code already exists for processing newspapers at scale, in the H-Top repo under:

/H-Top/generate_toponym_dataset/

In particular, the script apply_to_news_modular.py appears to be the template for the example given in the docs on running T-Res at scale.

The task here is to build something like this into the API.

Must be able to process a CSV file of input data (one row per article text), as in the apply_to_news_modular.py script.
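As a rough illustration of the CSV requirement, here is a minimal sketch of a batch entry point. It is not taken from apply_to_news_modular.py or from the T-Res API: the names `process_csv`, `process_batch`, `text_column` and `batch_size` are all hypothetical, and `process_batch` stands in for whatever pipeline call the API ends up exposing.

```python
import csv
from itertools import islice
from typing import Any, Callable, Dict, Iterator, List


def read_rows(csv_path: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per CSV row (one row per article text)."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        yield from csv.DictReader(f)


def batched(rows: Iterator[Dict[str, str]], batch_size: int) -> Iterator[List[Dict[str, str]]]:
    """Group rows into lists of at most batch_size items."""
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            return
        yield batch


def process_csv(
    csv_path: str,
    process_batch: Callable[[List[str]], List[Any]],  # hypothetical pipeline hook
    text_column: str = "text",                        # assumed column name
    batch_size: int = 100,
) -> List[Dict[str, Any]]:
    """Apply process_batch to the article texts, batch by batch.

    Returns the original rows, each extended with a "result" key.
    """
    results: List[Dict[str, Any]] = []
    for batch in batched(read_rows(csv_path), batch_size):
        outputs = process_batch([row[text_column] for row in batch])
        if len(outputs) != len(batch):
            raise ValueError("process_batch must return one result per input text")
        for row, output in zip(batch, outputs):
            results.append({**row, "result": output})
    return results
```

Passing the pipeline in as a callable keeps the file handling separate from the toponym resolution step, so the same loop could be reused if another input format is added later.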

If necessary, should also support a zip file of input data (one txt file per article text, structured in the usual way, e.g. 0003548/1904/0616/0003548_19040616_art0053.txt). This may or may not be required, depending on whether the complete open newspaper collections (HMD & LwM) are already available in CSV format (above).
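If zip support does turn out to be needed, a sketch along these lines could read the archive using only the standard library. The function name `iter_zip_articles` is hypothetical, and the field names given to the path components (`publication`, `year`, `monthday`, `article`) are an assumed reading of the example path above, not something confirmed by the collections' documentation.

```python
import re
import zipfile
from typing import Dict, Iterator

# Matches paths shaped like 0003548/1904/0616/0003548_19040616_art0053.txt.
# The group names are an assumed interpretation of each path component.
ARTICLE_PATH = re.compile(
    r"(?P<publication>\d+)/(?P<year>\d{4})/(?P<monthday>\d{4})/"
    r"(?P=publication)_(?P=year)(?P=monthday)_art(?P<article>\d+)\.txt$"
)


def iter_zip_articles(zip_path: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per article txt file found in the zip archive.

    Files whose path does not follow the expected layout are skipped.
    """
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            match = ARTICLE_PATH.search(name)
            if match is None:
                continue
            text = zf.read(name).decode("utf-8")
            yield {**match.groupdict(), "path": name, "text": text}
```

Each yielded dict has the same shape as a CSV row with a `text` field, so both input formats could feed the same downstream processing step.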


thobson88 · 2025-01-16T16:27:50Z

In progress on branch 286-batch-processing.

thobson88 · 2025-01-27T14:59:53Z

Support for a zip file of input data appears not to be needed as CSV files are available.

thobson88 self-assigned this Jan 10, 2025

thobson88 mentioned this issue Jan 10, 2025

Migrate docs site to material for mkdocs #284

Open

9 tasks
