Add API for batch processing #286

Open
thobson88 opened this issue Jan 10, 2025 · 2 comments

@thobson88
Collaborator

thobson88 commented Jan 10, 2025

PR open into 276-refactor.

Some code for processing newspapers at scale already exists in the H-Top repo, under:

/H-Top/generate_toponym_dataset/

In particular, the script apply_to_news_modular.py appears to be the template for the example given in the docs on running T-Res at scale.

The task here is to build something like this into the API.

Must be able to process a CSV file of input data (one row per article text), as in the apply_to_news_modular.py script.
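
A minimal sketch of what the CSV requirement could look like, using only the standard library. Everything named here is an assumption for illustration: `process_csv`, `iter_articles`, the `text` column name and the `batch_size` parameter are hypothetical, and `process_text` stands in for whatever pipeline call the API ends up exposing. It is not taken from apply_to_news_modular.py.

```python
import csv
from typing import Any, Callable, Dict, Iterator, List


def iter_articles(csv_path: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per CSV row, i.e. one per article."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            yield row


def process_csv(
    csv_path: str,
    process_text: Callable[[str], Any],  # hypothetical stand-in for the pipeline
    text_column: str = "text",           # assumed column name
    batch_size: int = 100,               # assumed default
) -> Iterator[List[Dict[str, Any]]]:
    """Apply process_text to every article and yield the results in batches."""
    batch: List[Dict[str, Any]] = []
    for row in iter_articles(csv_path):
        # Keep the original columns and attach the result alongside them.
        batch.append({**row, "result": process_text(row[text_column])})
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch
```

Yielding batches rather than one large list keeps memory use bounded and lets the caller write each batch to disk before the next one is processed.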

If necessary, it should also support a zip file of input data (one txt file per article text, structured in the usual way, e.g. 0003548/1904/0616/0003548_19040616_art0053.txt). Whether this is required depends on whether the complete open newspapers collections (HMD & LwM) are already available in CSV format (above).
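
If zip support does turn out to be needed, reading the archive could look roughly like the sketch below. The function name `iter_zip_articles` is hypothetical, UTF-8 encoding is assumed, and the field names given to the three filename components (`publication`, `date`, `article`) are guesses from the example path; the issue does not define them.

```python
import re
import zipfile
from pathlib import PurePosixPath
from typing import Dict, Iterator, Tuple

# Matches filenames such as 0003548_19040616_art0053.txt.
# The group names are assumed labels, not documented ones.
FILENAME_RE = re.compile(
    r"^(?P<publication>\d+)_(?P<date>\d{8})_art(?P<article>\d+)\.txt$"
)


def iter_zip_articles(zip_path) -> Iterator[Tuple[Dict[str, str], str]]:
    """Yield (metadata, text) for each article txt file in the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            # Zip member names always use forward slashes.
            match = FILENAME_RE.match(PurePosixPath(name).name)
            if match is None:
                # Skip directory entries and anything not named like an article.
                continue
            text = zf.read(name).decode("utf-8")  # encoding is an assumption
            yield {"path": name, **match.groupdict()}, text
```

Each yielded pair carries the same information as one CSV row, so both input formats could feed the same downstream processing step.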

@thobson88
Collaborator Author

In progress on branch 286-batch-processing.

@thobson88
Collaborator Author

Support for a zip file of input data appears not to be needed, as CSV files are available.
