HF dataset integration #470
base: master
Coverage Report
Files without new missing coverage: 281 files skipped due to complete coverage.
Coverage success: total of 98.03% is above 98.02% 🎉
3e3511b to 1b4f867
…d simplify item yielding in _iter_from_stream function
Description
Introduces:
- from_huggingface_hub(): a data connector to stream/load HuggingFace datasets into edsnlp Streams (supports split/config auto-detection, shuffle, looping, and converter kwargs).
- to_huggingface_hub(): export a Stream or Docs to a datasets.IterableDataset, with optional materialize-and-push (push_to_hub=True).
- hf_ner: converter to read/write token-level NER HF datasets.
- hf_text: converter to read/write text-only HF datasets.

N.B. Sorry for the messy last PR; I closed it and opened a cleaner one.
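To illustrate the kind of transformation a token-level NER converter such as hf_ner has to perform when reading, here is a minimal standalone sketch. It is not the code from this PR: `bio_to_spans` is a hypothetical helper, and it assumes that records carry string BIO tags (many HF datasets store integer class ids instead) and that tokens are joined with single spaces.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def bio_to_spans(tokens: List[str], tags: List[str]) -> Tuple[str, List[Span]]:
    """Hypothetical helper: rebuild a text and character-level entity spans
    from parallel lists of tokens and BIO tags (e.g. "B-LOC", "I-LOC", "O")."""
    spans: List[Span] = []
    current = None  # mutable [start, end, label] for the entity being built
    offset = 0
    for token, tag in zip(tokens, tags):
        start, end = offset, offset + len(token)
        label = tag[2:] if tag[:2] in ("B-", "I-") else None
        continues = (
            tag.startswith("I-") and current is not None and current[2] == label
        )
        if continues:
            # Extend the running entity to cover this token.
            current[1] = end
        else:
            # Close the running entity, if any, then maybe open a new one.
            if current is not None:
                spans.append((current[0], current[1], current[2]))
            current = [start, end, label] if label is not None else None
        offset = end + 1  # assumption: one space between consecutive tokens
    if current is not None:
        spans.append((current[0], current[1], current[2]))
    return " ".join(tokens), spans


tokens = ["Patient", "has", "type", "2", "diabetes", "in", "Paris"]
tags = ["O", "O", "B-DISEASE", "I-DISEASE", "I-DISEASE", "O", "B-LOC"]
text, spans = bio_to_spans(tokens, tags)
```

An `I-` tag that does not continue an entity of the same label is treated as the start of a new entity, which is a lenient design choice for noisy annotations; a stricter converter might raise an error instead.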
Checklist