Adding save-to-HF support for the async web crawler #312
This pull request addresses #257. It introduces a persistence strategy layer to the project, allowing for more flexible data storage options.
Key changes:
- `DataPersistenceStrategy`: a base class for defining persistence strategies.
- `HFDataPersistenceStrategy`: a concrete strategy that pushes data to a Hugging Face dataset.

With this update, data extracted by the crawl process can be uploaded to the Hugging Face Hub as datasets, which makes it easier to share and access.
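The strategy layer described above follows the classic strategy pattern: the crawler hands its results to whichever persistence object it is given, without knowing where the data ends up. The following is a minimal sketch under stated assumptions, not the PR's actual code. The `save` method name, the record fields (`url`, `markdown`), `InMemoryPersistenceStrategy`, and `persist_results` are hypothetical. The Hugging Face strategy here uses the `datasets` library's `Dataset.from_list` and `Dataset.push_to_hub`, which may differ from how the PR implements it, and it does not show how the `crawl4ai` tag is attached.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


class DataPersistenceStrategy(ABC):
    """Hypothetical interface: persist a batch of crawl results."""

    @abstractmethod
    def save(self, records: List[Record]) -> str:
        """Store the records and return an identifier for where they went."""


class InMemoryPersistenceStrategy(DataPersistenceStrategy):
    """Hypothetical strategy that keeps records in a list (useful for tests)."""

    def __init__(self) -> None:
        self.saved: List[Record] = []

    def save(self, records: List[Record]) -> str:
        self.saved.extend(records)
        return f"memory://{len(self.saved)}"


class HFDataPersistenceStrategy(DataPersistenceStrategy):
    """Sketch of a strategy that pushes records to a Hugging Face Hub dataset."""

    def __init__(self, repo_id: str, private: bool = False, token: Optional[str] = None) -> None:
        self.repo_id = repo_id  # e.g. "username/my-crawl" (assumed format)
        self.private = private
        self.token = token  # None falls back to the locally saved HF token

    def save(self, records: List[Record]) -> str:
        # Imported lazily so the module still loads without `datasets` installed.
        from datasets import Dataset

        dataset = Dataset.from_list(records)
        dataset.push_to_hub(self.repo_id, private=self.private, token=self.token)
        return f"https://huggingface.co/datasets/{self.repo_id}"


def persist_results(results: List[Record], strategy: DataPersistenceStrategy) -> str:
    """Hypothetical helper: normalize crawl results, then delegate storage."""
    records = [{"url": r["url"], "markdown": r.get("markdown", "")} for r in results]
    return strategy.save(records)


if __name__ == "__main__":
    store = InMemoryPersistenceStrategy()
    location = persist_results([{"url": "https://example.com", "markdown": "# Example"}], store)
    print(location)  # memory://1
```

Swapping `InMemoryPersistenceStrategy()` for `HFDataPersistenceStrategy("username/my-crawl")` changes where the data goes without touching `persist_results`, which is the flexibility the strategy layer is meant to provide.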
Once this PR is merged and adopted by the community, all datasets created with the crawl4ai library can be found by filtering the Hub at https://huggingface.co/datasets?other=crawl4ai