Optionally include the pipeline script in the hub when pushing your distiset #762

Merged: 14 commits, Jul 4, 2024

@plaguss (Contributor) commented Jun 28, 2024

Description

This PR adds the option to push the script of the pipeline being run to the Hugging Face Hub. The option defaults to False to avoid potential errors:

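from distilabel.pipeline import Pipeline

with Pipeline() as pipeline:
    ...  # define the pipeline steps here
distiset = pipeline.run(use_cache=False)
# Upload the dataset together with the script that defined the pipeline
distiset.push_to_hub("plaguss/pipe_nothing_test", include_script=True)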

This simplifies sharing the code that created the pipeline, as well as custom steps.
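
To illustrate why the option is opt-in, here is a minimal sketch of how a library could locate the script that is currently running. This is an assumption for illustration only, not necessarily how this PR does it; `find_running_script` is a hypothetical helper name. It shows one case where detection can fail: interactive sessions have no script file to upload.

```python
import sys
from pathlib import Path
from typing import Optional


def find_running_script() -> Optional[Path]:
    """Return the path of the script being executed, or None if unavailable.

    Hypothetical helper: the lookup strategy is an assumption, not the
    implementation used by distilabel.
    """
    main_module = sys.modules.get("__main__")
    script = getattr(main_module, "__file__", None)
    if script is None:
        # Interactive sessions (and some notebooks) have no __file__.
        return None
    path = Path(script).resolve()
    # Only report the path if it points to a real file on disk.
    return path if path.is_file() else None


if __name__ == "__main__":
    print(find_running_script())
```

When the helper returns None there is nothing to upload, which is the kind of situation a False default sidesteps.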

Example script.

If the script was uploaded to the hub, an entry will be written in the README.md of the repo to show it:

[image: README.md entry showing the uploaded script]

The CLI has also been updated to allow running remote (or local) scripts, as we already do with pipelines defined in a pipeline.yaml config file:

  • Pointing to the same file we uploaded:
distilabel pipeline run --trust-code "https://huggingface.co/datasets/plaguss/pipe_nothing_test/raw/main/pipe_nothing.py"
  • Pointing to the local file:
distilabel pipeline run --trust-code "path/to/pipe_nothing.py"
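
The two invocations above differ only in whether the argument is a URL or a filesystem path. A rough sketch of how such an argument could be classified, using only the standard library; the function names are hypothetical and this is not taken from the distilabel CLI code:

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def is_remote_script(location: str) -> bool:
    """True if the location looks like an HTTP(S) URL (illustrative check)."""
    return urlparse(location).scheme in ("http", "https")


def describe_script_source(location: str) -> str:
    """Label a script location as remote or local, with its file name."""
    if is_remote_script(location):
        # URL paths always use forward slashes, so parse them as POSIX paths.
        name = PurePosixPath(urlparse(location).path).name
        return f"remote:{name}"
    return f"local:{Path(location).name}"


if __name__ == "__main__":
    print(describe_script_source(
        "https://huggingface.co/datasets/plaguss/pipe_nothing_test/raw/main/pipe_nothing.py"
    ))
    print(describe_script_source("path/to/pipe_nothing.py"))
```

Either way the script is executed as code, so `--trust-code` should only be passed for sources you trust.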

@plaguss added this to the 1.3.0 milestone Jun 28, 2024
@plaguss requested a review from gabrielmbmb June 28, 2024 12:23
@plaguss self-assigned this Jun 28, 2024

codspeed-hq bot commented Jun 28, 2024

CodSpeed Performance Report

Merging #762 will not alter performance

Comparing push-pipeline-py (cbcf0bf) with develop (647d040)

Summary

✅ 1 untouched benchmarks

@plaguss linked an issue Jun 28, 2024 that may be closed by this pull request
@plaguss marked this pull request as ready for review July 1, 2024 07:58
@gabrielmbmb (Member) left a comment

LGTM!

Review threads (all outdated and resolved):
src/distilabel/cli/pipeline/app.py
src/distilabel/cli/pipeline/utils.py
src/distilabel/cli/pipeline/utils.py
docs/sections/how_to_guides/advanced/cli/index.md
@plaguss merged commit cc36fa5 into develop Jul 4, 2024
6 checks passed
@plaguss deleted the push-pipeline-py branch July 4, 2024 13:38
Development

Successfully merging this pull request may close these issues.

[FEATURE] Upload pipeline script when pushing distiset to hub
2 participants