[BUG] ['distilabel.pipeline'] ❌ Failed to load step 'exam_generation': Step load failed: 'InferenceClient' object has no attribute '_resolve_url' #1117

Open
xenova opened this issue Feb 3, 2025 · 0 comments · May be fixed by #1118
Labels
bug Something isn't working

Describe the bug

Attempting to follow this tutorial: https://distilabel.argilla.io/dev/sections/pipeline_samples/examples/exam_questions/#build-the-pipeline, I get this error:

[02/03/25 14:19:48] INFO     ['distilabel.pipeline'] 📝 Pipeline data will be written to               base.py:1015
                             '/home/codespace/.cache/distilabel/pipelines/ExamGenerator/1f7e4d598a0cea
                             173a88dd3cab208de83ec78fb0/executions/cdb962598e43a9709e9fb3323a9de0f3e4d
                             b2363/data/steps_outputs'
                    INFO     ['distilabel.pipeline'] ⌛ The steps of the pipeline will be loaded in    base.py:1046
                             stages:
                              * Legend: 🚰 GeneratorStep 🌐 GlobalStep 🔄 Step
                              * Stage 0:
                                - 🚰 'load_instructions'
                                - 🔄 'exam_generation'
                    INFO     ['distilabel.pipeline'] ⏳ Waiting for all the steps of stage 0 to        base.py:1382
                             load...
[02/03/25 14:19:49] ERROR    ['distilabel.pipeline'] ❌ Failed to load step 'exam_generation': Step    local.py:316
                             load failed: 'InferenceClient' object has no attribute '_resolve_url'

                             For further information visit
                             'https://distilabel.argilla.io/latest/api/pipeline/step_wrapper'
[02/03/25 14:19:51] ERROR    ['distilabel.pipeline'] ❌ Failed to load all the steps of stage 0        base.py:1396

NOTE: I also had to rename the page column to instruction (in both the LoadDataFromDicts data and the template) to avoid this error:

ValueError: Step 'exam_generation' requires inputs ['instruction'], but only the inputs=['page'] are available, which means that the inputs=['instruction', 'system_prompt'] are missing or not available when the step gets to be executed in the pipeline. Please make sure previous steps to 'exam_generation' are generating the required inputs.
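
For anyone hitting the same ValueError, here is a minimal sketch of the rename in plain Python. It only shows the data-side change; rename_key is a hypothetical helper written for this illustration, not part of distilabel, and the sample text is a placeholder for page.content.

```python
from typing import Any, Dict, List


def rename_key(rows: List[Dict[str, Any]], old: str, new: str) -> List[Dict[str, Any]]:
    """Return copies of the rows with the key `old` renamed to `new`.

    Hypothetical helper for illustration only; the input rows are not mutated.
    """
    return [{(new if key == old else key): value for key, value in row.items()} for row in rows]


# Placeholder standing in for the Wikipedia page content used in the tutorial.
rows = [{"page": "Transfer learning is ..."}]

# The step reports that it requires an 'instruction' input, so expose the text under that name.
renamed = rename_key(rows, "page", "instruction")
print(renamed)
```

The renamed list is what would be passed as data= to LoadDataFromDicts, with the template referring to {{ instruction }} as in the reproduction script below.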

To reproduce

# Copyright 2023-present, Argilla, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from typing import List

import wikipedia
from pydantic import BaseModel, Field

from distilabel.llms import InferenceEndpointsLLM
from distilabel.pipeline import Pipeline
from distilabel.steps import LoadDataFromDicts
from distilabel.steps.tasks import TextGeneration

page = wikipedia.page(title="Transfer_learning")


class ExamQuestion(BaseModel):
    question: str = Field(..., description="The question to be answered")
    answer: str = Field(..., description="The correct answer to the question")
    distractors: List[str] = Field(
        ..., description="A list of incorrect but viable answers to the question"
    )


class ExamQuestions(BaseModel):
    exam: List[ExamQuestion]


SYSTEM_PROMPT = """\
You are an exam writer specialized in writing exams for students.
Your goal is to create questions and answers based on the document provided, and a list of distractors, that are incorrect but viable answers to the question.
Your answer must adhere to the following format:
```
[
    {
        "question": "Your question",
        "answer": "The correct answer to the question",
        "distractors": ["wrong answer 1", "wrong answer 2", "wrong answer 3"]
    },
    ... (more questions and answers as required)
]
```
""".strip()


with Pipeline(name="ExamGenerator") as pipeline:
    load_dataset = LoadDataFromDicts(
        name="load_instructions",
        data=[
            {
                "instruction": page.content,
            }
        ],
    )

    text_generation = TextGeneration(
        name="exam_generation",
        system_prompt=SYSTEM_PROMPT,
        template="Generate a list of answers and questions about the document. Document:\n\n{{ instruction }}",
        llm=InferenceEndpointsLLM(
            model_id="meta-llama/Meta-Llama-3.1-8B-Instruct",
            tokenizer_id="meta-llama/Meta-Llama-3.1-8B-Instruct",
            structured_output={
                "schema": ExamQuestions.model_json_schema(),
                "format": "json",
            },
        ),
        input_batch_size=8,
        output_mappings={"model_name": "generation_model"},
    )
    load_dataset >> text_generation


distiset = pipeline.run(
    parameters={
        text_generation.name: {
            "llm": {
                "generation_kwargs": {
                    "max_new_tokens": 2048,
                }
            }
        }
    },
    use_cache=False,
)

Expected behavior

The example should run correctly.

Screenshots

No response

Environment

  • Distilabel Version: 1.5.3
  • Python Version: 3.12.1
  • Hugging Face Hub Version: 0.28.1

Additional context

I'm guessing the private _resolve_url method was removed from InferenceClient in a recent version of huggingface_hub.
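
A quick way to check that guess locally is sketched below. It prints the installed package versions and tests whether the InferenceClient class still exposes _resolve_url. The helper names installed_version and has_private_attr are made up for this sketch; only importlib.metadata and hasattr are relied on, and the script degrades gracefully if huggingface_hub is not installed.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def has_private_attr(cls: type, name: str) -> bool:
    """Report whether `cls` defines (or inherits) an attribute called `name`."""
    return hasattr(cls, name)


if __name__ == "__main__":
    print("huggingface_hub:", installed_version("huggingface_hub"))
    print("distilabel:", installed_version("distilabel"))
    try:
        from huggingface_hub import InferenceClient
    except ImportError:
        print("huggingface_hub is not installed")
    else:
        # False here would be consistent with the AttributeError in the log above.
        print("has _resolve_url:", has_private_attr(InferenceClient, "_resolve_url"))
```

If the attribute turns out to be missing, pinning huggingface_hub to an older release until the linked pull request lands may work as a stopgap, though that is an assumption and not something confirmed in this issue.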

@xenova xenova added the bug Something isn't working label Feb 3, 2025
@plaguss plaguss linked a pull request Feb 3, 2025 that will close this issue