Docs: return type of get_default_model_and_revision might be incorrectly documented? #35981

Open
MarcoGorelli opened this issue Jan 31, 2025 · 1 comment · May be fixed by #35982

Comments

MarcoGorelli commented Jan 31, 2025

The return type here is annotated as `Union[str, Tuple[str, str]]`:

def get_default_model_and_revision(
targeted_task: Dict, framework: Optional[str], task_options: Optional[Any]
) -> Union[str, Tuple[str, str]]:

The docstring, however, just says `str`:

`str` The model string representing the default model for this pipeline

But I think only `Tuple[str, str]` can actually be correct here.

For example, if I run

from transformers import Pipeline, pipeline
# from pair_classification import PairClassificationPipeline
from transformers import AutoModelForSequenceClassification, TFAutoModelForSequenceClassification
from transformers.pipelines import PIPELINE_REGISTRY
from transformers.utils import is_tf_available, is_torch_available
import numpy as np


def softmax(outputs):
    maxes = np.max(outputs, axis=-1, keepdims=True)
    shifted_exp = np.exp(outputs - maxes)
    return shifted_exp / shifted_exp.sum(axis=-1, keepdims=True)


class PairClassificationPipeline(Pipeline):
    def _sanitize_parameters(self, **kwargs):
        preprocess_kwargs = {}
        if "second_text" in kwargs:
            preprocess_kwargs["second_text"] = kwargs["second_text"]
        return preprocess_kwargs, {}, {}

    def preprocess(self, text, second_text=None):
        return self.tokenizer(text, text_pair=second_text, return_tensors=self.framework)

    def _forward(self, model_inputs):
        return self.model(**model_inputs)

    def postprocess(self, model_outputs):
        logits = model_outputs.logits[0].numpy()
        probabilities = softmax(logits)

        best_class = np.argmax(probabilities)
        label = self.model.config.id2label[best_class]
        score = probabilities[best_class].item()
        logits = logits.tolist()
        return {"label": label, "score": score, "logits": logits}


PIPELINE_REGISTRY.register_pipeline(
    "custom-text-classification",
    pipeline_class=PairClassificationPipeline,
    pt_model=AutoModelForSequenceClassification if is_torch_available() else None,
    tf_model=TFAutoModelForSequenceClassification if is_tf_available() else None,
    default={"pt": ("hf-internal-testing/tiny-random-distilbert", "2ef615d")},
    type="text",
)
assert "custom-text-classification" in PIPELINE_REGISTRY.get_supported_tasks()

_, task_def, _ = PIPELINE_REGISTRY.check_task("custom-text-classification")

classifier = pipeline('custom-text-classification')

then I get

ValueError                                Traceback (most recent call last)
<ipython-input-6-0cc5199a8521> in <cell line: 53>()
     51 _, task_def, _ = PIPELINE_REGISTRY.check_task("custom-text-classification")
     52 
---> 53 classifier = pipeline('custom-text-classification')

/usr/local/lib/python3.10/dist-packages/transformers/pipelines/__init__.py in pipeline(task, model, config, tokenizer, feature_extractor, image_processor, processor, framework, revision, use_fast, token, device, device_map, torch_dtype, trust_remote_code, model_kwargs, pipeline_class, **kwargs)
    898     if model is None:
    899         # At that point framework might still be undetermined
--> 900         model, default_revision = get_default_model_and_revision(targeted_task, framework, task_options)
    901         revision = revision if revision is not None else default_revision
    902         logger.warning(

ValueError: too many values to unpack (expected 2)

It looks like `pipeline` expects a `(model, revision)` tuple here, not a string: a plain string gets unpacked character by character, so any model id longer than two characters fails with exactly this ValueError.
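
For reference, a minimal sketch of why that unpacking fails when a plain string comes back, reusing the model id and revision from the example above:

# Minimal sketch of the unpacking that pipeline() performs on the return value.
try:
    model, default_revision = "hf-internal-testing/tiny-random-distilbert"
except ValueError as e:
    print(e)  # too many values to unpack (expected 2) -- a str unpacks character by character

model, default_revision = ("hf-internal-testing/tiny-random-distilbert", "2ef615d")
print(model, default_revision)  # a (model, revision) tuple unpacks as expected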


It looks like the docstring may just have been overlooked in #17667?
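
If useful, here is a rough sketch of what the corrected annotation and docstring could look like (the wording is only a suggestion, not the actual patch):

from typing import Any, Dict, Optional, Tuple

# Rough sketch of a possible fix; the summary line and Returns wording are suggestions only.
def get_default_model_and_revision(
    targeted_task: Dict, framework: Optional[str], task_options: Optional[Any]
) -> Tuple[str, str]:
    """
    Select a default model and revision for a given pipeline task.

    Returns:
        `Tuple[str, str]`: The model id and the revision of the default model for this pipeline.
    """
    ...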

MarcoGorelli (Author) commented

I can submit a PR if there's interest
