@yannforget (Member) commented Oct 15, 2025

This PR adjusts the typing of PipelineWithTask and Pipeline.task() to improve the developer experience when writing pipelines in an IDE: better type hints and autocompletion when using tasks, and parameter signatures and docstrings shown when hovering over functions decorated as tasks.

Currently, hovering over a task in a pipeline DAG does not show the function's docstring or signature, so developers have to scroll down to the task definition to check which parameters it needs and what their types are.

For example, using the simple_io example pipeline:

[Screenshot from 2025-10-15 18-42-10: hovering over a task in the simple_io DAG, before this PR]

Changes

  • Use ParamSpec and TypeVar to make PipelineWithTask generic and preserve types. I think there is a simpler way to do this in Python 3.12 (PEP 695 type parameter syntax), but we still want to support 3.11.
  • Use functools.wraps to preserve function metadata (e.g. __doc__, __annotations__).
  • Change the return type of Pipeline.task() to Callable[P, R], so that calling a task is typed as returning the original function's return type instead of a Task object.
[Screenshot from 2025-10-15 18-40-38: the same hover with this PR applied]

The IDE will also flag type errors when a task is called with the wrong arguments, for example when the task expects a GeoDataFrame but you pass a DataFrame.

The trade-off is that users may be confused if they try to manipulate the object returned by a task in the main pipeline DAG (e.g. df.head()): the IDE will not flag the error, because the value is typed as a DataFrame while at runtime it is a Task.

@yannforget requested reviews from bramj and nazarfil on October 15, 2025, 17:00
@yannforget requested a review from yolanfery on October 16, 2025, 10:20

@yolanfery (Contributor) left a comment

Thanks for proposing an improvement here.

I tried testing in PyCharm but I don't see much of a difference before and after; I'm not sure whether it's my local setup or something PyCharm-specific. Sorry I can't test this more thoroughly right now.

In your current diff, I think PipelineWithTask is no longer used, so the class can be removed, right?

I explored a bit with AI and I feel it would make sense to wrap the result to compensate for the trade-off you mentioned, maybe with something like Task[R].

Something like:

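  1. Make Task generic over its return type
  import typing
  from functools import wraps

  P = typing.ParamSpec("P")
  R = typing.TypeVar("R")

  class Task(typing.Generic[R]):
      def __init__(self, function: typing.Callable[..., R]):
          self.result: R | None = None
          ...
  2. Add a helpful __getattr__ error message
  def __getattr__(self, name: str):
      """Catch attempts to access result attributes in the DAG."""
      # Dunder probes (copy, pickle) should fail with a plain AttributeError
      if name.startswith("__") and name.endswith("__"):
          raise AttributeError(name)
      # AttributeError rather than TypeError keeps hasattr() and getattr(obj, name, default) working
      raise AttributeError(
          f"Cannot access '.{name}' on Task object in pipeline DAG. "
          f"Task results are only available inside other task functions..."
      )
  3. Update the decorator so the wrapper returns Task[R] and task() returns Callable[P, Task[R]]
  def task(self, function: typing.Callable[P, R]) -> typing.Callable[P, Task[R]]:
      @wraps(function)
      def wrapper(*task_args: P.args, **task_kwargs: P.kwargs) -> Task[R]:
          # Task.__call__ is assumed to store the arguments and return the Task itself
          task = Task(function)(*task_args, **task_kwargs)
          self.tasks.append(task)
          return task
      return wrapper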

This brings transparency and an explicit error (albeit only at runtime). Does all this make sense?

@yannforget (Member, Author) commented Oct 20, 2025

Yes, great idea! I'll try to test it in PyCharm; so far I've only tested it in VS Code with the Pylance language server. Do you know which language server PyCharm uses? It's a custom one, right?

@yolanfery (Contributor) commented Oct 20, 2025

It's actually a native, custom, and probably proprietary implementation, though PyCharm can be configured to use external language servers if needed.

@yannforget (Member, Author) commented Oct 29, 2025

I'm putting this PR on hold for now because I'm not sure it will improve DX in the end.
It could be a problem when writing tests for pipelines, where we need the task objects to be typed as Task because we run them manually.
Example here: https://github.com/BLSQ/openhexa-templates-ds/blob/a9317f435c37fa9292b6ee7c6a8facf23ba4bc0e/era5_sync/tests/test_pipeline.py#L51

That said, it might be OK to just write task.run()  # type: ignore to satisfy the type checker.

