Eliminating Manual Class Registration in Unitxt with Import Paths #1575

@elronbandel

Problem Statement

In Unitxt, every artifact in the catalog includes a __type__ field in its JSON representation. This field stores the class that was used to instantiate the artifact, which is necessary for loading it back into a Python instance.

Currently, Unitxt relies on a class registry that maps a prettified class name to its actual class. The __type__ field stores the prettified name, and when an artifact is loaded, this name is used to look up the original class in the registry.
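To make the current mechanism concrete, here is a minimal sketch of a name-based registry. It is not Unitxt's actual implementation: `_REGISTRY`, `prettify`, `register`, and `load_artifact` are hypothetical names, and the snake_case form of the prettified name is an assumption made only for illustration.

```python
import re

# Hypothetical registry for illustration: prettified name -> class.
_REGISTRY = {}


def prettify(class_name: str) -> str:
    # Assumed convention for this sketch only: CamelCase -> snake_case.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).lower()


def register(cls):
    # A class becomes loadable only once this has run, i.e. once the
    # module that defines the class has been imported.
    _REGISTRY[prettify(cls.__name__)] = cls
    return cls


def load_artifact(data: dict):
    # Rebuild an instance from its JSON-like representation.
    fields = dict(data)
    type_name = fields.pop("__type__")
    try:
        cls = _REGISTRY[type_name]
    except KeyError:
        raise LookupError(
            f"No class registered for __type__={type_name!r}; "
            "was the module defining it imported?"
        ) from None
    return cls(**fields)


@register
class MyOperator:
    def __init__(self, factor=1):
        self.factor = factor


op = load_artifact({"__type__": "my_operator", "factor": 3})
```

In this sketch, loading an entry whose class was never registered fails with a `LookupError`, which is the failure mode the challenges below describe.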

However, this approach introduces several challenges:

  1. Manual Class Registration – Any class that might appear in the catalog must be registered in advance.
  2. Import Dependencies – Any code that reads from the catalog must first import every custom class the catalog references. A missing import is difficult to debug, and the requirement is hard to communicate to users.
  3. Ongoing Maintenance – Users run into this problem frequently and must keep the required imports and registrations up to date by hand.

Proposed Solution

Instead of storing a prettified name, we propose changing the __type__ field to store:

  • A full import path (e.g., "unitxt.loaders.LoadHF") for globally available classes.
  • A relative import path (e.g., ".MyOperator"), resolved relative to a registered folder.

By default, the current working directory will be automatically registered, making the system more intuitive for small projects running locally.
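One way the lookup could work is sketched below using the standard `importlib` module. This is an illustration under stated assumptions, not the proposed implementation: `REGISTERED_FOLDERS` and `resolve_type` are hypothetical names, and the sketch assumes a relative path has the form ".module.ClassName", with `module.py` located inside a registered folder. The issue does not specify how a bare ".MyOperator" maps to a file, so the sketch rejects that form rather than guess.

```python
import importlib
import importlib.util
import os

# Hypothetical list of registered folders; the current working directory
# is registered by default, as the proposal describes.
REGISTERED_FOLDERS = [os.getcwd()]


def resolve_type(type_path: str):
    """Return the class named by a full or relative import path."""
    is_relative = type_path.startswith(".")
    module_path, _, class_name = type_path.lstrip(".").rpartition(".")
    if not module_path:
        # Assumption of this sketch: a module component is required.
        raise ValueError(f"No module component in {type_path!r}")

    if not is_relative:
        # Full import path, e.g. "package.module.ClassName".
        module = importlib.import_module(module_path)
        return getattr(module, class_name)

    # Relative path: look for <folder>/<module>.py in each registered folder.
    relative_file = os.path.join(*module_path.split(".")) + ".py"
    for folder in REGISTERED_FOLDERS:
        candidate = os.path.join(folder, relative_file)
        if os.path.isfile(candidate):
            spec = importlib.util.spec_from_file_location(module_path, candidate)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)
            return getattr(module, class_name)

    raise ImportError(
        f"{relative_file} not found in any registered folder: {REGISTERED_FOLDERS}"
    )
```

With this approach nothing has to be imported or registered ahead of time: the module is imported on demand when an artifact that references it is loaded, and a failed lookup can report exactly which file or module was searched for.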

Benefits of the Proposed Change

  1. No More Manual Class Registration – Libraries using Unitxt will no longer need to register their classes manually.
  2. Improved Usability for Small Projects – Projects operating within a single working directory will work seamlessly using relative imports.
  3. Support for Larger Projects – Projects without a formal package structure can register their main directories and use relative imports.

This change will make Unitxt more user-friendly, reduce setup complexity, and improve error handling.
