Add CLI commands for browsing and searching OpenML datasets #1508
Add three new CLI subcommands under 'openml datasets':
Features:
Addresses ESoC 2025 goal of improving user experience of the dataset catalogue.
Related to issue #1503 Add CLI Commands for browsing and searching OpenML datasets
Metadata
openml datasets list, openml datasets info, and openml datasets search
Details
What does this PR implement/fix?
This PR adds three new CLI subcommands under openml datasets to improve the user experience of the dataset catalogue:
- openml datasets list: List datasets with optional filtering (tag, status, data_name, number_instances, number_features, number_classes, pagination, output format)
- openml datasets info <dataset_id>: Display detailed information about a specific dataset, including qualities, features, and metadata
- openml datasets search <query>: Search datasets by name with case-insensitive matching
Why is this change necessary? What is the problem it solves?
Currently, users must write Python code to browse or search OpenML datasets, even for simple tasks like listing available datasets or finding a specific dataset. This creates a barrier to entry and makes the dataset catalogue less accessible. Adding CLI commands allows users to interact with the dataset catalogue directly from the command line without writing code.
This directly addresses the ESoC 2025 goal of "Improving user experience of the dataset catalogue in AIoD and OpenML".
How can I reproduce the issue this PR is solving and its solution?
Before (requires Python code):
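As a rough sketch of the kind of Python a user would otherwise write to find a dataset by name (not the PR's own snippet): the helper name `search_datasets` and its `list_fn` parameter are hypothetical, and the code assumes the listing exposes `did` and `name` fields, returned either as a pandas DataFrame (recent openml-python releases) or as a dict keyed by dataset id (older releases).

```python
def search_datasets(query, list_fn=None):
    """Return (dataset id, name) pairs whose name contains `query`, ignoring case.

    `search_datasets` and `list_fn` are hypothetical names for this sketch.
    """
    if list_fn is None:
        # Imported lazily so the sketch can be loaded without the package installed.
        import openml
        list_fn = openml.datasets.list_datasets

    listing = list_fn()
    # Assumption: recent releases return a pandas DataFrame, older ones a dict
    # keyed by dataset id; both are normalised to a list of row dicts here.
    if hasattr(listing, "to_dict"):
        rows = listing.to_dict("records")
    else:
        rows = list(listing.values())

    needle = query.casefold()
    return [(row["did"], row["name"]) for row in rows
            if needle in str(row["name"]).casefold()]


# Example (needs network access to the OpenML server):
# for did, name in search_datasets("iris"):
#     print(did, name)
```

Passing `list_fn` explicitly makes the matching logic easy to exercise offline with a stub listing.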
After (CLI commands):
Implementation Details:
- openml/cli.py: datasets_list(), datasets_info(), datasets_search()
- _format_output() for consistent output formatting (table/JSON)
- tests/test_openml/test_cli.py (11 test cases)
- openml.datasets.list_datasets() and openml.datasets.get_dataset() functions - no changes to core API
- configure command)
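To illustrate what a shared table/JSON formatter can look like, here is a minimal sketch. It is not the PR's `_format_output()`: the function name `format_rows`, its signature, and the two-space column separator are assumptions made for illustration only.

```python
import json


def format_rows(rows, columns, output_format="table"):
    """Render a list of dict rows as JSON or as a fixed-width text table.

    Hypothetical helper; the real `_format_output()` may differ in every detail.
    """
    if output_format == "json":
        # Keep only the requested columns, in the requested order.
        return json.dumps([{c: row.get(c) for c in columns} for row in rows], indent=2)
    if output_format != "table":
        raise ValueError(f"unsupported output format: {output_format!r}")

    # Stringify every cell once so that widths and rendering agree.
    cells = [[str(row.get(c, "")) for c in columns] for row in rows]
    widths = [
        max([len(c)] + [len(r[i]) for r in cells])
        for i, c in enumerate(columns)
    ]
    header = "  ".join(c.ljust(w) for c, w in zip(columns, widths))
    rule = "  ".join("-" * w for w in widths)
    body = ["  ".join(v.ljust(w) for v, w in zip(r, widths)) for r in cells]
    return "\n".join([header, rule, *body])


# Illustrative rows; the values are placeholders, not output of the PR.
example = [{"did": 61, "name": "iris"}, {"did": 554, "name": "mnist_784"}]
print(format_rows(example, ["did", "name"]))
```

Routing every subcommand through one such helper keeps the table and JSON views consistent, which is the stated purpose of `_format_output()`.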