@dreadatour dreadatour commented Nov 25, 2025

The default dataset status is CREATED, while the default dataset version status is PENDING. I am not sure what PENDING is supposed to mean here: I found no other usage of this status in the codebase.

This PR sets the default dataset version status to CREATED for consistency.

Also renamed create_new_dataset_version to create_dataset_version, since "create new" looks a bit odd.
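
A minimal sketch of the change, using a simplified stand-in for the catalog method. Only the names DatasetStatus, CREATED, PENDING, and create_dataset_version come from this PR; the enum values, the DatasetVersion dataclass, and the function signature are assumptions made for illustration and do not mirror the real datachain code.

```python
from dataclasses import dataclass
from enum import IntEnum


class DatasetStatus(IntEnum):
    # Hypothetical numeric values, for illustration only.
    CREATED = 1
    PENDING = 2


@dataclass
class DatasetVersion:
    # Simplified, hypothetical stand-in for a dataset version record.
    dataset_name: str
    version: str
    status: DatasetStatus


def create_dataset_version(
    dataset_name: str,
    version: str,
    status: DatasetStatus = DatasetStatus.CREATED,  # default was PENDING before
) -> DatasetVersion:
    """Create a version record whose default status matches the dataset default."""
    return DatasetVersion(dataset_name, version, status)
```

Callers that relied on the old default can still pass status=DatasetStatus.PENDING explicitly; only the implicit default differs.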

See also the related Studio PR.

@dreadatour dreadatour self-assigned this Nov 25, 2025
Copilot finished reviewing on behalf of dreadatour November 25, 2025 02:31

Copilot AI left a comment

Pull request overview

This PR standardizes the default status for dataset versions to be CREATED (matching the dataset default status) instead of PENDING, which was unused in the codebase. Additionally, it renames the method create_new_dataset_version to create_dataset_version for better naming consistency.

  • Changed default dataset version status from PENDING to CREATED for consistency with dataset status
  • Renamed create_new_dataset_version method to create_dataset_version

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/datachain/catalog/catalog.py | Updated method name and changed default status parameter from DatasetStatus.PENDING to DatasetStatus.CREATED |
| tests/func/test_datasets.py | Updated test assertions to expect DatasetStatus.CREATED instead of PENDING, and updated method call to use new name |

@shcheklein

I think that for datasets CREATED sounds like COMPLETED, which may be why we had PENDING (?). I am not sure we need to change that, but I don't have a strong opinion either way.

codecov bot commented Nov 25, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

dreadatour commented Nov 25, 2025

> I think that for datasets CREATED sounds like COMPLETED, which may be why we had PENDING (?). I am not sure we need to change that, but I don't have a strong opinion either way.

For datasets we also have a COMPLETE status, and there are a lot of checks for it in the code. By default a dataset is created with the CREATED status, and once everything is done it becomes COMPLETE.

My guess is that PENDING was used for storage indexing. I can dig deeper into the git history to find where it was used.

Right now DatasetStatus.PENDING is not used anywhere except as the default status during dataset version creation, and that status is updated to COMPLETE at the end.
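
To illustrate the lifecycle described here, a rough sketch under stated assumptions: the DatasetVersion class, mark_complete, and save_rows are hypothetical helpers invented for this example, and the enum is a simplified stand-in rather than the real DatasetStatus definition.

```python
from enum import Enum, auto


class DatasetStatus(Enum):
    # Simplified stand-in; member values are arbitrary.
    CREATED = auto()
    PENDING = auto()  # present in the enum, but never assigned in this sketch
    COMPLETE = auto()


class DatasetVersion:
    """Hypothetical version record that starts out as CREATED."""

    def __init__(self, status: DatasetStatus = DatasetStatus.CREATED):
        self.status = status

    def mark_complete(self) -> None:
        # Final transition once all rows have been written.
        self.status = DatasetStatus.COMPLETE


def save_rows(rows):
    """Hypothetical save flow: create the version, store rows, then complete it."""
    version = DatasetVersion()
    stored = list(rows)  # stands in for the real write step
    version.mark_complete()
    return version, stored
```

In this sketch the initial status is only ever observed between creation and completion, which is why swapping the default from PENDING to CREATED does not change the final state.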

I am going to look deeper into these statuses a bit later. For example, why do we need finished_at to be based on statuses for dataset versions (here)?
