Fix validation error when only Zarr assets are uploaded #2062

Open
wants to merge 5 commits into
base: master

aaronkanzer
Member

@aaronkanzer aaronkanzer commented Oct 30, 2024

Description of error

As documented in #1814, the web application reports the following validation error when only Zarr assets (and no blobs) are uploaded to a Dandiset:

assetsSummary: A Dandiset containing no files or zero bytes is not publishable

dandi-schema raises the error when it validates the assetsSummary field of dandiset.yaml. See the models module:

@validator("assetsSummary")
def check_filesbytes(cls, values: AssetsSummary) -> AssetsSummary:
    if values.numberOfBytes == 0 or values.numberOfFiles == 0:
        raise ValueError(
            "A Dandiset containing no files or zero bytes is not publishable"
        )
    return values

It arises because the query below evaluates only blob assets:

'numberOfBytes': 1 if version.assets.filter(blob__size__gt=0).exists() else 0,
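
To make the failure mode concrete, here is a minimal, framework-free sketch of the interaction between the blob-only check and the validator. `FakeAsset`, `blob_only_number_of_bytes`, and the standalone `check_filesbytes` are hypothetical stand-ins written for illustration; they are not the dandi-archive models or the dandi-schema API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeAsset:
    """Hypothetical stand-in for an Asset row backed by either a blob or a Zarr."""
    blob_size: Optional[int] = None
    zarr_size: Optional[int] = None


def blob_only_number_of_bytes(assets: List[FakeAsset]) -> int:
    # Imitates `1 if version.assets.filter(blob__size__gt=0).exists() else 0`:
    # an asset without a blob can never satisfy the filter.
    return 1 if any(a.blob_size is not None and a.blob_size > 0 for a in assets) else 0


def check_filesbytes(number_of_bytes: int, number_of_files: int) -> None:
    # Same condition as the validator quoted above, outside of pydantic.
    if number_of_bytes == 0 or number_of_files == 0:
        raise ValueError(
            "A Dandiset containing no files or zero bytes is not publishable"
        )


# A Dandiset that holds a single non-empty Zarr and no blobs.
zarr_only = [FakeAsset(zarr_size=4096)]
summary_bytes = blob_only_number_of_bytes(zarr_only)

try:
    check_filesbytes(summary_bytes, number_of_files=len(zarr_only))
    outcome = "valid"
except ValueError as exc:
    outcome = str(exc)

print(summary_bytes, "|", outcome)
```

In this sketch the Zarr-only Dandiset is summarized as zero bytes even though it holds data, so the publishability check rejects it.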

Description of proposed fix

The proposed changes additionally evaluate the size of Zarr assets that are in the COMPLETE state when generating the assetsSummary:numberOfBytes metadata field.
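
The intent of the fix can be sketched in plain Python as follows. This is only an illustration of the rule described above, not the Django query in the PR: `ZarrStatus`, `FakeAsset`, and `number_of_bytes_flag` are hypothetical names, and the enum lists only the two statuses mentioned in this thread.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ZarrStatus(Enum):
    # Hypothetical enum; only the statuses named in this conversation are modeled.
    UPLOADED = "Uploaded"
    COMPLETE = "Complete"


@dataclass
class FakeAsset:
    """Hypothetical stand-in for an Asset row backed by either a blob or a Zarr."""
    blob_size: Optional[int] = None
    zarr_size: Optional[int] = None
    zarr_status: Optional[ZarrStatus] = None


def number_of_bytes_flag(assets: List[FakeAsset]) -> int:
    # Blob-backed assets count as soon as they have a non-zero size.
    has_blob_bytes = any((a.blob_size or 0) > 0 for a in assets)
    # Zarr-backed assets count only once the archive is COMPLETE.
    has_zarr_bytes = any(
        a.zarr_status is ZarrStatus.COMPLETE and (a.zarr_size or 0) > 0
        for a in assets
    )
    return 1 if has_blob_bytes or has_zarr_bytes else 0


complete_zarr = FakeAsset(zarr_size=4096, zarr_status=ZarrStatus.COMPLETE)
unfinished_zarr = FakeAsset(zarr_size=4096, zarr_status=ZarrStatus.UPLOADED)

print(number_of_bytes_flag([complete_zarr]))    # counted
print(number_of_bytes_flag([unfinished_zarr]))  # not counted
```

Restricting the Zarr branch to COMPLETE keeps archives that have not been finalized from making a Dandiset look publishable.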

cc @kabilar

Member

@waxlamp waxlamp left a comment

I think we need a bit more explanation of the actual bug, and why this fixes it. We need to be careful with Zarr statuses because those have a complex relationship with Assets that results in (perhaps overly) complex semantics.

@@ -90,7 +92,9 @@ def version_aggregate_assets_summary(version: Version) -> None:

     assets_summary = aggregate_assets_summary(
         asset.full_metadata
-        for asset in version.assets.filter(status=Asset.Status.VALID)
+        for asset in version.assets.filter(
+            Q(status=Asset.Status.VALID) | Q(zarr__status=ZarrArchiveStatus.UPLOADED)
Member

What was the problem that led to this particular change? We need @jjnesbitt to confirm, but I don't think we should be counting Zarr archives in the UPLOADED state (those have not yet been "finalized" in the terminology of the Zarr API; once they are, I believe they move into the COMPLETE state).

Member Author

Good catch -- updated to COMPLETE here and in the unit test.

The filter query is expanded here to match similar logic in the validate_pending_asset_metadata Celery task (specific code reference here).

@aaronkanzer
Member Author

aaronkanzer commented Oct 31, 2024

> I think we need a bit more explanation of the actual bug, and why this fixes it. We need to be careful with Zarr statuses because those have a complex relationship with Assets that results in (perhaps overly) complex semantics.

When an Asset has no blob and is associated only with a zarr foreign key, validation fails because the query:

'numberOfBytes': 1 if version.assets.filter(blob__size__gt=0).exists() else 0,

evaluates only the blob foreign key. I stumbled upon this bug when uploading a Dandiset that contains only Zarr assets, in case you'd like to reproduce it against the current state of DANDI Archive and dandischema.

@kabilar kabilar changed the title Resolve all asset types being evaluated during upload validation Fix validation error when only Zarr asset(s) is uploaded Nov 25, 2024
@kabilar kabilar changed the title Fix validation error when only Zarr asset(s) is uploaded Fix validation error when only Zarr assets are uploaded Nov 25, 2024
@kabilar
Member

kabilar commented Nov 25, 2024

Hi @aaronkanzer, thank you for the original fix and for pushing these changes upstream to DANDI.

Hi @waxlamp @jjnesbitt, I made some minor changes to the original comment above to further describe the issue and fix. Please take a look at it when you have a chance. Thank you.

Comment on lines +388 to +392
if asset_type == 'blob':
    assert version.metadata['assetsSummary']['numberOfBytes'] == asset.blob.size

elif asset_type == 'zarr':
    assert version.metadata['assetsSummary']['numberOfBytes'] == asset.size
Member

I believe asset.size can be used for both here, since it abstracts over blob vs zarr.

Suggested change
-if asset_type == 'blob':
-    assert version.metadata['assetsSummary']['numberOfBytes'] == asset.blob.size
-elif asset_type == 'zarr':
-    assert version.metadata['assetsSummary']['numberOfBytes'] == asset.size
+assert version.metadata['assetsSummary']['numberOfBytes'] == asset.size
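
For readers unfamiliar with the abstraction referred to in this suggestion, the sketch below shows the general pattern of a `size` property that hides whether an asset is blob-backed or Zarr-backed. The classes are hypothetical illustrations, and the real `Asset.size` in dandi-archive may be implemented differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeBlob:
    size: int


@dataclass
class FakeZarr:
    size: int


@dataclass
class FakeAsset:
    """Hypothetical asset that is backed by exactly one of blob or zarr."""
    blob: Optional[FakeBlob] = None
    zarr: Optional[FakeZarr] = None

    @property
    def size(self) -> int:
        # Delegate to whichever backing object is present.
        if self.blob is not None:
            return self.blob.size
        if self.zarr is not None:
            return self.zarr.size
        raise ValueError("asset has neither a blob nor a zarr")


blob_asset = FakeAsset(blob=FakeBlob(size=100))
zarr_asset = FakeAsset(zarr=FakeZarr(size=4096))

# A single expression covers both asset types, so a test needs no branching.
sizes = [asset.size for asset in (blob_asset, zarr_asset)]
print(sizes)
```

With a property like this, a test parametrized over asset types can use one assertion instead of an if/elif per type.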

4 participants