Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add asset "layout" metadata field, validate asset paths #157

Open
yarikoptic opened this issue Feb 7, 2023 · 8 comments
Open

Add asset "layout" metadata field, validate asset paths #157

yarikoptic opened this issue Feb 7, 2023 · 8 comments

Comments

@yarikoptic
Copy link
Member

in a follow up to dandi/dandi-cli#1205 (comment) of @satra

(@yarikoptic - this was done outside of dandi schema and also related to web-based validation, rather than local validation).

yes, that was done in dandi/dandi-cli#1173 but it does not mean that it should stay here, at least as the "implementation logic".
In the past we removed dandi-cli from dependencies of dandi-archive IIRC to break the "dependency loop" since dandi-cli depends on API implementation of the archive. Hence we keep code worth reusing in both -cli and -archive elsewhere (dandi-schema, zarr_checksum) etc.

So may be we should move the validate_organized_path into dandischema? But again -- then for an asset it would matter on either it is a BIDS asset or some other (DANDI layout) asset.

looking at a sample asset from 000026 which is BIDS -- we do not provide such metadata
❯ curl --silent -X 'GET' 'https://api.dandiarchive.org/api/assets/3a6cdce9-ab49-41dd-967f-29349d0cc341/' -H 'accept: application/json' | jq .
{
  "id": "dandiasset:3a6cdce9-ab49-41dd-967f-29349d0cc341",
  "path": "sub-EXC022/ses-MRI/anat/sub-EXC022_ses-MRI-echo-2_flip-2_VFA.json",
  "access": [
    {
      "status": "dandi:OpenAccess",
      "schemaKey": "AccessRequirements"
    }
  ],
  "digest": {
    "dandi:sha2-256": "f48296e2812476e813c8d11e59ab1947ddfc6a8330c32ace04f040cd5d7d213e",
    "dandi:dandi-etag": "a7aaf0935f14bdb4d686486f4512a299-1"
  },
  "@context": "https://raw.githubusercontent.com/dandi/schema/master/releases/0.6.0/context.json",
  "schemaKey": "Asset",
  "contentUrl": [
    "https://api.dandiarchive.org/api/assets/3a6cdce9-ab49-41dd-967f-29349d0cc341/download/",
    "https://dandiarchive.s3.amazonaws.com/blobs/5d5/2c6/5d52c625-2efd-4ff2-a768-bd4f15e7da46"
  ],
  "identifier": "3a6cdce9-ab49-41dd-967f-29349d0cc341",
  "repository": "https://dandiarchive.org/",
  "contentSize": 247,
  "dateModified": "2021-10-05T14:50:57.025495-04:00",
  "schemaVersion": "0.6.0",
  "encodingFormat": "application/json",
  "wasGeneratedBy": [
    {
      "id": "urn:uuid:7eec0bb1-668c-48da-a3be-a90df5485d8c",
      "name": "Metadata generation",
      "schemaKey": "Activity",
      "description": "Metadata generated by DANDI cli",
      "wasAssociatedWith": [
        {
          "url": "https://github.com/dandi/dandi-cli",
          "name": "DANDI Command Line Interface",
          "version": "0.27.3",
          "schemaKey": "Software",
          "identifier": "RRID:SCR_019009"
        }
      ]
    }
  ],
  "blobDateModified": "2021-10-05T12:55:21.243891-04:00"
}

so to validate the path we should first start annotating assets with "layout" (optional, default to DANDI, could be BIDS). Then validate either using our DANDI layout regex, or BIDS (using bidsschematools like done in dandi-cli) purely on the path (ref also #74 ).

@satra
Copy link
Member

satra commented Feb 11, 2023

is there something that prevents detecting layout based on either local/remote dandiset.yaml + dataset_description.json ? if dataset_description exists, it's bids, if not it's dandi. since all assets are associated with a dandiset.

@yarikoptic
Copy link
Member Author

Iirc we don't have association from asset to dandiset

@satra
Copy link
Member

satra commented Feb 13, 2023

@yarikoptic - we do (see s3 bucket) and can expose it via api. at least at this point, assets still have a one to one correspondence with dandiset, but one to many with dandiset/version. so if assets are listed per dandiset, they have explicit association.

@yarikoptic
Copy link
Member Author

@yarikoptic - we do (see s3 bucket) and can expose it via api. at least at this point, assets still have a one to one correspondence with dandiset, but one to many with dandiset/version. so if assets are listed per dandiset, they have explicit association.

I am asking since all such validations should be done within dandischema, not through some ad-hoc logic/API within dandi-cli if we would like dandi-archive to use them, hence this issue here in dandischema. So the question is really "where at the level of dandischema we do have such association?"

@satra
Copy link
Member

satra commented Feb 13, 2023

i think the difference i am making is that in dandischema such a function could take type of layout as input, but that doesn't need to be in the asset model.

@yarikoptic
Copy link
Member Author

Per brief monday discussion -- we might want/need to make validation of assets interface to be exposed to the notion of dandiset across the entire surface of the archive interfaces/implementation. But I would like to remind that dandi-schema is ATM the core logic for validation, so it might need to start here somehow and decide on how to expose/use it at dandi-cli/dandi-archive levels.

Might relate to dandi/dandi-archive#1477 (by @bendichter ) where solution might be to establish regexes for URLs at dandi-schema level so they could be used/validated by both dandi-cli and dandi-archive UIs

@yarikoptic
Copy link
Member Author

i think the difference i am making is that in dandischema such a function could take type of layout as input, but that doesn't need to be in the asset model.

yes. my only thinking is that we might need later some other aspect of knowing dandiset metadata or identity (e.g. to do validation in conjnction with with other assets), and then would need to change it again.

@satra
Copy link
Member

satra commented Feb 13, 2023

why don't we start with copying the regex's over and adding a validation function to the asset model for the asset path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants