Replace legacy validator with schema validator #337

Open: wants to merge 3 commits into base: main
6 changes: 2 additions & 4 deletions .circleci/config.yml
@@ -31,10 +31,8 @@ jobs:
 source activate cubids
 conda install -c conda-forge -y datalad
 
-# Add nodejs and the validator
-conda install nodejs
-npm install -g yarn && \
-npm install -g [email protected]
+# Add deno to run the schema validator
+conda install deno
tsalo marked this conversation as resolved.
 
 # Install CuBIDS
 pip install -e .[tests]
16 changes: 13 additions & 3 deletions cubids/cubids.py
@@ -1336,9 +1336,19 @@ def get_all_metadata_fields(self):
         found_fields = set()
         for json_file in Path(self.path).rglob("*.json"):
             if ".git" not in str(json_file):
-                with open(json_file, "r") as jsonr:
-                    metadata = json.load(jsonr)
-                found_fields.update(metadata.keys())
+                # add this in case `print-metadata-fields` is run before validate
+                try:
+                    with open(json_file, "r", encoding="utf-8") as jsonr:
+                        content = jsonr.read().strip()
+                        if not content:
+                            print(f"Empty file: {json_file}")
+                            continue
+                        metadata = json.loads(content)
+                        found_fields.update(metadata.keys())
+                except json.JSONDecodeError as e:
+                    print(f"Error decoding JSON in {json_file}: {e}")
+                except Exception as e:
+                    print(f"Unexpected error with file {json_file}: {e}")
         return sorted(found_fields)
 
     def remove_metadata_fields(self, fields_to_remove):
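The new error handling in `get_all_metadata_fields` can be tried outside the class with a small standalone sketch. The name `collect_metadata_fields` is hypothetical and the sketch is not part of this PR. It also differs from the PR in three ways:

- It returns the skipped files instead of printing them.
- It tests `.git` as a path component rather than as a substring of the path.
- It catches only read and decode errors instead of a blanket `except Exception`.

```python
import json
from pathlib import Path


def collect_metadata_fields(root):
    """Return (sorted top-level keys, skipped files) for JSON sidecars under root.

    Hypothetical standalone sketch of the pattern in get_all_metadata_fields:
    empty or malformed files are recorded and skipped instead of aborting.
    """
    found_fields = set()
    skipped = []
    for json_file in sorted(Path(root).rglob("*.json")):
        # Skip anything stored inside a .git directory (e.g., DataLad internals).
        if ".git" in json_file.parts:
            continue
        try:
            content = json_file.read_text(encoding="utf-8").strip()
        except (OSError, UnicodeDecodeError) as err:
            skipped.append((json_file, f"unreadable: {err}"))
            continue
        if not content:
            skipped.append((json_file, "empty file"))
            continue
        try:
            metadata = json.loads(content)
        except json.JSONDecodeError as err:
            skipped.append((json_file, f"invalid JSON: {err}"))
            continue
        # Only a JSON object has keys; a top-level list or scalar is skipped.
        if isinstance(metadata, dict):
            found_fields.update(metadata.keys())
        else:
            skipped.append((json_file, "top-level value is not an object"))
    return sorted(found_fields), skipped
```

Returning the skipped files lets a caller decide whether to warn or fail, rather than mixing diagnostics into standard output.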
91 changes: 30 additions & 61 deletions cubids/validator.py
@@ -14,9 +14,9 @@
 
 def build_validator_call(path, ignore_headers=False):
     """Build a subprocess command to the bids validator."""
-    # build docker call
-    # CuBIDS automatically ignores subject consistency.
-    command = ["bids-validator", path, "--verbose", "--json", "--ignoreSubjectConsistency"]
+    # New schema BIDS validator doesn't have option to ignore subject consistency.
+    # Build the deno command to run the BIDS validator.
+    command = ["deno", "run", "-A", "jsr:@bids/validator", path, "--verbose", "--json"]

Review comment (Member): We might want some way to provide a given version of the validator or a version of the BIDS schema.

 
     if ignore_headers:
         command.append("--ignoreNiftiHeaders")
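The reviewer's suggestion could be handled at the deno level, because a `jsr:` specifier accepts an `@version` suffix that pins the package version. The sketch below is one possible approach, not what the PR implements:

- The `validator_version` parameter is hypothetical.
- When `validator_version` is omitted, the command matches the PR's unpinned `jsr:@bids/validator`.
- The `-A` flag (grant all permissions) and `--ignoreNiftiHeaders` come from the PR.

```python
def build_validator_call(path, ignore_headers=False, validator_version=None):
    """Build a deno command that runs the schema-based BIDS validator.

    validator_version is a hypothetical addition (not in the PR): when given,
    it pins the JSR package as jsr:@bids/validator@<version>.
    """
    package = "jsr:@bids/validator"
    if validator_version:
        package = f"{package}@{validator_version}"
    # str() lets callers pass a pathlib.Path as well as a string.
    command = ["deno", "run", "-A", package, str(path), "--verbose", "--json"]
    if ignore_headers:
        command.append("--ignoreNiftiHeaders")
    return command
```

Assuming the validator prints its JSON report to stdout, the list could be passed to `subprocess.run(command, capture_output=True, text=True)` and the resulting `stdout` handed to `parse_validator_output`.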
@@ -87,32 +87,6 @@ def parse_validator_output(output):
         Dataframe of validator output.
     """
 
-    def get_nested(dct, *keys):
-        """Get a nested value from a dictionary.
-
-        Parameters
-        ----------
-        dct : :obj:`dict`
-            Dictionary to get value from.
-        keys : :obj:`list`
-            List of keys to get value from.
-
-        Returns
-        -------
-        :obj:`dict`
-            The nested value.
-        """
-        for key in keys:
-            try:
-                dct = dct[key]
-            except (KeyError, TypeError):
-                return None
-        return dct
-
-    data = json.loads(output)
-
-    issues = data["issues"]
-
     def parse_issue(issue_dict):
         """Parse a single issue from the validator output.
 
@@ -126,30 +100,27 @@ def parse_issue(issue_dict):
         return_dict : :obj:`dict`
             Dictionary of parsed issue.
         """
-        return_dict = {}
-        return_dict["files"] = [
-            get_nested(x, "file", "relativePath") for x in issue_dict.get("files", "")
-        ]
-        return_dict["type"] = issue_dict.get("key", "")
-        return_dict["severity"] = issue_dict.get("severity", "")
-        return_dict["description"] = issue_dict.get("reason", "")
-        return_dict["code"] = issue_dict.get("code", "")
-        return_dict["url"] = issue_dict.get("helpUrl", "")
-
-        return return_dict
-
-    df = pd.DataFrame()
-
-    for warn in issues["warnings"]:
-        parsed = parse_issue(warn)
-        parsed = pd.DataFrame(parsed)
-        df = pd.concat([df, parsed], ignore_index=True)
-
-    for err in issues["errors"]:
-        parsed = parse_issue(err)
-        parsed = pd.DataFrame(parsed)
-        df = pd.concat([df, parsed], ignore_index=True)
+        return {
+            "location": issue_dict.get("location", ""),
+            "code": issue_dict.get("code", ""),
+            "subCode": issue_dict.get("subCode", ""),
+            "severity": issue_dict.get("severity", ""),
+            "rule": issue_dict.get("rule", ""),
+        }
+
+    # Load JSON data
+    data = json.loads(output)
+
+    # Extract issues
+    issues = data.get("issues", {}).get("issues", [])
+    if not issues:
+        return pd.DataFrame(columns=["location", "code", "subCode", "severity", "rule"])
+
+    # Parse all issues
+    parsed_issues = [parse_issue(issue) for issue in issues]
+
+    # Convert to DataFrame
+    df = pd.DataFrame(parsed_issues)
     return df
 
 
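The input shape expected by the rewritten `parse_validator_output` can be shown with a pandas-free sketch. It assumes the `{"issues": {"issues": [...]}}` layout that the PR reads. The sample payload is made up, and `EXAMPLE_CODE` is a placeholder rather than a real validator issue code. The function name `parse_issues` is hypothetical.

```python
import json

# Columns produced by the new parse_validator_output() in this PR.
COLUMNS = ["location", "code", "subCode", "severity", "rule"]


def parse_issues(output):
    """Flatten schema-validator JSON into one dict per issue (no pandas needed).

    Assumes the layout the PR reads: {"issues": {"issues": [{...}, ...]}}.
    """
    data = json.loads(output)
    issues = data.get("issues", {}).get("issues", [])
    # Absent fields become "" so every row has exactly the same keys.
    return [{col: issue.get(col, "") for col in COLUMNS} for issue in issues]


# Made-up sample payload; "EXAMPLE_CODE" is a placeholder, not a real code.
example_output = json.dumps(
    {
        "issues": {
            "issues": [
                {
                    "location": "/sub-01/anat/sub-01_T1w.nii.gz",
                    "code": "EXAMPLE_CODE",
                    "severity": "warning",
                }
            ]
        }
    }
)

rows = parse_issues(example_output)
```

With pandas, `pd.DataFrame(rows, columns=COLUMNS)` builds the same table. Passing `columns` also yields an empty frame with the right header when `rows` is empty, so the separate `if not issues` branch would not be needed.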
@@ -161,12 +132,10 @@ def get_val_dictionary():
     val_dict : dict
         Dictionary of values.
     """
-    val_dict = {}
-    val_dict["files"] = {"Description": "File with warning orerror"}
-    val_dict["type"] = {"Description": "BIDS validation warning or error"}
-    val_dict["severity"] = {"Description": "gravity of problem (warning/error"}
-    val_dict["description"] = {"Description": "Description of warning/error"}
-    val_dict["code"] = {"Description": "BIDS validator issue code number"}
-    val_dict["url"] = {"Description": "Link to the issue's neurostars thread"}
-
-    return val_dict
+    return {
+        "location": {"Description": "File with the validation issue."},
+        "code": {"Description": "Code of the validation issue."},
+        "subCode": {"Description": "Subcode providing additional issue details."},
+        "severity": {"Description": "Severity of the issue (e.g., warning, error)."},
+        "rule": {"Description": "Validation rule that triggered the issue."},
+    }
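The new `get_val_dictionary` has one entry per column produced by `parse_validator_output`. That means the dictionary's keys can drive the header of a tab-separated report, with the dictionary itself written as a JSON sidecar next to it. The sketch below illustrates this pairing with a hypothetical helper, `write_validation_outputs`. The `<prefix>.tsv` / `<prefix>.json` naming is an assumption for illustration, not CuBIDS's actual output convention.

```python
import csv
import json
from pathlib import Path

# Column descriptions copied from the new get_val_dictionary() in this PR.
VAL_DICT = {
    "location": {"Description": "File with the validation issue."},
    "code": {"Description": "Code of the validation issue."},
    "subCode": {"Description": "Subcode providing additional issue details."},
    "severity": {"Description": "Severity of the issue (e.g., warning, error)."},
    "rule": {"Description": "Validation rule that triggered the issue."},
}


def write_validation_outputs(rows, val_dict, out_prefix):
    """Write rows to <out_prefix>.tsv and column descriptions to <out_prefix>.json.

    Hypothetical helper; the naming scheme is an assumption, not CuBIDS's
    actual output convention.
    """
    columns = list(val_dict)  # dictionary keys define the TSV header order
    tsv_path = Path(f"{out_prefix}.tsv")
    json_path = Path(f"{out_prefix}.json")
    with tsv_path.open("w", newline="", encoding="utf-8") as f:
        # Missing keys are written as "" (restval); unexpected keys raise ValueError.
        writer = csv.DictWriter(f, fieldnames=columns, delimiter="\t", restval="")
        writer.writeheader()
        writer.writerows(rows)
    json_path.write_text(json.dumps(val_dict, indent=2), encoding="utf-8")
    return tsv_path, json_path
```

Deriving the header from the dictionary keeps the two files from drifting apart if a column is added or renamed later.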
17 changes: 5 additions & 12 deletions docs/installation.rst
@@ -16,7 +16,7 @@ initialize a new conda environment (for example, named ``cubids``) as follows:
 
 .. code-block:: console
 
-    $ conda create -n cubids python=3.8 pip
+    $ conda create -n cubids python=3.12 pip
     $ conda activate cubids
 
 You are now ready to install CuBIDS.
@@ -44,23 +44,16 @@ Once you have a copy of the source, you can install it with:
     $ pip install -e .
 
 We will now need to install some dependencies of ``CuBIDS``.
-To do this, we first must install nodejs.
+To do this, we first must install deno to run ``bids-validator``.
 We can accomplish this using the following command:
 
 .. code-block:: console
 
-    $ conda install nodejs
+    $ conda install deno
 
-Now that we have npm installed, we can install ``bids-validator`` using the following command:
+The schema-based ``bids-validator`` does not need to be installed separately;
+deno downloads and runs it automatically when ``cubids validate`` is called.
 
-.. code-block:: console
-
-    $ npm install -g [email protected]
-
-In our example walkthrough,
-we use ``bids-validator`` v1.7.2. using a different version of the
-validator may result in slightly different validation tsv printouts,
-but ``CuBIDS`` is compatible with all versions of the validator at or above v1.6.2.
 
 We also recommend using ``CuBIDS`` with the optional ``DataLad`` version control capabilities.
 We use ``DataLad`` throughout our walkthrough of the CuBIDS Workflow on
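After this change, deno is the only prerequisite for validation. A pre-flight check can confirm that deno is on `PATH` before running `cubids validate`. The sketch below uses a hypothetical helper, `deno_version`, which is not part of CuBIDS. It uses only the standard `deno --version` command.

```python
import shutil
import subprocess


def deno_version():
    """Return the output of `deno --version`, or None if deno is not on PATH.

    Hypothetical pre-flight check, not part of CuBIDS.
    """
    deno = shutil.which("deno")
    if deno is None:
        return None
    result = subprocess.run([deno, "--version"], capture_output=True, text=True, check=False)
    # Treat empty output (e.g., a broken install) the same as "not available".
    return result.stdout.strip() or None
```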