@yarikoptic
Collaborator

Introduce templates.tsv and cohorts.tsv files consistent with existing BIDS entity-specific TSV files (participants.tsv, sessions.tsv).

  • templates.tsv: placed at derivative dataset root, describes tpl- entities
  • cohorts.tsv: placed within tpl-/ directory, describes cohort- entities

This enables consistent documentation of template and cohort entities in derivative datasets. Part of BEP038 atlas metadata improvements.

Schema changes:

  • Add template_id and cohort_id columns to columns.yaml
  • Add templates and cohorts suffixes to suffixes.yaml
  • Add Templates and Cohorts rules to common_derivatives.yaml
  • Add templates and cohorts file rules to tables.yaml
  • Update atlas.md with templates.tsv and cohorts.tsv documentation
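
For illustration, a minimal sketch of the proposed layout. It assumes that IDs carry their entity prefix the way `participant_id` does in `participants.tsv` (`tpl-<label>`, `cohort-<label>`) and that the cohorts file is named `tpl-<label>_cohorts.tsv`. The row contents and the helper names (`read_entity_table`, `build_example_dataset`) are placeholders, not part of this PR:

```python
import csv
import tempfile
from pathlib import Path


def read_entity_table(path, id_column):
    """Read an entity TSV and index its rows by the given ID column."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        if id_column not in (reader.fieldnames or []):
            raise ValueError(f"{path} lacks required column {id_column!r}")
        return {row[id_column]: row for row in reader}


def build_example_dataset(root):
    """Write a hypothetical derivative dataset containing both tables."""
    root = Path(root)
    # templates.tsv sits at the derivative dataset root.
    (root / "templates.tsv").write_text(
        "template_id\tdescription\n"
        "tpl-MNIPediatricAsym\tPlaceholder description of the template\n",
        encoding="utf-8",
    )
    # The cohorts table sits inside the tpl-<label>/ directory.
    tpl_dir = root / "tpl-MNIPediatricAsym"
    tpl_dir.mkdir()
    (tpl_dir / "tpl-MNIPediatricAsym_cohorts.tsv").write_text(
        "cohort_id\tdescription\n"
        "cohort-1\tPlaceholder description of the first cohort\n"
        "cohort-2\tPlaceholder description of the second cohort\n",
        encoding="utf-8",
    )
    return root


with tempfile.TemporaryDirectory() as tmp:
    root = build_example_dataset(tmp)
    templates = read_entity_table(root / "templates.tsv", "template_id")
    cohorts = read_entity_table(
        root / "tpl-MNIPediatricAsym" / "tpl-MNIPediatricAsym_cohorts.tsv",
        "cohort_id",
    )
```

Both tables are read the same way as `participants.tsv` would be, which is the consistency argument made above.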

Related to:

Part of the larger

for independent consideration. Attn @bids-standard/bep038

@CPernet
Collaborator

CPernet commented Dec 19, 2025

I have to say it annoys me to see this after a lot of work to get there .. but damn it works, and it is consistent with other parts of BIDS .. nice +1 👍 I cannot see any reason not to adopt that change @oesteban what do you think?

@oesteban
Collaborator

oesteban commented Dec 19, 2025

> I have to say it annoys me to see this after a lot of work to get there .. but damn it works, and it is consistent with other parts of BIDS .. nice +1 👍 I cannot see any reason not to adopt that change @oesteban what do you think?

I understand Yarik's points. Also, his proposal is "idempotent" (in the sense that it doesn't create anything that cannot be completed/modified later down the line, as opposed to our _description.json file that once introduced is hard to get rid of).

That said, Yarik's proposal opens a small hole for fields that require long descriptions, such as a custom license (that cannot be expressed with an identifier). Current BEP038 doesn't solve that problem (which is above its scope btw), but having JSON fields to stick that metadata is more reasonable than having them stuffed into the TSV column.

@yarikoptic
Collaborator Author

> such as a custom license (that cannot be expressed with an identifier). Current BEP038 doesn't solve that problem (which is above its scope btw), but having JSON fields to stick that metadata is more reasonable than having them stuffed into the TSV column.

isn't it an 80/20 rule concern? could likely be "custom: see LICENSE file" for those likely <20% cases?

@yarikoptic
Collaborator Author

note that this one alone doesn't solve _descriptions.json for which there is a follow up

@codecov

codecov bot commented Dec 20, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.81%. Comparing base (074ae6f) to head (2119a13).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2287   +/-   ##
=======================================
  Coverage   82.81%   82.81%           
=======================================
  Files          22       22           
  Lines        1693     1693           
=======================================
  Hits         1402     1402           
  Misses        291      291           


@oesteban
Collaborator

> such as a custom license (that cannot be expressed with an identifier). Current BEP038 doesn't solve that problem (which is above its scope btw), but having JSON fields to stick that metadata is more reasonable than having them stuffed into the TSV column.

> isn't it an 80/20 rule concern? could likely be "custom: see LICENSE file" for those likely <20% cases?

It does apply, 80% of the licenses are custom. Universities used to weigh in and request nonstandard licenses be set.

@yarikoptic
Collaborator Author

> 80% of the licenses are custom.

do we actually have some stats on that?

> Universities used to weigh in and request nonstandard licenses be set.

Anyway, license texts are better kept in files than in JSON or TSV. We have docs/, so they could live under docs/licenses/ and be pointed to from there. Moreover, likely all atlases from the same institution/group would have the same (even if custom) license. For a similar reason Debian now ships /usr/share/common-licenses and just refers to those texts in each package's copyright file instead of duplicating them in every package.
FWIW I even see us (me) proposing such a generic placement (docs/licenses/) for license texts and suggested wording for the top level LICENSE file in such a case to refer to detailed/individual licenses under there.
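
A rough sketch of how a consumer might treat such a pointer. Everything here is an assumption for illustration: the `custom:` prefix convention, the idea that the rest of the value is a path relative to the dataset root, and the helper name `resolve_license`. None of it is defined by BIDS or by this PR:

```python
from pathlib import Path

# Hypothetical convention: "custom: <relative path>" points at a license text
# file inside the dataset; anything else is taken to be a license identifier.
CUSTOM_PREFIX = "custom:"


def resolve_license(value, dataset_root):
    """Classify a license value as an identifier or a pointer to a file."""
    value = value.strip()
    if value.lower().startswith(CUSTOM_PREFIX):
        relative = value[len(CUSTOM_PREFIX):].strip()
        return ("file", Path(dataset_root) / relative)
    return ("identifier", value)


kind, target = resolve_license("custom: docs/licenses/lab.txt", "/data/ds")
```

With this shape the table cell stays short, and the full text lives once under `docs/licenses/`, shared by every atlas from the same group.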

@oesteban
Collaborator

oesteban commented Jan 8, 2026

> FWIW I even see us (me) proposing such a generic placement (docs/licenses/) for license texts and suggested wording for the top level LICENSE file in such a case to refer to detailed/individual licenses under there.

That sounds very reasonable to me 👍

@effigies effigies changed the base branch from bep038 to master January 8, 2026 13:30
…adata

Introduce templates.tsv and cohorts.tsv files consistent with existing
BIDS entity-specific TSV files (participants.tsv, sessions.tsv).

- templates.tsv: placed at derivative dataset root, describes tpl-<label> entities
- cohorts.tsv: placed within tpl-<label>/ directory, describes cohort-<label> entities

This enables consistent documentation of template and cohort entities in
derivative datasets. Part of BEP038 atlas metadata improvements.

Schema changes:
- Add template_id and cohort_id columns to columns.yaml
- Add templates and cohorts suffixes to suffixes.yaml
- Add Templates and Cohorts rules to common_derivatives.yaml
- Add templates and cohorts file rules to tables.yaml
- Update atlas.md with templates.tsv and cohorts.tsv documentation

Related to:
- #2285
- #2283

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@effigies effigies force-pushed the bep038-cohorts_tsv branch from 5e1ac9b to 9e967fb Compare January 8, 2026 13:31
effigies previously approved these changes Jan 8, 2026
Collaborator

@effigies effigies left a comment

Overall this seems reasonable. We should update at least one example to use these features.

@yarikoptic
Collaborator Author

> FWIW I even see us (me) proposing such a generic placement (docs/licenses/) for license texts and suggested wording for the top level LICENSE file in such a case to refer to detailed/individual licenses under there.

> That sounds very reasonable to me 👍

on that aspect -- let's agree to use a standard form (I added above docs/licenses/ there now), please review/follow up there:

but the overall point here is that docs/ already exists, is not structured ATM, and is thus "open" to absorb any such relevant files without exploding the number of top-level files (which IMHO should stay compact for the best overview for humans and machines...).

@oesteban
Collaborator

oesteban commented Jan 8, 2026

> do we actually have some stats on that?

It is my subjective perception that licenses of atlases/templates follow the 80/20 rule (only 20% use standard licenses, and often with no-derivs restrictions, which is nuts for that kind of resource).

Collaborator

@oesteban oesteban left a comment

This is great, thanks!

`templates.tsv` example:

```tsv
template_id description
```
Collaborator

Suggested change
- template_id description
+ template_id long_name

Collaborator

I would strongly suggest RRID here as a second column before "long_name"

Collaborator

@CPernet CPernet Jan 9, 2026

+1 for RRID. As always, columns are optional, but we know that things shown in an example get adopted more than things that are not. We might also consider DOI.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now description is REQUIRED by the table definition, so either that needs to stay in addition to long_name or it needs to be demoted from REQUIRED. long_name would need to be defined in objects.columns and added to the table definition.

`tpl-MNIPediatricAsym_cohorts.tsv` example:

```tsv
cohort_id description
```
Collaborator

This is great 👍

@effigies effigies dismissed their stale review January 9, 2026 20:44

Open discussion points

Collaborator

@effigies effigies left a comment

Ongoing discussion. Cleared my approval until these are wrapped up.

Comment on lines +20 to +21
- template_id
- description__entities
Collaborator

Suggested change
-   - template_id
-   - description__entities
+   - template_id

description being free-text, I would expect many people to choose to put it last for better alignment. I would drop the initial column requirement.
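
To make the trade-off concrete, a small sketch of the two header checks. It assumes `initial_columns` means "these columns must come first, in this order", which is an interpretation and is not confirmed anywhere in this thread. `check_header` is a hypothetical helper, not validator code:

```python
def check_header(header, initial_columns, required_columns):
    """Return a list of problems with a TSV header row.

    initial_columns must open the header in the given order;
    required_columns only need to be present somewhere.
    """
    problems = []
    if list(header[: len(initial_columns)]) != list(initial_columns):
        problems.append(f"header must start with {list(initial_columns)}")
    for name in required_columns:
        if name not in header:
            problems.append(f"missing required column {name!r}")
    return problems


# A header that keeps the free-text description last for readability.
header = ["template_id", "rrid", "description"]

# As currently written: both columns are pinned to the front.
strict = check_header(header, ["template_id", "description"], [])

# As suggested: only template_id is pinned; description is merely required.
relaxed = check_header(header, ["template_id"], ["template_id", "description"])
```

Under the strict rule this header is rejected purely for column order, while the relaxed rule accepts it and still guarantees that `description` is present.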

Comment on lines +24 to +27
description__entities:
  level: required
  description_addendum: |
    The corresponding label column is `template_id`.
Collaborator

We can make rrid show up in the table and also validate that it matches the RRID schema:

Suggested change
- description__entities:
-   level: required
-   description_addendum: |
-     The corresponding label column is `template_id`.
+ description__entities:
+   level: required
+   description_addendum: |
+     The corresponding label column is `template_id`.
+ rrid: optional

This also requires adding rrid to objects.columns.

cc @oesteban @CPernet
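
A sketch of what row-level validation could look like with `description` required and `rrid` optional. The regular expression is a deliberately loose guess at the RRID shape; the pattern that the schema actually enforces may differ. `validate_template_row` and the `RRID:SCR_000000` value used below are placeholders:

```python
import re

# Assumption: a loose RRID shape, "RRID:<authority>_<local id>". The pattern the
# BIDS schema actually enforces may be stricter or different.
RRID_PATTERN = re.compile(r"^RRID:[A-Za-z]+_[A-Za-z0-9:_-]+$")

REQUIRED = ("template_id", "description")


def validate_template_row(row):
    """Return a list of problems found in one templates.tsv row (a dict)."""
    problems = []
    for column in REQUIRED:
        if not row.get(column):
            problems.append(f"missing value for required column {column!r}")
    rrid = row.get("rrid")
    # "n/a" is the BIDS marker for a missing value, so it is not pattern-checked.
    if rrid and rrid != "n/a" and not RRID_PATTERN.match(rrid):
        problems.append(f"rrid {rrid!r} does not look like an RRID")
    return problems


good_row = {
    "template_id": "tpl-Example",
    "description": "Placeholder description",
    "rrid": "RRID:SCR_000000",
}
bad_row = {"template_id": "tpl-Example", "rrid": "SCR_000000"}
```

Here `good_row` passes, while `bad_row` is flagged twice: once for the missing `description` and once for an `rrid` value without the `RRID:` prefix.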

    - extension == ".tsv"
  initial_columns:
    - cohort_id
    - description__entities
Collaborator

Again, I think this is overly prescriptive with column ordering.

Suggested change
-   - description__entities


description: |
  A TSV file describing labels found for the `cohort` entity within a template.
  This file MUST be located within a `tpl-<label>/` directory.
Collaborator

This would need an explicit check to validate. Any TSV or JSON file can show up at the root without a validation error, even if it doesn't make much sense to do so.
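
A minimal sketch of what such an explicit check might do, written as plain Python rather than as a schema rule. It assumes the file is named `tpl-<label>_cohorts.tsv`, sits directly inside `tpl-<label>/`, and that labels are alphanumeric. `cohorts_location_ok` is a hypothetical helper:

```python
import re
from pathlib import PurePosixPath

# Assumption: the cohorts table is named tpl-<label>_cohorts.tsv.
COHORTS_NAME = re.compile(r"^tpl-(?P<label>[A-Za-z0-9]+)_cohorts\.tsv$")


def cohorts_location_ok(relpath):
    """Check that a cohorts table sits directly inside its own tpl-<label>/ directory.

    relpath is the file path relative to the dataset root, with "/" separators.
    """
    path = PurePosixPath(relpath)
    match = COHORTS_NAME.match(path.name)
    if match is None:
        return False
    # Exactly one directory level, and it must carry the same template label.
    if len(path.parts) != 2:
        return False
    return path.parts[0] == f"tpl-{match.group('label')}"


in_place = cohorts_location_ok("tpl-MNIPediatricAsym/tpl-MNIPediatricAsym_cohorts.tsv")
at_root = cohorts_location_ok("tpl-MNIPediatricAsym_cohorts.tsv")
```

The second call shows the case raised here: the file is well-formed by name but sits at the dataset root, so only a location-aware check rejects it.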

value: templates
display_name: Template Entity Definitions
description: |
  A TSV file describing labels found for the `tpl` entity in a Derivatives dataset.
Collaborator

This is a weird way of talking about it. I would say participants.tsv describes the participants, not the participant labels. Similarly with templates. Cohorts feels more reasonable to say labels.
