26 commits
c2c5c26
Create slot_specifications.md
jfy133 Apr 28, 2025
90c944f
Update slot_specifications.md
jfy133 Apr 28, 2025
faf5016
Complete all sections of an (unoffiical) metadata slot specification
jfy133 May 5, 2025
f3266b2
Additional tweaks
jfy133 May 5, 2025
a7239d2
Add point about multivalue examples
jfy133 May 12, 2025
e592243
Tweak example of the multivalued slot
jfy133 May 12, 2025
7632c66
Experiment with preview building
jfy133 May 12, 2025
9e86e3b
Delete .github/workflows/test_pages_build.yaml
jfy133 May 12, 2025
6687982
Update a couple of places based on feedback from @only1chunts
jfy133 Jun 20, 2025
80c5e0a
Merge branch 'docs-slot-specifications' of github.com:jfy133/genomics…
jfy133 Jun 20, 2025
224e17b
Second sentence tweak
jfy133 Jun 20, 2025
2446be8
Add terminology mapping table, and start updating text (continue from…
jfy133 Jun 25, 2025
464dc38
Complete text update for using both MIxS and LinkML terminology
jfy133 Jul 14, 2025
80efc3c
Merge branch 'GenomicsStandardsConsortium:main' into docs-slot-specif…
jfy133 Aug 11, 2025
878bab2
Changes after @turbomam review, added missing sections on occurance …
jfy133 Aug 11, 2025
3986cdc
Merge branch 'docs-slot-specifications' of github.com:jfy133/genomics…
jfy133 Aug 11, 2025
2363e34
Update src/docs/slot_specifications.md
jfy133 Aug 11, 2025
9aef344
Reformat
jfy133 Aug 11, 2025
a8475b6
Addm issing MIxS ID reference in mapping table
jfy133 Aug 11, 2025
1fb3125
Update src/docs/slot_specifications.md
jfy133 Aug 12, 2025
76031e9
Add link to linkml model docs for see also
jfy133 Aug 12, 2025
86f1dd1
Few more corrections after @only1chunts's review and fix header numbe…
jfy133 Aug 12, 2025
80a2879
Add example mapping to terminolgoy table
jfy133 Aug 12, 2025
9a8d932
Fix environment subset guidance
jfy133 Aug 12, 2025
b6b12d2
Correct guidance for specifying units
jfy133 Aug 12, 2025
b579ab2
Improve definition
jfy133 Aug 12, 2025
317 changes: 317 additions & 0 deletions src/docs/slot_specifications.md
@@ -0,0 +1,317 @@
# MIxS term specifications in the LinkML framework

| Metadata | Value |
| ---------------- | ----------------------------------------------------------------------------------- |
| Version | 0.0.1 |
| Last updated | 2025-05-05 |
| Document Authors | James Fellows Yates (@jfy133), Mark Miller (@turbomam), Chris Hunter (@only1chunts) |

## Preamble

This document describes how MIxS metadata terms are represented within the LinkML framework of the MIxS schema.

### Terminology

The key words “MUST”, “MUST NOT”, “SHOULD”, etc. are to be interpreted as described in [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119).

This specification documentation refers to both MIxS and LinkML terminology.
The following table shows how the two sets of terminology map to each other.

| MIxS | LinkML | Description |
| ----------------------- | -------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Term | `slot` | A single field of information (metadata) that has various attributes on how this information should be represented and formatted |
| Structured comment name | `name` | A short computer compatible key or ID for a given metadata field that is used to refer to the particular term (typically) within the schema internally |
| Item | `title` | A short human readable name for the metadata term/slot |
| MIxS ID | `slot_uri` | The resolvable globally unique persistent identifier associated with a MIxS metadata field with the prefix 'MIXS' that expands to https://w3id.org/gensc/mixs/ |
| Definition | `description` | A detailed human-readable explanation of what information the metadata field should be holding |
| Expected value | `range` | The category of metadata the metadata field will hold (text, numbers, etc.) |
| Value syntax | `structured_pattern` | A way of defining how the metadata field should be filled in, e.g. with a specific format or structure |
| Example | `examples` | Examples of values for an item, i.e. different ways in which the metadata field can be filled in |
| Section | `slot_group` | A way of grouping similar or related metadata fields together to assist users in filling metadata tables following a logical progression |
| Section | `subset` | Another way of grouping similar or related metadata fields together to assist users in filling metadata tables following a logical progression |
| Requirement | `recommended` | Specifies whether a metadata field is optional but should be filled in for a sample |
| Requirement | `required` | Specifies whether a metadata field is mandatory to be filled in for a sample |
| Occurrence | `multivalued` | The number of times a particular metadata field can be used for a specific sample |

This document generally uses MIxS terminology but, where more relevant, uses the LinkML equivalent, with the other form in parentheses afterwards.

## 1. General

### 1.1 LinkML compatibility

A MIxS term (slot) MUST be written in and compatible with the [LinkML](https://linkml.io/) model, and any of its requirements (e.g. in YAML format).

It MUST conform to any MIxS specific LinkML linting requirements as defined within the [MIxS GitHub repository](https://github.com/GenomicsStandardsConsortium/mixs).

### 1.2 Slot definition

A LinkML slot is the object that is used to describe a MIxS term, i.e. information that is used to describe a particular aspect of a sample, nucleic acid, or sequence data.

### 1.3 Language

All MIxS term attributes MUST be written in English.

<!-- JFY comment: may be being a bit strict here, I guess you could have 'translated name' column or something like that, should rephrase to allow those exceptions -->

## 2. Term structured naming

### 2.1 (Structured comment) name format

The term (slot) structured comment name (`name`) MUST be in [snake_case](https://en.wikipedia.org/wiki/Snake_case).

All words must be lower case and underscores (`_`) MUST be used to separate words in the slot name.

### 2.2 (Structured comment) name length

The term (slot) structured comment name (`name`) must be a maximum of 20 characters in length as per INSDC guidelines ([https://www.insdc.org/submitting-standards/feature-table/#3.1](https://www.insdc.org/submitting-standards/feature-table/#3.1)).

### 2.3 (Structured comment) name uniqueness

The term (slot) structured comment name (`name`) MUST be unique within the MIxS LinkML model.

### 2.4 (Structured comment) name descriptiveness

The term (slot) structured comment name (`name`) MUST be descriptive of the data it is intended to hold.

### 2.5 (Structured comment) name abbreviation

The term (slot) structured comment name (`name`) SHOULD be an abbreviated form of the item (title) attribute.

Examples:

| Term Item / `title` | Structured comment name / `name` |
| ----------------------------------------------- | -------------------------------- |
| geographic location (country and/or sea,region) | `geo_loc_name` |
| isolation and growth condition | `isol_growth_condt` |
| pcr conditions | `pcr_cond` |
| sample volume or weight for DNA extraction | `samp_vol_we_dna_ext` |
| collection site geographic feature | `coll_site_geo_feat` |

### 2.6 (Structured comment) name common prefix

When related to existing terms, the term (slot) structured comment name (`name`) SHOULD use a common prefix that allows grouping of related terms.

Examples:

- Terms related to `sample` should use the prefix `samp_`.

| Term Item / `title` | Structured comment name / `name` |
| -------------------------------- | -------------------------------- |
| sample storage temperature | `samp_store_temp` |
| sample storage duration | `samp_store_dur` |
| sample volume or weight for DNA extraction | `samp_vol_we_dna_ext` |

- Terms (slots) related to assembly metadata should use the prefix `assembly_`.

| Term Item / `title` | Structured comment name / `name` |
| ---------------------------- | -------------------------------- |
| name and version of assembly | `assembly_name` |
| assembly software | `assembly_software` |
| assembly quality | `assembly_qual` |

## 3. Term expected value types

### 3.1 Term expected value must be valid LinkML range types

The type of data specified in the expected value (`range`) of a slot (term) MUST be in the form of a valid LinkML `range` type:

- `string`
- `integer`
- `float`
- `boolean`
- A MIxS defined enumeration

Refer to LinkML documentation for more information on [range types](https://linkml.io/linkml-model/latest/docs/range/).

## 4. Slot attributes

### 4.1 Minimal required LinkML slot attributes

A term (slot) MUST at a minimum include the following attributes:

- [`description`](https://linkml.io/linkml/schemas/metadata.html#providing-descriptions).
- [`title`](https://linkml.io/linkml-model/latest/docs/title/).
- [`examples`](https://linkml.io/linkml-model/latest/docs/examples/).
- [`in_subset`](https://linkml.io/linkml-model/latest/docs/in_subset/).
- [`keywords`](https://linkml.io/linkml-model/latest/docs/keywords/).
- [`slot_uri`](https://linkml.io/linkml-model/latest/docs/slot_uri/).
- [`range`](https://linkml.io/linkml/schemas/slots.html#ranges).
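
As a rough illustration, a slot carrying all seven attributes might look like the sketch below. The slot name, title, and unit wording are borrowed from examples elsewhere in this document; the description text, subset name, keywords, and the `MIXS:0000000` identifier are placeholders chosen for illustration and are not taken from the MIxS schema.

```yaml
slots:
  samp_store_temp:
    title: sample storage temperature
    # Placeholder wording, not the official MIxS definition
    description: Temperature at which the sample was stored before processing.
    examples:
      - value: -80 degree Celsius
    # Placeholder subset name
    in_subset:
      - environment
    # Keywords here follow the title words, as suggested in section 9.3
    keywords:
      - sample
      - storage
      - temperature
    # Placeholder ID; real MIxS IDs are assigned by the CIG
    slot_uri: MIXS:0000000
    range: string
```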

Member

two points on section 4.1:
a. Should the structured comment name be included in the list of required attributes? technically I guess it's not an attribute in LinkML speak because it's the slot identifier, but it is required.
b. For completeness can we include the MIxS terminology in the list as well?

### 4.2 Recommended LinkML slot attributes

A term (slot) that has some level of 'requirement' (mandatory, conditional mandatory, optional) SHOULD include the following LinkML attributes:

- [`recommended`](https://linkml.io/linkml/schemas/slots.html#recommended)
- [`required`](https://linkml.io/linkml/schemas/slots.html#required)

## 5. Term definition

### 5.1 Definition contents

The definition (description) SHOULD be precise enough for a user to understand the data the term (slot) is intended to hold, how it should be filled in, and how it should be used.

Links to external resources (e.g. ontologies, databases, or other documentation) SHOULD be included in the definition (description) when relevant.

### 5.2 Definition length

The definition (description) MUST be at least one sentence long, and MUST be longer than the term (slot) title.

The definition (description) MAY be multiple sentences long, but should be as concise as possible to ensure readability.

### 5.3 Definition examples

The definition (description) SHOULD NOT include basic examples of the data the term (slot) is intended to hold (this is covered by the `examples` attribute).

The definition (description) MAY include examples when the information for the term (slot) requires different formatting depending on certain conditions. The definition (description) MAY also include examples when it requires additional understanding that cannot be inferred by looking purely at the `examples` section.

### 5.4 Definition external resources

Links or URLs used in the definition (description) to point a reader to an external resource MUST be valid and generally accessible via the public world wide web.

External resources SHOULD only be referred to when they are stable and established (i.e. not a personal website, or a resource that is not widely used).

URLs of external resources given within definitions (descriptions) SHOULD also be listed in a LinkML [`see_also` slot attribute](https://linkml.io/linkml-model/latest/docs/see_also/).
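
A minimal sketch of the last point, assuming a hypothetical slot named `example_term` whose definition points to the INSDC feature table documentation already linked in this document. The description wording is invented for illustration.

```yaml
slots:
  example_term:  # hypothetical slot name
    # Invented wording; only the URL comes from this document
    description: >-
      Hypothetical definition that points readers to the INSDC feature
      table documentation at
      https://www.insdc.org/submitting-standards/feature-table/ for
      the permitted syntax.
    # The same URL repeated so that tooling can find it without parsing prose
    see_also:
      - https://www.insdc.org/submitting-standards/feature-table/
```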

## 6. Term item title attribute

### 6.1 Title contents

The item (title) should be a fully written-out version of the term (slot) name, and MUST be descriptive of the data it is intended to hold.

### 6.2 Title length

A term (slot) item (title) attribute SHOULD be as short as possible, but as long as necessary to be sufficiently descriptive, unique, and distinguishable from other terms.

### 6.3 Title format

The item (title) SHOULD be lower case, including the first character of the item.

- Valid example: `library size`.
- Invalid examples:
  - `Library size` (capitalisation of first character).
  - `Library Size` (capitalisation of all words).

Capitalisation MAY be used for an acronym or abbreviation that is typically capitalised in the English language (e.g. `DNA`, `API`, `pH`).

- Valid example: `MAG coverage software`.
- Valid example: `API gravity`.

## 7. Term examples attribute

<!-- JFY comment: this is a new guideline I would like to propose, so requires discussion -->

### 7.1 Minimum number of examples

There MUST be a minimum of one example for a term (slot).
Ideally, there SHOULD be a minimum of 3 examples for a term (slot).

### 7.2 Scope of examples

Examples SHOULD cover the full range of possible values, string formats, or any other way that information can be given to the term (slot).

For example if a term (slot) accepts either an ontology term _or_ a free text string, there should be at least one example for each type.
If a term (slot) accepts different unit types, there should be at least two examples of different units to demonstrate multiple units are accepted.
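
The unit case could be sketched as follows, assuming a hypothetical slot `example_mass` that accepts a mass in several units. The slot name and the values are invented for illustration. Three examples are given, in line with the recommendation in section 7.1.

```yaml
slots:
  example_mass:  # hypothetical slot name
    examples:
      # Each example uses a different unit to show that several are accepted
      - value: 5 gram
      - value: 0.25 kilogram
      - value: 1500 milligram
```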

### 7.3 Examples for terms that allow more than one entry

If a term (slot) allows multiple occurrences ('multivalued'), there MUST be at least two examples: one showing a single value, and another showing how to fill in the term with multiple values.

## 8. Term section attribute

> [!WARNING]
> The guidance in this section regarding `subset`s may be replaced with the use of `slot_group` in the future.

### 8.1 All core terms must be assigned a subset

All core checklist terms (slots) MUST be assigned to a section (subset).

### 8.2 All extension terms must be assigned the environment subset

A term (slot) defined in an extension (rather than a core checklist term) MUST be assigned to the 'Environment' section (subset).

## 9. Term keywords attribute

### 9.1 Number of keywords

All terms (slots) MUST have at least one keyword.

Member

We need to discuss alignment and maintenance of keywords

Member

@only1chunts only1chunts Aug 12, 2025

personally I dislike keywords in general as they are by definition redundant, i.e. if its a key feature of the term then it should be in the name/title and or description. They are not useful for grouping things because they are uncontrolled free text, they are subjective based on the opinions of the person adding them and they are not specified to any particular level of granularity. But I'll step off my soap box now, rather than saying MUST I would suggest we use COULD.

Collaborator Author

I honestly agree, I never saw the purpose of them given there are sections/subsets already...

I could remove this for now as a specification and (leaving it entirely optional) unless @turbomam gives a useful use case and a way to standardise (in which case we keep it in, and bring it up for discussion with CIG/TWG)


### 9.2 Keywords should be re-used

Re-using existing keywords SHOULD be preferred, but new keywords MAY be created if needed.

### 9.3 Keyword types

Keywords SHOULD be descriptive of the data the term is intended to hold, in a way that allows it to be grouped with other terms (slots).

This can correspond to stage of project, domain of research, or the sample type (or extension) the term is intended to be used with.

Keywords MAY also include each descriptive part of the item (title) in full words (e.g. `air_temp` could have keywords `air` and `temperature`).

## 10. Term MIxS ID attribute

### 10.1 MIxS ID requirement

The term MUST have a MIxS ID (slot_uri) that is unique within the MIxS model.

### 10.2 MIxS ID format

The MIxS ID (slot_uri) must consist of the string `MIXS`, followed by a colon and a 7-digit number.

Example: `MIXS:0000010`.
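
In schema terms this could look like the sketch below. It assumes the `MIXS` prefix is declared with the expansion given in the terminology table, so that the example ID would expand to `https://w3id.org/gensc/mixs/0000010`. The slot name `example_term` is hypothetical and is not the term that actually owns this ID.

```yaml
prefixes:
  # Expansion as given in the terminology table of this document
  MIXS: https://w3id.org/gensc/mixs/

slots:
  example_term:  # hypothetical slot name
    # The prefix `MIXS`, a colon, then a 7-digit number
    slot_uri: MIXS:0000010
```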

> [!NOTE]
> MIxS IDs are only able to be assigned by the GSC's Compliance and Integration Working Group (CIG).

## 11. Slot range attribute

### 11.1 Range options should be valid LinkML types

See section [3](#3-term-expected-value-types).

### 11.2 Structured or formatted text should use a structured pattern

A term that requires a specific value syntax or a structured string layout SHOULD use the `structured_pattern` slot attribute. Pattern components SHOULD be predefined in the `settings:` section of the schema when they could plausibly be used more than once.

A slot MAY use `pattern:` attribute when XYZ <!-- TODO -->.
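
A minimal sketch of a structured pattern built from reusable components is shown below. The component names (`float`, `unit`), their regular expressions, and the slot name are assumptions made for illustration and are not copied from the MIxS schema.

```yaml
settings:
  # Hypothetical reusable pattern components
  float: '[-+]?[0-9]*\.?[0-9]+'
  unit: '[A-Za-z]+( [A-Za-z]+)*'

slots:
  example_measure:  # hypothetical slot name
    range: string
    structured_pattern:
      # Names in braces are replaced by the matching `settings` entries
      syntax: "{float} {unit}"
      interpolated: true
      # The whole value must match, not just part of it
      partial_match: false
```

Defining `float` and `unit` once under `settings:` means other measurement slots can reuse them instead of repeating the regular expressions.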

### 11.3 Structured or formatted text components should be reused

A structured pattern SHOULD re-use existing pattern components as far as possible.

Additional pattern components MAY be created when needed after consultation with the GSC's Compliance and Integration Working Group (CIG).

### 11.4 Specifying units

Terms (slots) that record a measurement SHOULD specify the preferred unit of measurement for the term (slot) within a LinkML `annotations` slot sub-attribute called `Preferred_unit:`.

Example:

```yaml
annotations:
  Preferred_unit: degree Celsius
```

Member

I'm not sure what this means? I thought we were encouraging the use of the 'preferred_unit:' linkml field for the unit requirements, which is also a MIxS word and should be included in the table at the top.

Collaborator Author

Is preferred_unit a LinkML thing:? It's not specified anywhere in the existing mixs.yaml 🤔

I also don't find a reference to it in the linkml docs: https://linkml.io/linkml/search.html?q=preferred_unit&check_keywords=yes&area=default

Member

if you search for the word "preferred_unit" in the mixs.yaml file you will see 238 instances of it. If you look in the old v6 excel spreadsheet there is a column for "Preferred unit". I don't know if it's from the off-the-shelf linkml or something that's been added specifically for our use case, but it's definitely there.

Collaborator Author

@jfy133 jfy133 Aug 12, 2025

yes, I'm aware of it in the MIxS Excel spreadsheet/original MIxS, but maybe I'm being crazy but I'm not finding it in the schema...:

image

Collaborator Author

Scratch that: it's spelt Preferred_unit (why the capital!?)

Collaborator Author

Updated, but I find it sort of unsatisfying (also the way the preferred units have been specified is very messy, with some having loads of different 'preferred' units... which sort of defeats the point...

e.g.

  abs_air_humidity:
    annotations:
      Preferred_unit: gram per gram, kilogram per kilogram, kilogram, pound, gram per cubic meter, kilogram per cubic meter, percent

## 12. Multiple occurrence

A term (slot) that allows multiple values for a single sample SHOULD be specified by setting the LinkML `multivalued` boolean to `true`.
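
For instance, assuming a hypothetical slot named `example_term`:

```yaml
slots:
  example_term:  # hypothetical slot name
    range: string
    # More than one value may be given per sample
    multivalued: true
```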

## 13. Level of requirement

### 13.1 Mandatory terms

A term (slot) that is required to be filled in for a sample MUST have the `required` attribute set to `true`.

### 13.2 Conditional mandatory terms

A conditional term (slot) SHOULD NOT be specified as `required` as a LinkML slot attribute.
A conditional term (slot) SHOULD be specified within the `slot_usage` attribute of the LinkML class for a given extension.

### 13.3 Environment dependent terms

An environment dependent term (slot) SHOULD NOT be specified as `required` as a LinkML slot attribute.
An environment dependent term (slot) SHOULD be specified within the `slot_usage` attribute of the LinkML class for a given extension.
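
The `slot_usage` approach described in sections 13.2 and 13.3 could be sketched as follows. The slot name `example_term` and the class name `ExampleExtension` are hypothetical. The requirement is set only inside the class, so the slot-level definition stays free of `required`.

```yaml
slots:
  example_term:  # hypothetical slot name
    range: string
    # No `required` at slot level

classes:
  ExampleExtension:  # hypothetical extension class
    slots:
      - example_term
    slot_usage:
      example_term:
        # Applies only when the slot is used within this class
        required: true
```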

### 13.4 Optional terms

A term (slot) that is not required for a given sample MUST NOT have either the `recommended` or the `required` LinkML attribute specified.
By default, these LinkML attributes are assumed to be `false` unless specified.