Documentation: create a specifications document on how to write a MIxS LinkML slot #944
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,237 @@ | ||
# MIxS term specifications in the LinkML framework | ||
|
||
| Metadata | Value | | ||
| ------------ | ----------------------------- | | ||
| Version | 0.0.1 | | ||
| Last updated | 2025-05-05 | | ||
| Authors | James Fellows Yates (@jfy133) | | ||
|
||
|
||
## Preamble | ||
|
||
This document describes how MIxS metadata terms are represented within the LinkML framework of the MIxS schema. | ||
|
||
### Terminology | ||
|
||
The key words “MUST”, “MUST NOT”, “SHOULD”, etc. are to be interpreted as described in [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119). | ||
|
||
This specification refers to both MIxS and LinkML terminology. | ||
The following table shows how the two sets of terms correspond. | ||
|
||
| MIxS | LinkML | Description | | ||
| -------------- | -------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ | | ||
| Term | `slot` | A single discrete piece of information (metadata), with attributes describing how it should be represented and formatted | | ||
| Item | `title` | A short human-readable name for the metadata term/slot | | ||
| MIXS ID | `slot_uri` | The unique identifier of the term/slot within the MIxS model | | ||
| Definition | `description` | A detailed human-readable explanation of what information the metadata term/slot should hold | | ||
| Expected value | `range` | The category of metadata the term/slot will hold (text, numbers, etc.) | | ||
| Value syntax | `structured_pattern` | A way of defining how a term/slot should be filled in, e.g. with a specific format or structure | | ||
| Section | `slot_group` | A way of grouping similar or related terms/slots together to assist users in filling metadata tables following a logical progression | | ||
| Section | `subset` | Another way of grouping similar or related terms/slots together to assist users in filling metadata tables following a logical progression | | ||
| Requirement | `recommended` | Specifies whether a term is optional but should be filled in for a sample | | ||
| Requirement | `required` | Specifies whether a term is mandatory to fill in for a sample | | ||
| Occurrence | `multivalued` | The number of times a particular term/slot can be used for a specific sample | | ||
|
||
Review comment (on the `description` row): we need description writing guidelines. I recommend Guidelines for writing definitions in ontologies by Seppälä, Ruttenberg and Smith |
Reply: We would need agreement on this from the wider TWG/CIG groups |
|
||
|
||
This document generally uses MIxS terminology, but where it is more helpful or relevant it uses the LinkML equivalent, with the other form in parentheses afterwards. | ||
|
||
|
||
## 1. General | ||
|
||
### 1.1 LinkML compatibility | ||
|
||
A MIxS term (slot) MUST be written in, and be compatible with, the [LinkML](https://linkml.io/) model and any of its requirements (e.g. YAML format). | ||
|
||
|
||
It MUST conform to any MIxS specific LinkML linting requirements as defined within the [MIxS GitHub repository](https://github.com/GenomicsStandardsConsortium/mixs). | ||
|
||
### 1.2 Slot definition | ||
|
||
A LinkML slot is the object used to describe a MIxS term, i.e. information that describes a particular aspect of a sample, nucleic acid, or sequence data. | ||
|
||
### 1.3 Language | ||
|
||
|
||
All MIxS term attributes MUST be written in English. | ||
|
||
<!-- JFY comment: may be being a bit strict here, I guess you could have 'translated name' column or something like that, should rephrase to allow those exceptions --> | ||
|
||
## 2. Term naming | ||
|
||
### 2.1 Term name format | ||
|
||
The term (slot) name MUST be in [snake_case](https://en.wikipedia.org/wiki/Snake_case). | ||
|
||
|
||
All words MUST be lower case, and underscores (`_`) MUST be used to separate words in the slot name. | ||
|
||
### 2.2 Term name length | ||
|
||
The term (slot) name MUST be at most 20 characters long. | ||
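
The naming rules in sections 2.1 and 2.2 can be checked mechanically. The following is a minimal sketch, not part of the MIxS tooling: the function name `check_term_name` is hypothetical, and allowing digits inside words is an assumption that this specification does not state.

```python
import re

MAX_NAME_LENGTH = 20

# Lower-case words separated by single underscores.
# Assumption: digits are allowed inside words; the specification does not say.
SNAKE_CASE = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def check_term_name(name: str) -> list[str]:
    """Return a list of problems found in a term (slot) name (empty if none)."""
    problems = []
    if not SNAKE_CASE.fullmatch(name):
        problems.append("name is not lower-case snake_case")
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"name is longer than {MAX_NAME_LENGTH} characters")
    return problems


if __name__ == "__main__":
    # Illustrative names only.
    for candidate in ("samp_size", "SampSize", "a_very_long_term_name_here"):
        print(candidate, check_term_name(candidate))
```

Returning a list of problems rather than a boolean lets a linter report every violated rule at once.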
|
||
|
||
### 2.4 Term name uniqueness | ||
|
||
The term (slot) name MUST be unique within the MIxS LinkML model. | ||
|
||
### 2.5 Term name descriptiveness | ||
|
||
The term (slot) name MUST be descriptive of the data it is intended to hold. | ||
Review comment: I think this needs examples and counterexamples. See the Seppälä paper above |
Reply: I would need someone to give me examples in the MIxS flavour! (I will read the paper when I have a chance) |
||
|
||
### 2.6 Term name common prefix | ||
|
||
When related to existing terms, the term (slot) name SHOULD use a common prefix that allows related terms to be grouped. | ||
|
||
### 2.7 Term name abbreviation | ||
|
||
|
||
The term (slot) name SHOULD be an abbreviated form of the item (title) attribute. | ||
|
||
|
||
## 4. Data types | ||
|
||
### 4.1 Data types must be valid LinkML types | ||
|
||
The data or information a term (slot) encodes MUST be in the form of a valid LinkML `range:` type: | ||
|
||
|
||
- `string` | ||
- `integer` | ||
- `float` | ||
- `boolean` | ||
- A MIxS defined enumeration | ||
|
||
Refer to LinkML documentation for more information on [range types](https://linkml.io/linkml-model/latest/docs/range/). | ||
|
||
## 3. Slot attributes | ||
|
||
### 3.1. Minimal required LinkML slot attributes | ||
|
||
A term (slot) MUST, at a minimum, include the following attributes: | ||
|
||
- [`description`](https://linkml.io/linkml/schemas/metadata.html#providing-descriptions). | ||
- [`title`](https://linkml.io/linkml-model/latest/docs/title/). | ||
- [`examples`](https://linkml.io/linkml-model/latest/docs/examples/). | ||
- [`in_subset`](https://linkml.io/linkml-model/latest/docs/in_subset/). | ||
- [`keywords`](https://linkml.io/linkml-model/latest/docs/keywords/). | ||
- [`slot_uri`](https://linkml.io/linkml-model/latest/docs/slot_uri/). | ||
- [`range`](https://linkml.io/linkml/schemas/slots.html#ranges). | ||
|
||
Review comment: two points on section 4.1: |
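
As an illustration of the minimal attribute list in section 3.1, the sketch below checks a slot that has already been loaded from YAML into a Python dictionary. The helper name `missing_attributes` and every value in `example_slot` are placeholders invented for this example, not entries from the MIxS schema.

```python
# Attribute names taken from the list in section 3.1.
REQUIRED_ATTRIBUTES = (
    "description",
    "title",
    "examples",
    "in_subset",
    "keywords",
    "slot_uri",
    "range",
)


def missing_attributes(slot: dict) -> list[str]:
    """Return the required attributes that are absent or empty in a slot."""
    return [name for name in REQUIRED_ATTRIBUTES if not slot.get(name)]


# Placeholder slot; all values are illustrative, not real MIxS content.
example_slot = {
    "title": "example measurement",
    "description": "A placeholder description that is longer than the title.",
    "examples": [{"value": "5 liter"}, {"value": "10 gram"}, {"value": "2 meter"}],
    "in_subset": ["example section"],
    "keywords": ["example"],
    "slot_uri": "MIXS:9999999",
    "range": "string",
}

if __name__ == "__main__":
    print(missing_attributes(example_slot))
    print(missing_attributes({"title": "only a title"}))
```

Treating an empty value as missing is a design choice of this sketch; a stricter or looser interpretation would need agreement in the specification.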
||
### 3.2. Recommended LinkML slot attributes | ||
|
||
A term (slot) that has some level of 'requirement' (mandatory, conditional mandatory, optional) SHOULD include the following LinkML attributes: | ||
|
||
- [`recommended`](https://linkml.io/linkml/schemas/slots.html#recommended) | ||
- [`required`](https://linkml.io/linkml/schemas/slots.html#required) | ||
|
||
## 4. Term definition attribute | ||
|
||
### 4.1 Description contents | ||
|
||
The definition (description) SHOULD be precise enough for a user to understand the data the term (slot) is intended to hold, how it should be filled in, and how it should be used. | ||
|
||
Links to external resources (e.g. ontologies, databases, or other documentation) SHOULD be included in the definition (description) when relevant. | ||
|
||
### 4.2 Description length | ||
|
||
The definition (description) MUST be at least one sentence long, and MUST be longer than the term (slot) title. | ||
|
||
The definition (description) MAY be multiple sentences long, but SHOULD be as concise as possible to ensure readability. | ||
|
||
### 4.3 Description examples | ||
|
||
The definition (description) SHOULD NOT include basic examples of the data the term (slot) is intended to hold (this is covered by the `examples` attribute). | ||
|
||
The definition (description) MAY include examples when the information for the term (slot) requires different formatting depending on certain conditions. The definition (description) MAY also include examples when it requires additional understanding that cannot be inferred by looking purely at the `examples` section. | ||
|
||
### 4.4 Description external resources | ||
|
||
|
||
Links or URLs used in the definition (description) to point a reader to an external resource MUST be valid and generally accessible via the public world wide web. | ||
|
||
External resources SHOULD only be referred to when they are stable and established (i.e., not a personal website, or a resource that is not widely used). | ||
|
||
## 5. Term item attribute | ||
|
||
### 5.1 Title contents | ||
|
||
The item (title) SHOULD be a full-length, unabbreviated version of the term (slot) name, and MUST be descriptive of the data the term is intended to hold. | ||
|
||
### 5.2 Title length | ||
|
||
A term (slot) item (title) attribute SHOULD be as short as possible, but as long as necessary to be sufficiently descriptive, unique, and distinguishable from other terms. | ||
|
||
### 5.3 Title format | ||
|
||
The item (title) SHOULD be lower case, including the first character. | ||
|
||
- Valid example: `library size`. | ||
- Invalid examples: | ||
- `Library size` (capitalisation of first character). | ||
- `Library Size` (capitalisation of all words). | ||
|
||
Capitalisation MAY be used for an acronym or abbreviation that is typically capitalised in the English language (e.g. `DNA`, `API`, `pH`). | ||
|
||
- Valid example: `MAG coverage software`. | ||
- Valid example: `API gravity`. | ||
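
A rough way to flag titles that break the lower-case rule is sketched below. It relies on a heuristic that is an assumption of this example, not part of the specification: a word is flagged only when it is in "Title case" (first letter upper case, the rest lower case), so acronyms such as `MAG`, `API`, or `pH` pass. The function name `capitalised_words` is hypothetical.

```python
def capitalised_words(title: str) -> list[str]:
    """Return the words in a title that look like ordinary capitalised words.

    Heuristic (assumption): a word counts as capitalised when its first
    character is upper case and all remaining characters are lower case.
    Acronyms and mixed-case abbreviations are therefore not flagged.
    """
    return [
        word
        for word in title.split()
        if word[0].isupper() and word[1:].islower()
    ]


if __name__ == "__main__":
    for title in ("library size", "Library size", "Library Size",
                  "MAG coverage software", "API gravity"):
        print(repr(title), capitalised_words(title))
```

The heuristic cannot tell a legitimately capitalised proper noun from a mistake, so flagged titles still need human review.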
|
||
## 6. Slot examples attribute | ||
|
||
<!-- JFY comment: this is a new guideline I would like to propose, so requires discussion --> | ||
|
||
### 6.1 Minimum number of examples | ||
|
||
There MUST be a minimum of 3 examples for a term (slot). | ||
|
||
|
||
### 6.2 Scope of examples | ||
|
||
In addition to the minimum number of examples, the examples SHOULD cover the full range of possible values, string formats, or any other way that information can be given to the term (slot). | ||
|
||
For example, if a term (slot) accepts either an ontology term _or_ a free text string, there SHOULD be at least one example for each type. | ||
If a term (slot) accepts different unit types, there SHOULD be at least two examples with different units to demonstrate that multiple units are accepted. | ||
|
||
### 6.3 Examples for terms that allow more than one entry | ||
|
||
If a term (slot) allows multiple occurrences ('multivalued'), the examples MUST include at a minimum two examples, one to show inputting a single value, and another to show how to fill the term with multiple values. | ||
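
The example-count rules in sections 6.1 and 6.3 could be checked along these lines. This is only a sketch: the function name `check_examples` is hypothetical, and representing a multi-value example as a single string joined by a separator (here `|`) is an assumption made for illustration, since this specification does not define how multiple values are written.

```python
MIN_EXAMPLES = 3


def check_examples(
    examples: list[str],
    multivalued: bool = False,
    separator: str = "|",  # Assumption: how multiple values are joined.
) -> list[str]:
    """Return a list of problems with a term's example values (empty if none)."""
    problems = []
    if len(examples) < MIN_EXAMPLES:
        problems.append(f"fewer than {MIN_EXAMPLES} examples")
    if multivalued:
        has_single = any(separator not in example for example in examples)
        has_multiple = any(separator in example for example in examples)
        if not (has_single and has_multiple):
            problems.append(
                "multivalued term needs one single-value and one multi-value example"
            )
    return problems


if __name__ == "__main__":
    # Illustrative values only.
    print(check_examples(["5 liter", "10 gram", "2 meter"]))
    print(check_examples(["soil", "soil|water", "air"], multivalued=True))
    print(check_examples(["soil"], multivalued=True))
```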
|
||
## 7. Slot in_subset attribute | ||
|
||
> [!WARNING] | ||
> The guidance in this section regarding `subset`s may be replaced with the use of `slot_group` in the future. | ||
|
||
### 7.1 All core slots must be assigned a subset | ||
|
||
All core checklist LinkML slots (terms) MUST be assigned to a section (subset). | ||
|
||
### 7.2 Extension-only terms must not be assigned a subset | ||
|
||
A slot (term) assigned to just an extension MUST NOT be assigned to a section (subset). | ||
|
||
|
||
## 8. Term keywords attribute | ||
|
||
### 8.1 Number of keywords | ||
|
||
All terms (slots) MUST have at least one keyword. | ||
Review comment: We need to discuss alignment and maintenance of |
Review comment: personally I dislike keywords in general as they are by definition redundant, i.e. if its a key feature of the term then it should be in the name/title and or description. They are not useful for grouping things because they are uncontrolled free text, they are subjective based on the opinions of the person adding them and they are not specified to any particular level of granularity. But I'll step off my soap box now, rather than saying MUST I would suggest we use COULD. |
Reply: I honestly agree, I never saw the purpose of them given there are I could remove this for now as a specification and (leaving it entirely optional) unless @turbomam gives a useful use case and a way to standardise (in which case we keep it in, and bring it up for discussion with CIG/TWG) |
||
|
||
### 8.2 Keywords should be re-used | ||
|
||
Re-using existing keywords SHOULD be preferred, but new keywords MAY be created if needed. | ||
|
||
### 8.3 Keyword types | ||
|
||
Keywords SHOULD describe the data the term is intended to hold in a way that allows it to be grouped with other terms (slots). | ||
|
||
Keywords can correspond to the stage of a project, the domain of research, or the sample type (or extension) the term is intended to be used with. | ||
|
||
Keywords MAY also include each descriptive part of the title (item) in full words (e.g. `air_temp` could have keywords `air` and `temperature`). | ||
|
||
## 9. Term MIXS ID attribute | ||
|
||
### 9.1 MIXS ID requirement | ||
|
||
The term MUST have a MIXS ID (slot_uri) that is unique within the MIxS model. | ||
|
||
### 9.2 MIXS ID format | ||
|
||
|
||
The MIXS ID (slot_uri) MUST consist of the string `MIXS`, followed by a colon and a 7-digit number. | ||
Review comment: we should discuss who assigns these and when. Ideally, there would be no other system of record other than the schema file, so there could never be any conflicts. |
Reply: @lschriml said at the GSC25 MIxS working day that currently this is specified on just prior release after a feature freeze for all new terms. She was interest in automation of this though! |
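
The MIXS ID rules in sections 9.1 and 9.2 (format and uniqueness) lend themselves to a simple check. The sketch below is illustrative only: the function names are hypothetical, and it reads the format rule as "exactly `MIXS:` followed by seven ASCII digits and nothing else".

```python
import re
from collections import Counter

# "MIXS", a colon, then exactly seven digits.
MIXS_ID = re.compile(r"MIXS:[0-9]{7}")


def is_valid_mixs_id(slot_uri: str) -> bool:
    """Check that a slot_uri has the form MIXS:NNNNNNN."""
    return MIXS_ID.fullmatch(slot_uri) is not None


def duplicate_ids(slot_uris: list[str]) -> list[str]:
    """Return every slot_uri that occurs more than once, in first-seen order."""
    counts = Counter(slot_uris)
    return [uri for uri, count in counts.items() if count > 1]


if __name__ == "__main__":
    # Illustrative identifiers only.
    print(is_valid_mixs_id("MIXS:0000001"))
    print(is_valid_mixs_id("MIXS:123"))
    print(duplicate_ids(["MIXS:0000001", "MIXS:0000002", "MIXS:0000001"]))
```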
||
|
||
## 10. Slot range attribute | ||
|
||
### 10.1 Range options should be valid LinkML types | ||
|
||
See section [4](#4-data-types). | ||
|
||
### 10.2 Structured or formatted text should use a structured pattern | ||
|
||
A term that requires a specific value syntax or a structured string layout SHOULD use the `structured_pattern` slot attribute, where the pattern components are predefined in the `settings:` section of the schema. | ||
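
To illustrate the idea behind section 10.2, the sketch below shows how named pattern components can be substituted into a syntax string to build a regular expression. It is a conceptual illustration in plain Python, not LinkML's own implementation; the setting names (`float`, `unit`), their regular expressions, and the syntax string are all hypothetical.

```python
import re

# Hypothetical `settings:` entries; a real schema defines its own names and regexes.
SETTINGS = {
    "float": r"[-+]?[0-9]*\.?[0-9]+",
    "unit": r"[A-Za-z]+",
}


def interpolate(syntax: str, settings: dict[str, str]) -> str:
    """Replace each {name} placeholder with the regex stored under that name."""
    return re.sub(
        r"\{([A-Za-z_]\w*)\}",
        lambda match: settings[match.group(1)],
        syntax,
    )


# Hypothetical syntax for a "number, space, unit" value.
PATTERN = interpolate("^{float} {unit}$", SETTINGS)


def matches(value: str) -> bool:
    """Check a value against the interpolated pattern."""
    return re.match(PATTERN, value) is not None


if __name__ == "__main__":
    print(PATTERN)
    for value in ("5.2 liter", "5 liter", "five liters"):
        print(value, matches(value))
```

Defining each component once and reusing it by name keeps the value syntax of different terms consistent, which is the motivation for preferring predefined components over ad hoc patterns.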
|
||
|
||
A slot MAY use the `pattern:` attribute when XYZ <!-- TODO -->. |