Collaborator

@jfy133 jfy133 commented Apr 28, 2025

As a relative newcomer myself, one thing I found difficult when preparing new MIxS term proposals was knowing exactly how to write and format a slot in the way MIxS prefers LinkML slots to be defined.

The purpose of this file is to provide example 'gold standard' slots for common types of metadata terms used within MIxS, which can be used as templates or guides when preparing a new slot proposal.

It is aimed at new users (e.g. intermediate bioinformaticians) who are familiar with common programming terms but may not be familiar with MIxS or deep computer science concepts.

Warning

This page is based entirely on the experiences of a novice user, and will likely require heavy editing by experts.

- [`examples`](https://linkml.io/linkml-model/latest/docs/examples/): example values demonstrating how the slot should be used
- [`in_subset`](https://linkml.io/linkml-model/latest/docs/in_subset/): the section of the schema that the slot belongs to, based on a [fixed list of MIxS categories](https://github.com/GenomicsStandardsConsortium/mixs/blob/609b0f567486f64cb7061246588d8006f87fa138/src/mixs/schema/mixs.yaml#L21-L26)
- Note: this system may be replaced in the near future!
- [`keywords`](https://linkml.io/linkml-model/latest/docs/keywords/): useful keywords to allow grouping of related slots together

Member

When I introduced the keywords in 6.2, I didn't have any tools or policies for adding new keywords. And I still don't! So new terms could be added with no keywords, or with keywords that are similar to but different from the ones I used. Nothing is checking that yet. And we don't really have any easy tooling for retrieving all terms that are tagged with a keyword. That can be done with SchemaView loops, or by dumping MIxS to a tabular representation, but I don't think we have all agreed on the best way of doing that either.
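A rough sketch of the "retrieve all terms tagged with a keyword" loop mentioned above, written against plain dictionaries so that it runs without any LinkML tooling. The slot names and keywords in `SLOTS` are made up for illustration; with `linkml-runtime` the same loop could presumably run over the slots returned by a `SchemaView`, but that is an assumption and not something this sketch exercises.

```python
from collections import defaultdict

# Hypothetical, heavily trimmed stand-in for the `slots` section of a schema.
# Slot names and keywords are illustrative only.
SLOTS = {
    "depth": {"keywords": ["depth"]},
    "elev": {"keywords": ["elevation"]},
    "tot_depth_water_col": {"keywords": ["depth", "water"]},
    "samp_name": {},  # a slot with no keywords at all
}


def slots_by_keyword(slots):
    """Map each keyword to the sorted list of slot names tagged with it."""
    index = defaultdict(list)
    for name, slot in slots.items():
        for keyword in slot.get("keywords", []):
            index[keyword].append(name)
    return {keyword: sorted(names) for keyword, names in index.items()}


def slots_without_keywords(slots):
    """List the slot names that carry no keywords."""
    return sorted(name for name, slot in slots.items() if not slot.get("keywords"))


if __name__ == "__main__":
    for keyword, names in sorted(slots_by_keyword(SLOTS).items()):
        print(keyword, names)
    print("untagged:", slots_without_keywords(SLOTS))
```

An index like this would also make near-duplicate keywords easier to spot, since they show up as separate, sparsely populated entries.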

Collaborator Author

I based my examples on the observation that most slots did have them.

I've unilaterally made up a policy in this regard ;).

Note this isn't the specification document but just examples I found.

We could update the spec to not require them at all, and then drop the reference to it here (even if all the actual examples have them)


- If a term should be either mandatory or optional
- [`recommended`](https://linkml.io/linkml/schemas/slots.html#recommended): a boolean value indicating if the slot is recommended to be filled
- [`required`](https://linkml.io/linkml/schemas/slots.html#required): a boolean value indicating if the slot is mandatory to be filled

Member

I assume that we would never add any new terms with both recommended and required set. There are a few cases of that now that should be cleaned up. It is also possible that a term could be considered recommended in a Checklist and required in an Extension, so it would appear as both in a dynamically constructed combination class.

Contributor

I think that's what the "either mandatory or optional" part is.
That said, I think we should review the terms that are present in both checklists and extensions... I assume that shouldn't happen, and if it does, that seems confusing.

Member

Also, in terms of style, I don't think we should ever assert false. Any Boolean metaslot that isn't asserted is false by definition.
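The two style points raised in this thread (never assert `false`, and never set both `required` and `recommended`) could be checked mechanically. The following is only a sketch over plain dictionaries; `style_problems` and the slot contents are hypothetical, and nothing like this currently exists in the MIxS tooling as far as this thread indicates.

```python
# Boolean metaslots discussed in this review; absent means false.
BOOLEAN_METASLOTS = ("required", "recommended", "multivalued")


def style_problems(slot_name, slot):
    """Return human-readable style problems for one slot definition (a dict)."""
    problems = []
    for metaslot in BOOLEAN_METASLOTS:
        # An unasserted Boolean metaslot already means false, so `false` is noise.
        if slot.get(metaslot) is False:
            problems.append(f"{slot_name}: drop explicit '{metaslot}: false'")
    if slot.get("required") is True and slot.get("recommended") is True:
        problems.append(f"{slot_name}: both required and recommended are true")
    return problems


if __name__ == "__main__":
    # Hypothetical slot definitions, for illustration only.
    print(style_problems("ok_slot", {"required": True}))
    print(style_problems("noisy_slot", {"recommended": False}))
    print(style_problems("confused_slot", {"required": True, "recommended": True}))
```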

Collaborator Author

Is the conclusion that here we should just add `recommended: true` if the slot is recommended (and likewise for `required`), and document that you should never specify `recommended: false`?

- If a term should be allowed multiple entries

- [`multivalued`](https://linkml.io/linkml/schemas/slots.html#multivalued)
- Essentially indicates the contents of the slot can be a list, and each element is evaluated independently against the remaining slot attributes

Member

@turbomam turbomam May 6, 2025

We should ensure that multivalued slots always have at least two examples in the schema file, on separate lines

slots:
    cities:
        description: a list of noteworthy cities in some geopolitical entity
        examples:
            - value: Paris
            - value: Lyon
            - value: Nice

Not

slots:
    cities:
        description: a list of noteworthy cities in some geopolitical entity
        examples:
            - value: Paris|Lyon|Nice

We also need a lot more documentation about how multivalued values are represented in the various data file formats/serializations that LinkML supports. In YAML or JSON the values would truly be broken out into a list

name: France
cities:
  - Paris
  - Lyon
  - Nice

In CSV or TSV that is faithful to the current LinkML specifications, one would see

name cities
France [Paris | Lyon | Nice ]

Whereas one might expect (or MIxS might imply) that the representation should/can omit the brackets

name cities
France Paris | Lyon | Nice

or even use a different delimiter

name cities
France Paris ; Lyon ; Nice

Much of that is under consideration in LinkML, but that kind of flexibility isn't supported yet. This can be seen first hand by creating some data that passes `linkml validate`, and then converting it with `linkml convert`, a process that I recommend to all MIxS contributors.
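To make the three tabular variants above concrete, here is a small sketch of what a tolerant reader for a single multivalued cell could look like. This is not how LinkML parses such cells; `parse_multivalued_cell` is a hypothetical helper that only illustrates the difference between the bracketed, bracket-free, and alternative-delimiter forms.

```python
def parse_multivalued_cell(cell, delimiter="|"):
    """Split one CSV/TSV cell into a list of values.

    Accepts the bracketed form as well as the bracket-free form,
    and strips whitespace around each value.
    """
    text = cell.strip()
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    return [part.strip() for part in text.split(delimiter) if part.strip()]


if __name__ == "__main__":
    print(parse_multivalued_cell("[Paris | Lyon | Nice ]"))
    print(parse_multivalued_cell("Paris | Lyon | Nice"))
    print(parse_multivalued_cell("Paris ; Lyon ; Nice", delimiter=";"))
```

All three calls yield the same list, which is the point: the information is identical, and only the serialization differs.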

Contributor

Proposal: any multivalued slot must have 2 unique examples to show how to capture the information.
Note: let's keep this guidance focused on the individual who is proposing a new slot to the GSC.
This information is important for the individual who is using the standard and entering data.
Maybe we could link to a separate section, issue, or example file that describes how to enter multivalued data, so that when someone is proposing a term they can see some examples of structure and build their slot according to the need.
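If the "at least 2 unique examples" proposal is adopted, it could be checked with something like the sketch below. It works on a plain dictionary version of a slot; the function name, the `cities` slot, and the choice of `|` and `;` as suspicious delimiters are all assumptions made for illustration.

```python
def multivalued_example_problems(slot_name, slot, delimiters=("|", ";")):
    """Flag multivalued slots whose examples do not show separate values."""
    problems = []
    if not slot.get("multivalued"):
        return problems  # the proposal only concerns multivalued slots
    values = [example.get("value") for example in slot.get("examples", [])]
    if len(set(values)) < 2:
        problems.append(f"{slot_name}: needs at least 2 unique examples")
    for value in values:
        # A delimiter inside one example suggests several values packed together.
        if isinstance(value, str) and any(d in value for d in delimiters):
            problems.append(f"{slot_name}: example {value!r} looks like a packed list")
    return problems


if __name__ == "__main__":
    good = {"multivalued": True, "examples": [{"value": "Paris"}, {"value": "Lyon"}]}
    bad = {"multivalued": True, "examples": [{"value": "Paris|Lyon|Nice"}]}
    print(multivalued_example_problems("cities", good))
    print(multivalued_example_problems("cities", bad))
```

A delimiter check like this would produce false positives for values that legitimately contain `|` or `;`, so it is better treated as a prompt for review than as a hard failure.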

Collaborator Author

@mslarae13 done here: a7239d2

Indeed, LinkML converts

human_gut_data:
  - samp_name: sample1
    project_name: project1
    special_diet:
      - low carb
      - vegetarian
  - samp_name: sample2
    project_name: project1
    special_diet:
      - low carb
      - reduced calorie

to

samp_name	project_name	special_diet
sample1	project1	[low carb|vegetarian]
sample2	project1	[low carb|reduced calorie]
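For readers who want to see the flattening step spelled out, the sketch below reproduces the bracketed TSV above from the same records using only the Python standard library. It imitates the observed output and is not LinkML's own conversion code; `to_tsv` is a hypothetical helper.

```python
import csv
import io

# The same records as in the YAML above, as plain Python data.
RECORDS = [
    {"samp_name": "sample1", "project_name": "project1",
     "special_diet": ["low carb", "vegetarian"]},
    {"samp_name": "sample2", "project_name": "project1",
     "special_diet": ["low carb", "reduced calorie"]},
]


def to_tsv(rows):
    """Write rows as TSV, packing list values as [a|b|c]."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        flat = {
            key: "[" + "|".join(value) + "]" if isinstance(value, list) else value
            for key, value in row.items()
        }
        writer.writerow(flat)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_tsv(RECORDS), end="")
```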

- [`description`](https://linkml.io/linkml/schemas/metadata.html#providing-descriptions): the description of what the metadata term is for
- [`title`](https://linkml.io/linkml-model/latest/docs/title/): a short human readable 'title' for the slot
- [`examples`](https://linkml.io/linkml-model/latest/docs/examples/): example values demonstrating how the slot should be used
- [`in_subset`](https://linkml.io/linkml-model/latest/docs/in_subset/): the section of the schema that the slot belongs to, based on a [fixed list of MIxS categories](https://github.com/GenomicsStandardsConsortium/mixs/blob/609b0f567486f64cb7061246588d8006f87fa138/src/mixs/schema/mixs.yaml#L21-L26)

Contributor

I'm not sure we would want a contributor to determine the subset (categories). I also believe @only1chunts is working on some changes to these. So it may be best to take this out, and make an issue to add it back when that's cleared up.

Contributor

also, just saw the note! :)
So we could keep it

Collaborator Author

Yes, I'm basing the current document on what we currently have in the schema, this will of course evolve :)

- Note: this system may be replaced in the near future!
- [`keywords`](https://linkml.io/linkml-model/latest/docs/keywords/): useful keywords to allow grouping of related slots together
- [`slot_uri`](https://linkml.io/linkml-model/latest/docs/slot_uri/): a unique ID assigned by MIxS
- This likely only gets assigned upon acceptance and merging by the core GSC MIxS team

Contributor

Suggested change
-  - This likely only gets assigned upon acceptance and merging by the core GSC MIxS team
+  - This ID will be assigned upon acceptance and merging by the core GSC MIxS team.

Contributor

Note, we're not actually sure how the IDs are minted

Member

I should make a tool that reports the highest id so far. I wonder if @sujaypatil96 could help me expose that on the web pages.
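A first pass at such a "highest ID so far" report might look like the sketch below. It assumes that `slot_uri` values follow a `MIXS:` prefix plus digits, which is an assumption about the ID format and says nothing about how IDs are actually minted; the slot names and IDs in `SLOTS` are placeholders.

```python
import re

# Assumed ID shape: the prefix "MIXS:" followed by digits.
ID_PATTERN = re.compile(r"^MIXS:(\d+)$")

# Placeholder slots; names and IDs are illustrative only.
SLOTS = {
    "slot_a": {"slot_uri": "MIXS:0000092"},
    "slot_b": {"slot_uri": "MIXS:0001107"},
    "slot_c": {},  # a newly proposed slot with no ID yet
}


def highest_slot_id(slots):
    """Return the largest numeric ID found in slot_uri values, or None."""
    numbers = []
    for slot in slots.values():
        match = ID_PATTERN.match(slot.get("slot_uri", ""))
        if match:
            numbers.append(int(match.group(1)))
    return max(numbers) if numbers else None


if __name__ == "__main__":
    print(highest_slot_id(SLOTS))
```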

Collaborator Author

I guess leaving it as the ultimate responsibility of the core MIxS team is then fine (they can always delegate).


And for some slots, the following attributes are also recommended:

- If a term should be either mandatory or optional

Contributor

I would say this attribute is required for all slots. But the value may vary

Collaborator Author

It currently is not used everywhere, which is why I've put it under recommended.

If we want it to be mandatory from now on, then we can add this to the specification page (if I didn't already add this)

Member

@turbomam turbomam left a comment

update

This example slot allows a single piece of information in the form of a single integer value.

<!--
JFY comment: I don't like this so much as:

Contributor

@jfy133 are you saying "let's pick a different slot for the integer example"?

Collaborator Author

Yes, exactly. But I didn't find a better alternative (this applies to all examples).

- Why pattern AND structured_pattern?
-->

### URL

Contributor

The structured pattern for this, and I think most slots that allow for URLs is "Structured pattern: ^{PMID}|{DOI}|{URL}$"

So an ID/DOI is also accepted... I wonder if we can refer to this as something else.. source? reference? IDK! But let's expand beyond URL and pick a slot with a URL, DOI, or PMID example.

Collaborator Author

I don't follow, @mslarae13, sorry!

Do you mean a slot with multiple examples? Or do you mean update the header/title?

Comment on lines +51 to +52
- [structured_pattern](https://linkml.io/linkml/schemas/constraints.html#structured-patterns): a particular regex-like pattern that includes pre-defined components that describe how each component should be formatted
- In the MIxS LinkML schema, these preset formats can be seen under the [`settings`](https://github.com/GenomicsStandardsConsortium/mixs/blob/609b0f567486f64cb7061246588d8006f87fa138/src/mixs/schema/mixs.yaml#L21852-L21887) section of the schema

Member

Hopefully our structured_patterns won't look regex-like! They could be described as using a domain-specific language, or you could say that they compose the settings like macros?
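One way to picture "composing the settings like macros" is plain textual substitution: each `{NAME}` placeholder in the structured pattern is swapped for the expression registered under that name. The sketch below does exactly that and nothing more. The expressions in `SETTINGS` are invented for illustration and are not the ones defined in the MIxS schema, and `expand_structured_pattern` is a hypothetical helper, not a LinkML function.

```python
import re

# Invented expressions for illustration; the real schema defines its own.
SETTINGS = {
    "PMID": r"PMID:\d+",
    "DOI": r"doi:10\.\d{2,9}/\S+",
    "URL": r"https?://\S+",
}


def expand_structured_pattern(syntax, settings):
    """Replace each {NAME} placeholder with the expression stored under NAME."""
    def substitute(match):
        return settings[match.group(1)]

    return re.sub(r"\{(\w+)\}", substitute, syntax)


if __name__ == "__main__":
    expanded = expand_structured_pattern("^{PMID}|{DOI}|{URL}$", SETTINGS)
    print(expanded)
    for candidate in ("PMID:12345", "https://example.org/paper", "not a reference"):
        print(candidate, bool(re.search(expanded, candidate)))
```

Seen this way, a contributor only needs to know which named settings exist and can leave the underlying expressions to the schema maintainers.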

Collaborator Author

Maybe I'm too young (or skipped my Excel days), but I've never used/seen a macro 😅 but can update the text accordingly 👍

Comment on lines +79 to +86
<!--
JFY comment: I don't like this so much as:

- the description is very minimal, the example value is in quotes
- single example (I like a few, even for very simple terms)
- only one keyword
- no recommended or required
-->

Member

Regarding quotes: the range of `examples.value` in LinkML is string, even if the range of the slot being illustrated is numeric or something else. In this case, the example could probably be written without the quotation marks, but YAML has some idiosyncrasies regarding automatic conversion of unquoted things like dates and Booleans, so I may have added the quotation marks as belt-and-suspenders insurance. Or maybe the conversion script did it.
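To illustrate why the quotation marks can matter, the sketch below starts from what a YAML 1.1 style loader (PyYAML, for example) typically returns for unquoted scalars and flags the example values that are no longer strings. The data is written out as Python literals so that no YAML library is needed, and `non_string_example_values` is a hypothetical helper.

```python
import datetime

# What a YAML 1.1 style loader typically hands back:
#   - value: 2020-01-01   -> datetime.date(2020, 1, 1)
#   - value: true         -> True
#   - value: "25"         -> "25" (quoted, so it stays a string)
LOADED_EXAMPLES = [
    {"value": datetime.date(2020, 1, 1)},
    {"value": True},
    {"value": "25"},
]


def non_string_example_values(examples):
    """Return the example values that did not survive loading as strings."""
    return [
        example["value"]
        for example in examples
        if not isinstance(example["value"], str)
    ]


if __name__ == "__main__":
    print(non_string_example_values(LOADED_EXAMPLES))
```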

Collaborator Author

OK, if it should be everywhere that is fine, I can update accordingly :) (at least in the specs)

@jfy133 changed the title from "Document examples of 'gold standard' slots as a reference guide for newcomers" to "Documentation: examples of 'gold standard' slots as a reference guide for newcomers" on Jul 14, 2025