Documentation: examples of 'gold standard' slots as a reference guide for newcomers #943
- [`examples`](https://linkml.io/linkml-model/latest/docs/examples/): example values demonstrating how the slot should be used
- [`in_subset`](https://linkml.io/linkml-model/latest/docs/in_subset/): the section of the schema that the slot belongs to, based on a [fixed list of MIxS categories](https://github.com/GenomicsStandardsConsortium/mixs/blob/609b0f567486f64cb7061246588d8006f87fa138/src/mixs/schema/mixs.yaml#L21-L26)
  - Note: this system may be replaced in the near future!
- [`keywords`](https://linkml.io/linkml-model/latest/docs/keywords/): useful keywords that allow related slots to be grouped together
When I introduced the keywords in 6.2, I didn't have any tools or policies for adding new keywords, and I still don't. So new terms could be added with no keywords, or with keywords that are similar to but different from the ones I used, and nothing is checking for that yet. We also don't have any easy tooling for retrieving all terms that are tagged with a given keyword. That can be done with SchemaView loops, or by dumping MIxS to a tabular representation, but I don't think we have all agreed on the best way of doing that either.
I based my examples on the observation that most slots did have them.
I've unilaterally made up a policy in this regard ;).
Note that this isn't the specification document, just examples I found.
We could update the spec to not require them at all, and then drop the reference to it here (even if all the actual examples have them).
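For newcomers, a hedged illustration of how `keywords` might be written on a slot. The slot name, description, and keyword values are hypothetical; because there is not yet a policy or controlled list for MIxS keywords, it is worth checking the spelling of keywords already used in the schema before adding a new one.

```yaml
slots:
  example_storage_temperature:   # hypothetical slot name, for illustration only
    description: Hypothetical temperature at which the sample was stored prior to processing
    keywords:
      # reuse the exact spelling of keywords already present in the schema,
      # so that related slots can be grouped together later
      - sample
      - storage
      - temperature
```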
- If a term should be either mandatory or optional
  - [`recommended`](https://linkml.io/linkml/schemas/slots.html#recommended): a boolean value indicating whether the slot is recommended to be filled
  - [`required`](https://linkml.io/linkml/schemas/slots.html#required): a boolean value indicating whether the slot is mandatory to be filled
I assume that we would never add any new terms with both recommended and required set. There are a few cases of that now that should be cleaned up. It is also possible that a term could be considered recommended in a Checklist and required in an Extension, so it would appear as both in a dynamically constructed combination class.
I think that's what the "either mandatory or optional" part is for.
That said, I think we should review the terms that are present in both checklists and extensions. I assume that shouldn't happen, and if it does, that seems confusing.
Also, in terms of style, I don't think we should ever assert `false`. Any Boolean metaslot that isn't asserted is false by definition.
Is the conclusion that the guidance here should be to just add `recommended: true` if a slot is recommended (and likewise for `required`), and to document that you should never specify `recommended: false`?
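A minimal sketch of the style discussed in this thread, assuming the conclusion is to assert only `true` values: a slot states either `required: true` or `recommended: true` (never both), and omits both keys when it is optional, since an unasserted Boolean metaslot is false. The slot names and descriptions are hypothetical.

```yaml
slots:
  example_required_slot:       # hypothetical: must always be filled
    description: Hypothetical slot that must always be filled
    required: true
  example_recommended_slot:    # hypothetical: should be filled when possible
    description: Hypothetical slot that should be filled when possible
    recommended: true
  example_optional_slot:       # hypothetical: optional, so neither key is asserted
    description: Hypothetical optional slot
    # do NOT write `required: false` or `recommended: false`;
    # unasserted Boolean metaslots are false by definition
```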
- If a term should allow multiple entries
  - [`multivalued`](https://linkml.io/linkml/schemas/slots.html#multivalued)
    - Essentially indicates that the contents of the slot can be a list, and each element is evaluated independently against the remaining slot attributes
We should ensure that multivalued slots always have at least two examples in the schema file, on separate lines:

```yaml
slots:
  cities:
    description: a list of noteworthy cities in some geopolitical entity
    examples:
      - value: Paris
      - value: Lyon
      - value: Nice
```

Not:

```yaml
slots:
  cities:
    description: a list of noteworthy cities in some geopolitical entity
    examples:
      - value: Paris|Lyon|Nice
```
We also need a lot more documentation about how multivalued values are represented in the various data file formats/serializations that LinkML supports. In YAML or JSON, the values would truly be broken out into a list:

```yaml
name: France
cities:
  - Paris
  - Lyon
  - Nice
```

In CSV or TSV that is faithful to the current LinkML specifications, one would see:

name | cities
---|---
France | [Paris\|Lyon\|Nice]

Whereas one might expect (or MIxS might imply) that the representation should/can omit the brackets:

name | cities
---|---
France | Paris\|Lyon\|Nice

or even use a different delimiter:

name | cities
---|---
France | Paris;Lyon;Nice
Much of that is under consideration in LinkML, but that kind of flexibility isn't supported yet. This can be seen first-hand by creating some data that passes `linkml validate`, and then converting it with `linkml convert`, a process that I recommend to all MIxS contributors.
Proposal: any multivalued slot must have 2 unique examples to show how to capture the information.
Note: let's keep this guidance focused on the individual who is proposing a new slot to the GSC.
The information about how multivalued values are serialized is important for the individual who is using the standard and entering data.
Maybe a separate section, issue, or example file that we can link to could describe how to enter multivalued data, so that when someone is proposing a term, they can see some examples of the structure and build their slot according to the need.
@mslarae13 done here: a7239d2

Indeed, LinkML converts

```yaml
human_gut_data:
  - samp_name: sample1
    project_name: project1
    special_diet:
      - low carb
      - vegetarian
  - samp_name: sample2
    project_name: project1
    special_diet:
      - low carb
      - reduced calorie
```

to the following TSV:

```
samp_name	project_name	special_diet
sample1	project1	[low carb|vegetarian]
sample2	project1	[low carb|reduced calorie]
```
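A hedged sketch of a slot definition that would permit data like the above. This is a simplified, hypothetical version written for illustration and is not copied from the actual MIxS `special_diet` slot, whose description, range, and other attributes may differ. Note the two separate example entries, one per line.

```yaml
slots:
  special_diet:                  # simplified illustration, NOT the real MIxS definition
    description: Hypothetical description of the special diet(s) followed by the host
    multivalued: true            # the slot may hold a list of values
    examples:
      # one example per entry, rather than a single pipe-delimited string
      - value: low carb
      - value: vegetarian
```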
- [`description`](https://linkml.io/linkml/schemas/metadata.html#providing-descriptions): a description of what the metadata term is for
- [`title`](https://linkml.io/linkml-model/latest/docs/title/): a short, human-readable 'title' for the slot
- [`examples`](https://linkml.io/linkml-model/latest/docs/examples/): example values demonstrating how the slot should be used
- [`in_subset`](https://linkml.io/linkml-model/latest/docs/in_subset/): the section of the schema that the slot belongs to, based on a [fixed list of MIxS categories](https://github.com/GenomicsStandardsConsortium/mixs/blob/609b0f567486f64cb7061246588d8006f87fa138/src/mixs/schema/mixs.yaml#L21-L26)
I'm not sure we would want a contributor to determine the subset (categories). I also believe @only1chunts is working on some changes to these. So it may be best to take this out, and make an issue to add it back once that's cleared up.
Also, I just saw the note! :)
So we could keep it.
Yes, I'm basing the current document on what we currently have in the schema; this will of course evolve :)
  - Note: this system may be replaced in the near future!
- [`keywords`](https://linkml.io/linkml-model/latest/docs/keywords/): useful keywords that allow related slots to be grouped together
- [`slot_uri`](https://linkml.io/linkml-model/latest/docs/slot_uri/): a unique ID assigned by MIxS
  - This likely only gets assigned upon acceptance and merging by the core GSC MIxS team
Suggested change, from:

> - This likely only gets assigned upon acceptance and merging by the core GSC MIxS team

to:

> - This ID will be assigned upon acceptance and merging by the core GSC MIxS team.
Note: we're not actually sure how the IDs are minted.
I should make a tool that reports the highest id so far. I wonder if @sujaypatil96 could help me expose that on the web pages.
I guess leaving it as the ultimate responsibility of the core MIxS team is fine, then (they can always delegate).
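Putting the core attributes listed above together, a hedged sketch of what a new slot proposal might look like. Everything here is a placeholder: the slot name, title, description, and example values are hypothetical; `environment` is assumed to be one of the existing MIxS subsets (check the `subsets` section of `mixs.yaml`); and `MIXS:XXXXXXX` stands in for the `slot_uri` that the core GSC MIxS team would assign.

```yaml
slots:
  example_new_term:                  # hypothetical slot name
    title: example new term          # short, human-readable label
    description: >-
      Hypothetical description explaining what the metadata term
      captures and when it should be filled in
    examples:
      # several examples, even for simple terms
      - value: first example value
      - value: second example value
    in_subset:
      - environment                  # assumed MIxS subset; this system may change
    keywords:
      - example                      # reuse existing keywords where possible
    slot_uri: MIXS:XXXXXXX           # placeholder; assigned by the core GSC MIxS team
```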
And for some slots, the following attributes are also recommended:
- If a term should be either mandatory or optional
I would say this attribute is required for all slots, but the value may vary.
It currently is not used everywhere, which is why I've put it under recommended.
If we want it to be mandatory from now on, then we can add this to the specification page (if I didn't already add this)
update
This example slot allows a single piece of information in the form of a single integer value.
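A hedged sketch of such a single-integer slot, using a hypothetical slot name and values rather than an existing MIxS term. `range: integer` is the LinkML built-in type restricting the value to a whole number; the example values are quoted because `examples.value` has a string range in LinkML.

```yaml
slots:
  example_replicate_count:         # hypothetical slot name
    title: replicate count
    description: Hypothetical number of replicates taken from the sample
    range: integer                 # a single whole number (not multivalued)
    examples:
      - value: "1"
      - value: "3"
```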
<!--
JFY comment: I don't like this so much as:
@jfy133 are you saying "let's pick a different slot for the integer example"?
Yes, exactly. But I didn't find a better alternative (this applies to all examples).
- Why pattern AND structured_pattern?
-->
### URL
The structured pattern for this, and I think for most slots that allow URLs, is "Structured pattern: `^{PMID}|{DOI}|{URL}$`".
So a PMID or DOI is also accepted. I wonder if we can refer to this as something else: source? reference? IDK! But let's expand beyond URL and pick a slot with a URL, DOI, or PMID example.
I don't follow, @mslarae13, sorry!
Do you mean a slot with multiple examples? Or do you mean updating the header/title?
- [structured_pattern](https://linkml.io/linkml/schemas/constraints.html#structured-patterns): a particular regex-like pattern that includes pre-defined components that describe how each component should be formatted
  - In the MIxS LinkML schema, these preset formats can be seen under the [`settings`](https://github.com/GenomicsStandardsConsortium/mixs/blob/609b0f567486f64cb7061246588d8006f87fa138/src/mixs/schema/mixs.yaml#L21852-L21887) section of the schema
Hopefully our structured_patterns won't look regex-like! They could be described as using a domain-specific language, or you could say that they compose the settings like macros?
Maybe I'm too young (or skipped my Excel days), but I've never used or seen a macro 😅, but I can update the text accordingly 👍
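A hedged sketch of how `settings` and `structured_pattern` fit together in LinkML. The setting names follow the `^{PMID}|{DOI}|{URL}$` syntax quoted in this thread, but the regular expressions given for them here are simplified placeholders, not the ones defined in `mixs.yaml`. With `interpolated: true`, each `{NAME}` in `syntax` is expanded from the matching setting before values are checked; the slot name and example values are hypothetical.

```yaml
settings:
  # simplified placeholder expressions, NOT the actual MIxS settings
  PMID: "PMID:\\d+"
  DOI: "doi:10\\.\\d+/\\S+"
  URL: "https?://\\S+"

slots:
  example_reference:               # hypothetical slot name
    description: Hypothetical reference to a publication or web resource
    structured_pattern:
      syntax: "^{PMID}|{DOI}|{URL}$"
      interpolated: true           # expand {PMID}, {DOI}, {URL} from settings
      partial_match: false
    examples:
      - value: "PMID:12345678"     # placeholder value
      - value: "https://example.org/protocol"
```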
<!--
JFY comment: I don't like this so much as:

- the description is very minimal, the example value is in quotes
- single example (I like a few, even for very simple terms)
- only one keyword
- no recommended or required
-->
Regarding quotes: the range of `examples.value` in LinkML is string, even if the range of the slot being illustrated is numeric or something else. In this case, the example could probably be written without the quotation marks, but YAML has some idiosyncrasies regarding automatic conversion of unquoted things like dates and Booleans, so I may have added the quotation marks as belt-and-suspenders insurance. Or maybe the conversion script did it.
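To illustrate the YAML idiosyncrasy described in this comment, a hedged sketch with hypothetical slots. A YAML 1.1 parser (PyYAML, for example) may read an unquoted `2020-01-15` as a date object and an unquoted `true` as a Boolean; quoting keeps each `examples.value` a plain string, matching its string range in LinkML.

```yaml
slots:
  example_collection_date:         # hypothetical slot
    range: date
    examples:
      - value: "2020-01-15"        # unquoted, this may be parsed as a date object
  example_flag:                    # hypothetical slot
    range: boolean
    examples:
      - value: "true"              # unquoted, this may be parsed as a Boolean
```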
OK, if it should be everywhere, that's fine; I can update accordingly :) (at least in the specs)
As a relative newcomer myself, one thing I found difficult when preparing new MIxS term proposals was knowing exactly how to write and format a slot in the MIxS-preferred way of defining a LinkML slot.
The purpose of this file is to provide examples of 'gold standard' slots for common types of metadata terms used within MIxS, which can be used as templates or guides when preparing a new slot proposal.
It is aimed at new users (e.g. intermediate bioinformaticians) who are familiar with common programming terms but may not be familiar with MIxS or deep computer science concepts.
Warning
This page is entirely based on the experiences of a novice user, and will likely require heavy editing by experts