Script to reserve ontology term IDs #1

Open
jamesaoverton opened this issue Feb 17, 2020 · 2 comments · May be fixed by #2

@jamesaoverton
Member

jamesaoverton commented Feb 17, 2020

When adding a new term to an ontology, we need to give it a new ID. IDs are usually numeric and sequential, e.g. OBI:0000070. Because people may be working on multiple branches in parallel, the term reservations have to live outside the development branches of the ontology repository.

Here's an example where we started coordinating OBI term IDs via a Google Sheet:

https://docs.google.com/spreadsheets/d/1tpDrSiO1DlEqkvZjrDSJrMm7OvH9GletljaR-SDeMTI

The important information is:

  • term ID
  • term label
  • developer
  • date
  • comment, preferably with a GitHub issue or PR number

The main drawbacks of a Google Sheet are (1) a separate username/authentication mechanism from GitHub, and (2) it's a pain to write an authenticated script to add a new request.

So I would prefer to use GitHub. There should be a reserved-terms.txt file on a special branch of the repo named term-ids. Each line of the file should start with a term ID (e.g. OBI:0012345) followed by a space and the label. The commit message should include a GitHub issue or PR number (e.g. #1234). The commit will record the date and the username. The git blame view will then show all the important information above. Users can edit the reserved-terms.txt file manually using the GitHub web interface.
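
A minimal sketch of reading that line format, assuming every line is exactly `PREFIX:digits`, one space, then the label. The name `parse_terms` and the regular expression are illustrative assumptions, not part of the proposal:

```python
import re

# Assumed line shape: "<PREFIX>:<digits> <label>", e.g. "OBI:0012345 some label".
LINE_PATTERN = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):(\d+) (.+)$")


def parse_terms(text):
    """Return a list of (term_id, label) tuples from the file contents."""
    terms = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue  # tolerate blank lines left behind by manual edits
        match = LINE_PATTERN.match(stripped)
        if match is None:
            raise ValueError(f"line {number}: expected 'ID label', got {line!r}")
        prefix, digits, label = match.groups()
        terms.append((f"{prefix}:{digits}", label.strip()))
    return terms
```

Rejecting malformed lines loudly, rather than skipping them, seems safer here because the file is also edited by hand through the web interface.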

There can also be a published-terms.txt file, in the same format as reserved-terms.txt, listing all the officially published term IDs for the ontology (e.g. OBI:0000070 assay).

To supplement manual edits, I want a reserve-terms.py script that will:

  1. read the published-terms.txt file from GitHub
  2. read the reserved-terms.txt file from GitHub
  3. figure out the next available ID
  4. either:
    A. accept one or more new term labels as command-line arguments, or
    B. read a local file containing one or more new term labels: one label per line
  5. check that the requested labels are not already present in published-terms.txt or reserved-terms.txt; if any are present, print a helpful error and quit
  6. assign new IDs for each new label, and print the IDs and labels to STDOUT
  7. append lines to reserved-terms.txt -- in memory, don't rely on local files
  8. either
    A. accept a commit message as a command-line argument, or
    B. prompt the user for a commit message
  9. Commit the change to GitHub by
    A. using an OAuth token from the environment, or
    B. prompting the user for a GitHub username and password, or
    C. something more clever?
  10. Print a link to the commit
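
Steps 3, 5, 6 and 7 could be sketched roughly as follows. This assumes the two files have already been parsed into lists of `(term_id, label)` tuples, that "next available" means one more than the highest number seen (gaps are not reused), and that IDs use a single prefix padded to seven digits. The function names are hypothetical:

```python
def next_ids(published, reserved, labels, prefix="OBI", width=7):
    """Assign sequential IDs to new labels, refusing labels already in use."""
    existing = published + reserved
    taken = {label for _, label in existing}
    duplicates = [label for label in labels if label in taken]
    if duplicates:
        # Step 5: the real script would print this message and quit.
        raise ValueError("already published or reserved: " + ", ".join(duplicates))
    # Step 3: highest numeric part seen so far (0 if both files are empty).
    highest = max((int(term_id.split(":")[1]) for term_id, _ in existing), default=0)
    # Step 6: hand out highest+1, highest+2, ... in the order requested.
    return [
        (f"{prefix}:{highest + offset:0{width}d}", label)
        for offset, label in enumerate(labels, start=1)
    ]


def append_reservations(reserved_text, assigned):
    """Step 7: return the new file contents as a string, entirely in memory."""
    if reserved_text and not reserved_text.endswith("\n"):
        reserved_text += "\n"
    return reserved_text + "".join(f"{term_id} {label}\n" for term_id, label in assigned)
```

The sketch does not check for the same label appearing twice within one request; a fuller script would probably want that check as well.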

I don't want this script to use the git CLI or checkout files locally. I'd like it to keep published-terms.txt and reserved-terms.txt in memory.
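
One way to avoid the git CLI and local checkouts is GitHub's REST "repository contents" endpoints, which return and accept file contents as base64 inside JSON. The sketch below only decodes a response body and builds the update request without sending it. The helper names are hypothetical, and `repo` is assumed to be an `owner/name` string:

```python
import base64
import json
import urllib.request

API_ROOT = "https://api.github.com"


def decode_contents(response_body):
    """Extract (text, blob_sha) from a 'get repository content' JSON response."""
    payload = json.loads(response_body)
    # The API wraps the base64 text across lines; b64decode ignores the newlines.
    text = base64.b64decode(payload["content"]).decode("utf-8")
    return text, payload["sha"]


def build_update_request(repo, path, branch, new_text, blob_sha, message, token):
    """Build (but do not send) the PUT request that commits new_text to branch."""
    body = {
        "message": message,  # commit message, e.g. mentioning the issue number
        "content": base64.b64encode(new_text.encode("utf-8")).decode("ascii"),
        "sha": blob_sha,  # blob SHA of the version being replaced
        "branch": branch,
    }
    return urllib.request.Request(
        f"{API_ROOT}/repos/{repo}/contents/{path}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
        },
    )
```

Passing the request to `urllib.request.urlopen` would create the commit. As far as I know the JSON response includes a `commit` object with an `html_url` field, which would cover step 10, but that is worth checking against the API documentation. Supplying the old blob SHA also means GitHub should reject the update if someone else changed the file in the meantime.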

@jamesaoverton
Member Author

For item 9, I'd like this to work with either a personal access token or an OAuth App token. I think the two kinds of token work the same way as far as making GitHub API calls, but I don't know for sure.
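
If both kinds of token are indeed sent the same way in the `Authorization` header, the script would not need to tell them apart. A small sketch under that assumption; the environment variable name `GITHUB_TOKEN` and the function name are illustrative only:

```python
import os


def auth_headers(environ=os.environ, variable="GITHUB_TOKEN"):
    """Build request headers from a token stored in the environment."""
    token = environ.get(variable, "").strip()
    if not token:
        raise KeyError(f"set {variable} to a personal access token or OAuth token")
    # Assumption: the same header form is used for either kind of token.
    return {"Authorization": f"token {token}"}
```

A caller could catch the `KeyError` and fall back to prompting the user, as in option 9B.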

@jamesaoverton
Member Author

I coordinated with Nico about reusing the ODK config files, and he's in favour. I just have to spend another minute thinking about this.

INCATools/ontology-development-kit#328
