Update documentation for refreshing imports #8528

Merged: 4 commits, Jan 5, 2025
45 changes: 35 additions & 10 deletions docs/editors-guide/import-terms-for-logical-axioms.md
@@ -1,37 +1,62 @@
_Last updated 27-Mar-2024_
_Last updated 2-Jan-2025_

## Import terms into Mondo for use in logical axioms

This workflow is for adding classes from external ontologies (i.e., GO or CHEBI), which is much more streamlined compared to [MIREOTing](https://github.com/obophenotype/human-phenotype-ontology/wiki/Editor-Guide#mireoting).
This workflow is for adding classes from external ontologies (e.g. GO, CHEBI, HGNC, or NCBI) and is much more streamlined compared to [MIREOTing](https://github.com/obophenotype/human-phenotype-ontology/wiki/Editor-Guide#mireoting).

As a Mondo curator, when you have a ticket that requires a term from an external ontology, create a new git feature branch to include _only_ the changes for the refresh of the imports following the steps below. Also, ask on Slack if other curators need other terms imported.
As a Mondo curator, when you have a ticket that requires a term from an external ontology, create a new git feature branch to include _only_ the changes for the refresh of the imports following the steps below. Also, ask on Slack if other curators need other terms imported.

### Prepare your environment
- Set the memory in Docker to 28 GB. See [instructions](/editors-guide/imports/#increase-memory-in-docker-mac-specific-instructions) below on how to change the memory setting.

- Increase the local environment memory to 28 GB by running `export "MEMORY_GB=28"` in your Terminal window.
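
A minimal sketch of a pre-flight check for the setting above. The function name `check_memory_gb` is hypothetical and not part of the Mondo repository; it only inspects the `MEMORY_GB` variable and does not look at the Docker setting.

```sh
# Hypothetical helper (an assumption, not part of the Mondo repo): succeeds
# only when MEMORY_GB is a whole number at or above the required minimum.
check_memory_gb() {
    required="${1:-28}"   # 28 GB is the minimum quoted in this guide
    case "${MEMORY_GB:-}" in
        ''|*[!0-9]*)
            echo "MEMORY_GB is unset or not a whole number" >&2
            return 1 ;;
    esac
    if [ "$MEMORY_GB" -lt "$required" ]; then
        echo "MEMORY_GB=$MEMORY_GB is below the required ${required} GB" >&2
        return 1
    fi
    return 0
}

# Usage: set the variable as described above, then check it before refreshing.
export MEMORY_GB=28
check_memory_gb 28 && echo "memory setting OK"
```

Running a check like this first may save a long wait on a refresh that would otherwise stop part-way through.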

### Refresh Imports
1. Fetch the latest changes from `master` in the "mondo" repo
1. Create a new git feature branch
1. Open the `src/ontology/imports/manual_seed.txt` file
1. Add the IRI of the term(s) you want to add to this document and save the file
- IRIs for any entity can be added into the `manual_seed.txt` file, e.g. gene, NCBITaxon, etc.
1. In the Terminal, run: `export "MEMORY_GB=15"`
1. Add the IRI of the term(s) you want to add to the ontology to this document (`manual_seed.txt`) and save the file
- IRIs for any entity can be added into the `manual_seed.txt` file, e.g. HGNC gene, NCBI gene (for non-human genes), NCBITaxon, etc.
- The IRIs to add to the `manual_seed.txt` file can be found in the <a href="https://docs.google.com/spreadsheets/d/1UME3pTeR42hwNt1I6RPl0nvJKVfTIeJqhx80laAlD50/edit?pli=1&gid=0#gid=0" target="_blank">terms required for import</a> Google Sheet.
1. In the Terminal, run: `export "MEMORY_GB=28"`
1. Then refresh the imports:
- From `src/ontology/` run the command: `sh run.sh make refresh-merged` (Note: this may take ~2 hours)
- From `src/ontology/` run the command: `sh run.sh make refresh-merged` (Note: takes ~20 minutes)
- All the imports will be updated, which means that you might see changes in your GitHub diff in the following files:
- `src/ontology/imports/*_terms.txt`
- `src/ontology/imports/merged_import.owl`
- The terms added in the `manual_seed.txt` file will be added to the appropriate import file (e.g. human genes will be added to `hgnc_terms.txt`; NCBITaxon will be added to `ncbitaxon_terms.txt`).
1. Close Protege and open `mondo-edit.obo` in Protege again and use the "Save as..." option under the "File" menu to save the ontology as OBO Format (.obo).
- Saving the `mondo-edit.obo` file is required for the updates from the import refresh (e.g. updated names) to be visible in the ontology file
- Therefore, changes such as updated names of imported entities might be shown in the git diff.
- The new terms should be available for logical definitions in Protege. One can therefore also edit the file, but expect changes that were not made manually (see previous comment).
- The new terms should be available for logical definitions in Protege. One can therefore also edit the `mondo-edit.obo` file, but expect changes that were not made manually (see previous comment).
- Example file changes from previous refresh of imports: <a href="https://github.com/monarch-initiative/mondo/pull/7716/files" target="_blank">https://github.com/monarch-initiative/mondo/pull/7716/files</a>
1. Commit the changes to the git feature branch and create a PR.
1. Once the PR is approved and merged, the terms imported from external ontologies can be referenced in logical definitions.
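
The seed-file step above could be scripted roughly as follows. This is a sketch under assumptions: `add_seed_iri` is a hypothetical helper, and it assumes `manual_seed.txt` holds one IRI per line. The demo writes to a scratch file rather than the real seed file.

```sh
# Hypothetical helper (an assumption, not part of the Mondo repo): append an
# IRI to a seed file unless that exact IRI is already listed.
add_seed_iri() {
    seed_file="$1"
    iri="$2"
    # Skip the append when the exact IRI is already a line in the file.
    if grep -qxF -- "$iri" "$seed_file" 2>/dev/null; then
        return 0
    fi
    # Make sure the existing content ends with a newline before appending.
    if [ -s "$seed_file" ] && [ -n "$(tail -c 1 "$seed_file")" ]; then
        printf '\n' >> "$seed_file"
    fi
    printf '%s\n' "$iri" >> "$seed_file"
}

# Demo against a scratch file instead of src/ontology/imports/manual_seed.txt.
seed="$(mktemp)"
add_seed_iri "$seed" "http://purl.obolibrary.org/obo/NCBITaxon_9606"
add_seed_iri "$seed" "http://purl.obolibrary.org/obo/NCBITaxon_9606"   # second call is a no-op
cat "$seed"
rm -f "$seed"

# With the real seed file, the documented next steps (run from src/ontology/) are:
#   export "MEMORY_GB=28"
#   sh run.sh make refresh-merged
```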

### Note when importing a new NCBITaxon class
### Importing a new NCBITaxon class
- If adding a new NCBITaxon class that is from a species not already found in `src/ncbi_gene/transform.yaml` in the Monarch Initiative [ncbi-gene repo](https://github.com/monarch-initiative/ncbi-gene/blob/main/src/ncbi_gene/transform.yaml), this file also needs to be updated.
- Check if the new NCBI Taxon identifier(s) also exist in the [taxon-subset-ids.txt](https://github.com/obophenotype/ncbitaxon/blob/master/subsets/taxon-subset-ids.txt) file in the "obophenotype/ncbitaxon/subsets" repo
- If the identifiers are not in the file, update the `taxon-subset-ids.txt` file to add the identifiers and create a PR in the "obophenotype/ncbitaxon/subsets" repo to include the new identifiers
_Note_: this file contains CURIEs (not IRIs) so the identifier should be added in this format `NCBITaxon:1`
- This additional step is needed since we are not using NCBI Taxon directly, but the OBO slim, and the [taxon-subset-ids.txt](https://github.com/obophenotype/ncbitaxon/blob/master/subsets/taxon-subset-ids.txt) file is the seed of the NCBITaxon slim.
- This additional step is needed since we are not using NCBI Taxon directly, but the OBO slim, and the [taxon-subset-ids.txt](https://github.com/obophenotype/ncbitaxon/blob/master/subsets/taxon-subset-ids.txt) file is the seed of the NCBITaxon slim.
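
Because `manual_seed.txt` takes IRIs while `taxon-subset-ids.txt` takes CURIEs, the same identifier has to be written in two forms. The sketch below converts one to the other; `iri_to_curie` is a hypothetical helper, and it assumes the IRI follows the standard OBO PURL pattern (`http://purl.obolibrary.org/obo/PREFIX_LOCALID`).

```sh
# Hypothetical helper (an assumption, not part of either repo): convert an
# OBO PURL IRI into the CURIE form used in taxon-subset-ids.txt.
iri_to_curie() {
    local_id="${1##*/}"          # text after the last slash, e.g. NCBITaxon_9606
    case "$local_id" in
        *_*) ;;                  # expected PREFIX_LOCALID shape
        *) echo "not an OBO-style IRI: $1" >&2; return 1 ;;
    esac
    prefix="${local_id%%_*}"     # text before the first underscore
    number="${local_id#*_}"      # text after the first underscore
    printf '%s:%s\n' "$prefix" "$number"
}

iri_to_curie "http://purl.obolibrary.org/obo/NCBITaxon_9606"   # prints NCBITaxon:9606
```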


## Increase memory in Docker (Mac specific instructions)

1. Open Docker Settings
2. Click Resources
3. Increase memory to 28 GB

## Errors due to not enough Memory
As of 11-Dec-2024, the process requires a minimum of 28 GB of memory, and the amount needed may increase over time. If the process needs more memory than is allocated, the import refresh stops before completing and you will see these error lines in your Terminal window:
```
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
...
make[1]: Leaving directory '/work/src/ontology'
make: *** [Makefile:481: refresh-merged] Error 2
```

To resolve an error due to lack of memory, allocate more memory to Docker and your local environment as described above.
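
If the refresh output is saved to a file, the failure can be detected without scrolling through the Terminal. The sketch below is an assumption-laden example: `log_has_oom` is a hypothetical helper and `refresh.log` is a hypothetical file name. The demo uses a scratch log holding the error line quoted above.

```sh
# Hypothetical helper (an assumption, not part of the Mondo repo): succeeds
# when the given log file contains the Java out-of-memory error.
log_has_oom() {
    grep -qF 'java.lang.OutOfMemoryError' "$1"
}

# A real run could be captured with, for example:
#   sh run.sh make refresh-merged 2>&1 | tee refresh.log

# Demo with a scratch log containing the error line from this guide.
log="$(mktemp)"
printf '%s\n' 'Exception in thread "main" java.lang.OutOfMemoryError: Java heap space' > "$log"
if log_has_oom "$log"; then
    echo "Out of memory: raise the Docker memory limit and MEMORY_GB, then rerun"
fi
rm -f "$log"
```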

## Alternate approaches
While there are alternate approaches to add classes from external ontologies, the instructions above are the only process that should be followed for importing external ontology classes into Mondo.
40 changes: 28 additions & 12 deletions docs/editors-guide/imports.md
@@ -1,30 +1,46 @@
# Imports
_Last updated 2-Jan-2025_

Updating imports is needed whenever logical axioms that reference external ontologies are added but the classes being referenced have no labels or other logical definitions. Updating an import for those classes will bring in the labels, annotations, and logical axioms into the import file, and therefore into the Mondo ontology.
Updating imports is needed whenever logical axioms that reference external ontologies are added, but the classes being referenced have no labels or other logical definitions. Updating an import for those classes will bring in the labels, annotations, and logical axioms into the import file, and therefore into the Mondo ontology.

# Regenerate import file
## Regenerate import file

See the [imports/ folder](https://github.com/monarch-initiative/mondo/tree/master/imports).

See [Design Pattern](https://mondo.readthedocs.io/en/latest/editors-guide/e-design-patterns/) section for more details on patterns that reference external ontologies and how these are used.

## Instructions

If you have Docker installed (note - you may need to increase your memory in Docker to 24GB):
### Prepare your environment
- Set the memory in Docker to 28 GB. See [instructions](/editors-guide/imports/#increase-memory-in-docker-mac-specific-instructions) below on how to change the memory setting.

- Increase the local environment memory to 28 GB by running `export "MEMORY_GB=28"` in your Terminal window.

### Refresh Imports
1. Navigate to your local ontology directory, for example: `cd src/ontology`
2. Create a git branch, e.g. `git checkout -b iss-GH_ISSUE_NUMBER`
3. Run command: `sh run.sh make refresh-merged`
4. Run `git status` to see what files have been updated
5. Commit the updated files:
`git add <PATH-TO-FILE>`
`git status` - only the updated files should be added and ready to be committed. There will be some untracked files as well, which should not be added or committed.
`git commit`
`git push`
4. Run `git status` to see what files have been updated. Only the updated files should be added and ready to be committed. There may be some untracked files as well, which should not be added or committed.
5. Commit the updated files:
- `git add <PATH-TO-FILE>`
- `git commit`
- `git push`
6. The newly generated imports could contain newly deprecated classes from the source ontology, which could affect the Mondo ontology by creating danglers/obsolete references. To fix this, follow the instructions in [Repair axioms pointing to deprecated classes](https://mondo.readthedocs.io/en/latest/developer-guide/repair-obsoleted-classes/).
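
Steps 4 and 5 above ask you to stage only the updated files and leave untracked files alone. One way to list just the tracked changes is to filter the output of `git status --porcelain`, where untracked entries start with `??`. This is a sketch: `tracked_changes` is a hypothetical helper, and the demo feeds it sample lines instead of a real repository. Renamed files, which the porcelain format prints as `old -> new`, are not handled.

```sh
# Hypothetical helper (an assumption, not part of the Mondo repo): read
# `git status --porcelain` lines on stdin and print only the tracked paths.
tracked_changes() {
    while IFS= read -r line; do
        case "$line" in
            '') ;;                             # ignore blank lines
            '??'*) ;;                          # untracked: leave out
            *) printf '%s\n' "${line#???}" ;;  # drop the two status characters and the space
        esac
    done
}

# In a real checkout: git status --porcelain | tracked_changes
# Demo with sample porcelain output:
printf ' M src/ontology/imports/merged_import.owl\n?? src/ontology/tmp/scratch.owl\n' | tracked_changes
```

Each printed path can then be passed to `git add <PATH-TO-FILE>` as in step 5.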

The process to refresh the imports takes ~20 minutes with a memory setting of 28 GB (11-Dec-2024).

## Increase memory in Docker (Mac specific instructions)

1. Open Docker preferences
1. Open Docker Settings
2. Click Resources
3. Increase memory to 24 GB
3. Increase memory to 28 GB

## Errors due to not enough Memory
As of 11-Dec-2024, the process requires a minimum of 28 GB of memory, and this minimum may increase over time. If the process needs more memory than is allocated, the import refresh stops before completing and you will see these error lines in your Terminal window:
```
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
...
make[1]: Leaving directory '/work/src/ontology'
make: *** [Makefile:481: refresh-merged] Error 2
```

To resolve an error due to lack of memory, allocate more memory to Docker and your local environment as described above.