rres-endpoints, writing walkthrough doc [ci skip]
1 parent 6ee94a0, commit 62227d1
Showing 3 changed files with 64 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
# Used as an example for the walkthrough at | ||
# /pipeline-walkthrough.md | ||
|
||
# /home/data/knetminer/etl-test/cereals-dummy/cereals-dummy-1.oxl | ||
|
||
# Unfortunately, the directory naming isn't consistent, so we can't use KETL_DATASET_ID here | ||
oxl_home="$KNET_HOME/etl-test/poaceae/$KETL_DATASET_VERSION" | ||
|
||
export KETL_SRC_OXL="$oxl_home/generic/knowledge-network-free.oxl" | ||
|
||
export KETL_OUT="$KETL_OUT_HOME/$KETL_DATASET_ID/$KETL_DATASET_VERSION" | ||
|
||
## Neo | ||
# | ||
export KETL_HAS_NEO4J=true | ||
export KETL_NEO_VERSION='5.16.0' | ||
export NEO4J_HOME="$KNET_SOFTWARE/neo4j-community-$KETL_NEO_VERSION-etl" | ||
|
||
## Knet Initialiser | ||
# | ||
# The name within the code base, which identifies the config dir to be | ||
# used for the KnetMiner initialiser | ||
export KNET_INIT_DATASET_ID="poaceae-test" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
## Configuration | ||
|
||
As explained in the main [README](README.md), the pipeline can be configured to work with a given dataset ID and a given dataset version (e.g., cereals 57), and a given dataset+version pair can be run in a given environment. | ||
|
||
The configuration is hierarchical. Defaults are set by [config/default-cfg.sh](config/default-cfg.sh), which invokes `config/environments/$envName-env.sh` (`$envName` is a command line parameter), and then invokes `config/datasets/$datasetId-$version-cfg.sh` (`$datasetId` and `$version` are command line parameters too). This means that environment-specific settings can override or extend the defaults (by reusing them), and dataset-specific settings can then override or extend both the defaults and the environment settings. | ||
|
||
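The layering described above can be sketched with three throwaway files in a scratch directory. This is only an illustration under stated assumptions: the file contents, the `DEMO_*` names and the `demo` environment are hypothetical and are not copied from the real config scripts. The sketch also assumes that `default-cfg.sh` stores its first two arguments in `KETL_DATASET_ID` and `KETL_DATASET_VERSION`.

```bash
# Illustrative only: file contents and DEMO_* names are hypothetical,
# they are not taken from the real config scripts.
set -eu

DEMO_CFG_DIR="$(mktemp -d)"
mkdir -p "$DEMO_CFG_DIR/environments" "$DEMO_CFG_DIR/datasets"

# Layer 1, defaults: reads the caller's arguments, sets defaults,
# then sources the environment and dataset layers, in this order.
cat > "$DEMO_CFG_DIR/default-cfg.sh" <<'EOF'
KETL_DATASET_ID="$1"
KETL_DATASET_VERSION="$2"
demo_env_id="${3:-}"   # the environment is optional
DEMO_MEMORY="4G"
KETL_HAS_NEO4J=false
if [ -n "$demo_env_id" ]; then
  . "$DEMO_CFG_DIR/environments/$demo_env_id-env.sh"
fi
. "$DEMO_CFG_DIR/datasets/$KETL_DATASET_ID-$KETL_DATASET_VERSION-cfg.sh"
EOF

# Layer 2, environment: overrides a default.
cat > "$DEMO_CFG_DIR/environments/demo-env.sh" <<'EOF'
DEMO_MEMORY="32G"
EOF

# Layer 3, dataset: overrides a default and reuses values set by earlier layers.
cat > "$DEMO_CFG_DIR/datasets/cereals-dummy-1-cfg.sh" <<'EOF'
KETL_HAS_NEO4J=true
DEMO_SUMMARY="$KETL_DATASET_ID v$KETL_DATASET_VERSION, memory $DEMO_MEMORY"
EOF

# A sourced script sees the caller's arguments, so we set them here
# to mimic a pipeline script launched with: cereals-dummy 1 demo
set -- cereals-dummy 1 demo
. "$DEMO_CFG_DIR/default-cfg.sh"

echo "KETL_HAS_NEO4J=$KETL_HAS_NEO4J"
echo "DEMO_SUMMARY=$DEMO_SUMMARY"
```

In this sketch the dataset layer runs last, so its `KETL_HAS_NEO4J=true` wins over the default, and its summary picks up the `32G` value set by the environment layer.
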
Most pipeline scripts invoke `default-cfg.sh` as a first step (using the Bash [source command](https://www.baeldung.com/linux/source-include-files)). This script also has a special behaviour: it checks the command line arguments, which must be `datasetId`, `datasetVersion` and an optional `environmentId`. These parameters are used to find the specific config scripts, as said above. | ||
|
||
So, for instance, the dataset building pipeline can be launched this way: | ||
|
||
```bash | ||
# This is where we have the pipeline scripts deployed, we won't repeat this in the examples below | ||
cd /home/data/knetminer/software/knetminer-backend/rres-endpoints | ||
git pull # Optional, this is a mirror of the knetminer-backend repo and you might want to update it | ||
./build-endpoint.sh 'cereals-free' 1 rres # quote datasetId if it contains punctuation | ||
``` | ||
All the scripts that need it will call `default-cfg.sh`, which will check the CLI arguments and invoke the specific config scripts, as said above. | ||
|
||
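As a rough idea of what such an argument check could look like, here is a minimal sketch. The function name `check_cfg_args` and the usage message are hypothetical, and the real `default-cfg.sh` may perform this check differently.

```bash
# Hypothetical helper, not the actual default-cfg.sh code.
# Accepts: <datasetId> <datasetVersion> [environmentId]
check_cfg_args() {
  if [ "$#" -lt 2 ] || [ "$#" -gt 3 ]; then
    echo "Usage: <script> <datasetId> <datasetVersion> [environmentId]" >&2
    return 1
  fi
  echo "datasetId=$1 datasetVersion=$2 environmentId=${3:-<none>}"
}

# Two mandatory arguments plus the optional environment: accepted
check_cfg_args cereals-dummy 1 rres

# Missing dataset version: rejected with a usage message on stderr
check_cfg_args cereals-dummy || echo "rejected: too few arguments"
```

Returning a non-zero status, rather than calling `exit`, matters for a script that is sourced: `exit` would terminate the calling pipeline script's shell without giving it a chance to react.
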
**Tip**: a quick way to see the same variables that the pipeline scripts see is: | ||
|
||
```bash | ||
|
||
``` | ||
|
||
### Dataset config | ||
As explained in the README, for a new dataset, you should define a dataset+version specific config and place it in `config/datasets/$datasetId-$version-cfg.sh`. In our walkthrough example, this is [config/datasets/cereals-dummy-1-cfg.sh](config/datasets/cereals-dummy-1-cfg.sh). | ||
|
||
In this file, `KETL_OUT` specifies that all the pipeline output files are rooted at `/home/data/knetminer/pub/endpoints/cereals-dummy/1/`. Its value depends on the earlier definition of `KETL_OUT_HOME`, which in turn depends on `KNET_HOME`. Both variables are defined in the environment config, at [config/environments/rres-env.sh](config/environments/rres-env.sh). | ||
|
||
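The chain of definitions behind `KETL_OUT` can be worked through by hand. In the sketch below, the values of `KNET_HOME` and `KETL_OUT_HOME` are assumptions inferred from the paths quoted in this walkthrough, and `rres-env.sh` may define them differently. Only the last assignment is the expression used in the dataset config.

```bash
# Assumed values, inferred from the paths quoted in the walkthrough;
# the real definitions live in config/environments/rres-env.sh.
KNET_HOME="/home/data/knetminer"
KETL_OUT_HOME="$KNET_HOME/pub/endpoints"

# Normally derived from the command line arguments
KETL_DATASET_ID="cereals-dummy"
KETL_DATASET_VERSION="1"

# Same expression as in the dataset config file
KETL_OUT="$KETL_OUT_HOME/$KETL_DATASET_ID/$KETL_DATASET_VERSION"

echo "$KETL_OUT"
```

Under these assumptions, the script prints `/home/data/knetminer/pub/endpoints/cereals-dummy/1`.
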
### Environment configuration | ||
For this walkthrough, we'll use the RRes environment, with its shared directories, our deployments on them, and SLURM, the cluster framework used to send batch jobs to high-performance computing hosts and run them in parallel (more below). | ||
|
||
As per the main README, the config for this environment is at [config/environments/rres-env.sh](config/environments/rres-env.sh). As said above, this defines the pipeline working directory and the path of the input OXL. It also points to software tools, such as the OXL-to-RDF exporter or the Neo4j server that the pipeline uses to prepare a Neo4j dump from the OXL (more below). These tools must already be installed before the pipeline runs. | ||
|