Writing rres-endpoints/doc/pipeline-walkthrough.md
marco-brandizi committed Apr 24, 2024
1 parent 62227d1 commit a428929
Showing 24 changed files with 2,403 additions and 40 deletions.
9 changes: 6 additions & 3 deletions rres-endpoints/config/datasets/cereals-dummy-1-cfg.sh
@@ -4,13 +4,14 @@
# /home/data/knetminer/etl-test/cereals-dummy/cereals-dummy-1.oxl

# Unfortunately, there isn't consistency, so we can use KETL_DATASET_ID here
-oxl_home="$KNET_HOME/etl-test/poaceae/$KETL_DATASET_VERSION"
+oxl_home="$KNET_HOME/etl-test/cereals-dummy"

-export KETL_SRC_OXL="$oxl_home/generic/knowledge-network-free.oxl"
+export KETL_SRC_OXL="$oxl_home/$KETL_DATASET_ID-$KETL_DATASET_VERSION.oxl"

export KETL_OUT="$KETL_OUT_HOME/$KETL_DATASET_ID/$KETL_DATASET_VERSION"

-## Neo 
+## Neo
# See default-cfg.sh for details.
#
export KETL_HAS_NEO4J=true
export KETL_NEO_VERSION='5.16.0'
@@ -21,3 +22,5 @@ export NEO4J_HOME="$KNET_SOFTWARE/neo4j-community-$KETL_NEO_VERSION-etl"
# The name within the code base, which identifies the config dir to be
# used for the KnetMiner initialiser
export KNET_INIT_DATASET_ID="poaceae-test"

# TODO: more to be added.
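
Reviewer note: a minimal sketch of how the two new assignments resolve. The values given to `KETL_DATASET_ID` and `KETL_DATASET_VERSION` are assumptions inferred from the file name and the header comment of this config; `KNET_HOME` is the value exported in rres-env.sh.

```shell
#!/usr/bin/env bash
# Illustrative values only: in the real pipeline these come from the environment.
KNET_HOME=/home/data/knetminer
KETL_DATASET_ID=cereals-dummy   # assumed
KETL_DATASET_VERSION=1          # assumed

# Same composition as in the new version of cereals-dummy-1-cfg.sh
oxl_home="$KNET_HOME/etl-test/cereals-dummy"
KETL_SRC_OXL="$oxl_home/$KETL_DATASET_ID-$KETL_DATASET_VERSION.oxl"

echo "$KETL_SRC_OXL"
```

With these values the result is the path quoted in the header comment, /home/data/knetminer/etl-test/cereals-dummy/cereals-dummy-1.oxl.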
@@ -0,0 +1,18 @@
# These files are passed to the Ondex Metadata descriptor tool.
# https://github.com/Rothamsted/knetbuilder/tree/master/ondex-knet-builder/modules/rdf-export-2
#
datasetId = cereals-dummy
datasetAccession = KnetMiner:CerealsDummy
datasetTitle = Knetminer's knowledge graph about cereals
datasetDescription = \
Knetminer is a gene discovery platform, which allows for exploring knowledge graphs computed \
from common plant biology data, such as ENSEMBL, UniProt, TAIR, PUBMED and more.\n\
The Cereals/Poaceae dataset contains information about several Gramineae species, which is also linked \
to embedded data about the Arabidopsis model organism. It includes Arabidopsis, rice, and wheat.\n\
This is the dummy edition, which is a tiny, random extract from the production dataset, \
used for tests and development.
datasetVersion = 1
# TODO: better URLs
datasetURL = https://knetminer.com/about
datasetNeo4jBrowserURL = <http://knetminer-wheat.cyverseuk.org:7474>
datasetNeo4jBOLTURL = <bolt://knetminer-neo4j.cyverseuk.org:7687>
@@ -1,6 +1,6 @@
# These files are passed to the Ondex Metadata descriptor tool.
-# https://github.com/Rothamsted/knetbuilder/tree/master/ondex-knet-builder/modules/rdf-export-2
-# 
+# https://github.com/Rothamsted/knetbuilder/tree/master/ondex-knet-builder/modules/rdf-export-2
+#
datasetId = poaceae-free
datasetAccession = KnetMiner:PoaceaeFree
datasetTitle = Knetminer's knowledge graph about cereals
7 changes: 7 additions & 0 deletions rres-endpoints/config/default-cfg.sh
@@ -71,6 +71,13 @@ export KNET_INIT_DATASET_ID="$KETL_DATASET_ID"
export KETL_HAS_NEO4J=false

# The Neo4j server home. This uses their own naming convention.
# This is the server that is used to populate an empty DB with the current dataset and then
# to produce the Neo4j dump of the dataset.
#
# This IS NOT any production or test database (not until we change these scripts).
# If you're using SLURM, the start/stop scripts will use this too, to launch
# the server on a SLURM node.
#
# export NEO4J_HOME=''

# You might need special, environment-dependent scripts to start/stop Neo
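
Reviewer note: `KETL_HAS_NEO4J` defaults to false and `NEO4J_HOME` is left commented out here, so a dataset config that enables Neo4j has to set both. The guard below is a hypothetical helper (`check_neo_cfg` is not part of the code base) sketching how that dependency could be verified before any Neo4j step starts.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check: succeeds when the Neo4j steps are disabled,
# or when they are enabled and NEO4J_HOME has a value.
check_neo_cfg() {
  # Neo4j steps disabled: nothing to verify
  [ "${KETL_HAS_NEO4J:-false}" = "true" ] || return 0

  if [ -z "${NEO4J_HOME:-}" ]; then
    echo "ERROR: KETL_HAS_NEO4J is true, but NEO4J_HOME is not set" >&2
    return 1
  fi
}
```

It could be called right after the dataset config has been sourced, so that a missing `NEO4J_HOME` is reported before any cluster job is submitted.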
22 changes: 22 additions & 0 deletions rres-endpoints/config/environments/rres-conda-init.sh
@@ -0,0 +1,22 @@
# Initialises Conda/Snakemake for the RRes environment
#
# ===> This NEEDS TO BE RUN MANUALLY BEFORE scripts that use Snakemake
#

set -e

# This is what conda installation puts in .profile
__conda_setup="$('/home/data/knetminer/software/conda/mamba/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/home/data/knetminer/software/conda/mamba/etc/profile.d/conda.sh" ]; then
        . "/home/data/knetminer/software/conda/mamba/etc/profile.d/conda.sh"
    else
        export PATH="/home/data/knetminer/software/conda/mamba/bin:$PATH"
    fi
fi
unset __conda_setup
# End of conda installation snippet

conda activate snakemake
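
Reviewer note: `conda activate` changes the shell it runs in, so this file presumably has to be sourced rather than executed as a child process for the activation to persist. The demo below is a sketch, not part of the commit. It writes a throwaway script containing a common bash guard (comparing `BASH_SOURCE[0]` with `$0`) and shows that the guard tells the two invocation styles apart.

```shell
#!/usr/bin/env bash
# Throwaway script containing the guard
demo="$(mktemp)"
cat > "$demo" <<'EOT'
# BASH_SOURCE[0] equals $0 only when the file is executed, not when it is sourced
if [ "${BASH_SOURCE[0]}" = "$0" ]; then
  echo "executed"
else
  echo "sourced"
fi
EOT

# Executed as a child process: an activation done here would be lost on exit
run_result="$(bash "$demo")"
# Sourced into a bash shell: an activation done here would stay in that shell
source_result="$(bash -c '. "$1"' _ "$demo")"

echo "$run_result / $source_result"
rm -f "$demo"
```

A guard of this kind at the top of the init script could print a hint to use `.` or `source` when the file is run directly.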
12 changes: 12 additions & 0 deletions rres-endpoints/config/environments/rres-env.sh
@@ -1,5 +1,17 @@
###### General

if ! which snakemake >/dev/null; then
cat <<EOT
WARNING: snakemake not found in PATH. If you're missing conda initialisation, run this:
BEFORE running Snakemake-based scripts, else they will FAIL
EOT

fi


# The KnetMiner team has all of its stuff here
export KNET_HOME=/home/data/knetminer
# Where we keep software executables
131 changes: 131 additions & 0 deletions rres-endpoints/doc/example-log/2024-03-12T094938.570417.snakemake.log
@@ -0,0 +1,131 @@
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cluster nodes: 16
Job stats:
job                 count    min threads    max threads
----------------  -------  -------------  -------------
add_uris                1              1              1
all                     1              1              1
dataset_metadata        1              1              1
neo_dump                1              1              1
neo_export              1              1              1
neo_index               1              1              1
rdf_export              1              1              1
tdb_load                1              1              1
tdb_zip                 1              1              1
total                   9              1              1

Select jobs to execute...

[Tue Mar 12 09:50:03 2024]
rule add_uris:
input: /home/data/knetminer/etl-test/poaceae/57/generic/knowledge-network-free.oxl
output: /home/data/knetminer/pub/endpoints/poaceae-free/57/tmp/knowledge-graph-uris.oxl
jobid: 2
resources: tmpdir=/tmp

Submitted job 2 with external jobid 'Submitted batch job 324984'.
[Tue Mar 12 10:03:43 2024]
Finished job 2.
1 of 9 steps (11%) done
Select jobs to execute...

[Tue Mar 12 10:03:43 2024]
rule dataset_metadata:
input: /home/data/knetminer/pub/endpoints/poaceae-free/57/tmp/knowledge-graph-uris.oxl
output: /home/data/knetminer/pub/endpoints/poaceae-free/57/knowledge-graph-annotated.oxl, /home/data/knetminer/pub/endpoints/poaceae-free/57/rdf/knowledge-graph-metadata.ttl
jobid: 1
resources: tmpdir=/tmp

Submitted job 1 with external jobid 'Submitted batch job 324987'.

[Tue Mar 12 10:03:43 2024]
rule rdf_export:
input: /home/data/knetminer/pub/endpoints/poaceae-free/57/tmp/knowledge-graph-uris.oxl
output: /home/data/knetminer/pub/endpoints/poaceae-free/57/rdf/knowledge-graph.ttl.bz2
jobid: 3
resources: tmpdir=/tmp

Submitted job 3 with external jobid 'Submitted batch job 324988'.
[Tue Mar 12 10:17:34 2024]
Finished job 1.
2 of 9 steps (22%) done
[Tue Mar 12 10:26:25 2024]
Finished job 3.
3 of 9 steps (33%) done
Select jobs to execute...

[Tue Mar 12 10:26:26 2024]
rule tdb_load:
input: /home/data/knetminer/pub/endpoints/poaceae-free/57/rdf/knowledge-graph.ttl.bz2, /home/data/knetminer/pub/endpoints/poaceae-free/57/rdf/knowledge-graph-metadata.ttl
output: /home/data/knetminer/pub/endpoints/poaceae-free/57/tmp/tdb, /home/data/knetminer/pub/endpoints/poaceae-free/57/rdf/ontologies
jobid: 4
resources: tmpdir=/tmp

Submitted job 4 with external jobid 'Submitted batch job 324990'.
[Tue Mar 12 10:56:18 2024]
Finished job 4.
4 of 9 steps (44%) done
Select jobs to execute...

[Tue Mar 12 10:56:19 2024]
rule neo_export:
input: /home/data/knetminer/pub/endpoints/poaceae-free/57/tmp/tdb
output: /home/data/knetminer/pub/endpoints/poaceae-free/57/tmp/neo-export.flag
jobid: 8
resources: tmpdir=/tmp

Submitted job 8 with external jobid 'Submitted batch job 324993'.

[Tue Mar 12 10:56:19 2024]
rule tdb_zip:
input: /home/data/knetminer/pub/endpoints/poaceae-free/57/tmp/tdb
output: /home/data/knetminer/pub/endpoints/poaceae-free/57/rdf/tdb.tar.bz2
jobid: 5
resources: tmpdir=/tmp

Submitted job 5 with external jobid 'Submitted batch job 324994'.
[Tue Mar 12 11:14:11 2024]
Finished job 5.
5 of 9 steps (56%) done
[Tue Mar 12 11:19:11 2024]
Finished job 8.
6 of 9 steps (67%) done
Select jobs to execute...

[Tue Mar 12 11:19:12 2024]
rule neo_index:
input: /home/data/knetminer/pub/endpoints/poaceae-free/57/tmp/neo-export.flag, /home/data/knetminer/pub/endpoints/poaceae-free/57/knowledge-graph-annotated.oxl
output: /home/data/knetminer/pub/endpoints/poaceae-free/57/tmp/neo-index.flag
jobid: 7
resources: tmpdir=/tmp

Submitted job 7 with external jobid 'Submitted batch job 324997'.
[Tue Mar 12 11:40:24 2024]
Finished job 7.
7 of 9 steps (78%) done
Select jobs to execute...

[Tue Mar 12 11:40:24 2024]
rule neo_dump:
input: /home/data/knetminer/pub/endpoints/poaceae-free/57/tmp/neo-index.flag
output: /home/data/knetminer/pub/endpoints/poaceae-free/57/neo4j-5.16.0.dump
jobid: 6
resources: tmpdir=/tmp

Submitted job 6 with external jobid 'Submitted batch job 324999'.
[Tue Mar 12 11:52:35 2024]
Finished job 6.
8 of 9 steps (89%) done
Select jobs to execute...

[Tue Mar 12 11:52:35 2024]
localrule all:
input: /home/data/knetminer/pub/endpoints/poaceae-free/57/knowledge-graph-annotated.oxl, /home/data/knetminer/pub/endpoints/poaceae-free/57/rdf/knowledge-graph.ttl.bz2, /home/data/knetminer/pub/endpoints/poaceae-free/57/rdf/ontologies, /home/data/knetminer/pub/endpoints/poaceae-free/57/rdf/knowledge-graph-metadata.ttl, /home/data/knetminer/pub/endpoints/poaceae-free/57/tmp/tdb, /home/data/knetminer/pub/endpoints/poaceae-free/57/rdf/tdb.tar.bz2, /home/data/knetminer/pub/endpoints/poaceae-free/57/neo4j-5.16.0.dump
jobid: 0
resources: tmpdir=/tmp

[Tue Mar 12 11:52:35 2024]
Finished job 0.
9 of 9 steps (100%) done
Complete log: /home/data/knetminer/software/knetminer-backend/rres-endpoints/.snakemake/log/2024-03-12T094938.570417.snakemake.log
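
Reviewer note: the timestamps in a log like the one above are enough to estimate how long each rule took. The function below is a sketch (`job_durations` is a hypothetical name) that pairs each rule's submission time with its "Finished job" line. It assumes all timestamps fall on the same day, and on a cluster the figures include any time spent waiting in the queue.

```shell
#!/usr/bin/env bash
# Prints "<rule> <seconds>" for every finished job in a Snakemake log.
# Reads the files given as arguments, or stdin when none is given.
job_durations() {
  awk '
    # Timestamp line, e.g. [Tue Mar 12 09:50:03 2024]
    /^\[.*[0-9][0-9]:[0-9][0-9]:[0-9][0-9] [0-9]+\]$/ {
      split($4, t, ":")
      now = t[1] * 3600 + t[2] * 60 + t[3]
      next
    }
    # Start of a rule block: remember the rule name without the trailing colon
    /^(local)?rule / { rule = $2; sub(/:$/, "", rule); next }
    # First jobid line inside a rule block: bind job id to rule name and start time
    /^[ \t]*jobid:/ && rule != "" { name[$2] = rule; start[$2] = now; rule = ""; next }
    # Completion line: report elapsed seconds since the job was submitted
    /^Finished job / {
      id = $3; sub(/\.$/, "", id)
      if (id in start) print name[id], now - start[id]
    }
  ' "$@"
}
```

For the first job in the log above (submitted 09:50:03, finished 10:03:43) this reports `add_uris 820`.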
@@ -0,0 +1,4 @@
Building DAG of jobs...
MissingInputException in line 36 of /home/data/knetminer/software/knetminer-backend/rres-endpoints/build-endpoint.snakefile:
Missing input files for rule add_uris:
/home/data/knetminer/etl-test/poaceae/1/generic/knowledge-network-free.oxl
@@ -0,0 +1,4 @@
Building DAG of jobs...
MissingInputException in line 36 of /home/data/knetminer/software/knetminer-backend/rres-endpoints/build-endpoint.snakefile:
Missing input files for rule add_uris:
/home/data/knetminer/etl-test/cereals-dummy/1/generic/knowledge-network-free.oxl
@@ -0,0 +1,4 @@
Building DAG of jobs...
MissingInputException in line 36 of /home/data/knetminer/software/knetminer-backend/rres-endpoints/build-endpoint.snakefile:
Missing input files for rule add_uris:
/home/data/knetminer/etl-test/cereals-dummy/cereals-dummy-1.oxl/generic/knowledge-network-free.oxl
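
Reviewer note: the three aborted runs above all stopped on a wrong source path, which Snakemake only reports while building the DAG. A pre-flight check in the launching script would give the same information earlier and with a plainer message. The helper below is hypothetical (`require_src_oxl` is not in the code base) and only assumes that `KETL_SRC_OXL` has been exported by the dataset config.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: fails unless KETL_SRC_OXL points to a regular file.
require_src_oxl() {
  if [ -z "${KETL_SRC_OXL:-}" ]; then
    echo "ERROR: KETL_SRC_OXL is not set" >&2
    return 1
  fi
  if [ ! -f "$KETL_SRC_OXL" ]; then
    echo "ERROR: source OXL not found: $KETL_SRC_OXL" >&2
    return 1
  fi
}
```

Called as `require_src_oxl || exit 1` after the config files are sourced, it would stop the run before Snakemake is invoked.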
@@ -0,0 +1,65 @@
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cluster nodes: 16
Job stats:
job                 count    min threads    max threads
----------------  -------  -------------  -------------
add_uris                1              1              1
all                     1              1              1
dataset_metadata        1              1              1
neo_dump                1              1              1
neo_export              1              1              1
neo_index               1              1              1
rdf_export              1              1              1
tdb_load                1              1              1
tdb_zip                 1              1              1
total                   9              1              1

Select jobs to execute...

[Wed Apr 24 17:27:36 2024]
rule add_uris:
input: /home/data/knetminer/etl-test/cereals-dummy/cereals-dummy-1.oxl
output: /home/data/knetminer/pub/endpoints/cereals-dummy/1/tmp/knowledge-graph-uris.oxl
jobid: 2
resources: tmpdir=/tmp

Submitted job 2 with external jobid 'Submitted batch job 328727'.
[Wed Apr 24 17:28:14 2024]
Finished job 2.
1 of 9 steps (11%) done
Select jobs to execute...

[Wed Apr 24 17:28:14 2024]
rule dataset_metadata:
input: /home/data/knetminer/pub/endpoints/cereals-dummy/1/tmp/knowledge-graph-uris.oxl
output: /home/data/knetminer/pub/endpoints/cereals-dummy/1/knowledge-graph-annotated.oxl, /home/data/knetminer/pub/endpoints/cereals-dummy/1/rdf/knowledge-graph-metadata.ttl
jobid: 1
resources: tmpdir=/tmp

Submitted job 1 with external jobid 'Submitted batch job 328728'.

[Wed Apr 24 17:28:15 2024]
rule rdf_export:
input: /home/data/knetminer/pub/endpoints/cereals-dummy/1/tmp/knowledge-graph-uris.oxl
output: /home/data/knetminer/pub/endpoints/cereals-dummy/1/rdf/knowledge-graph.ttl.bz2
jobid: 3
resources: tmpdir=/tmp

Submitted job 3 with external jobid 'Submitted batch job 328729'.
[Wed Apr 24 17:28:54 2024]
Error in rule dataset_metadata:
jobid: 1
output: /home/data/knetminer/pub/endpoints/cereals-dummy/1/knowledge-graph-annotated.oxl, /home/data/knetminer/pub/endpoints/cereals-dummy/1/rdf/knowledge-graph-metadata.ttl
shell:
./endpoint-steps/create-dataset-metadata.sh "/home/data/knetminer/pub/endpoints/cereals-dummy/1/tmp/knowledge-graph-uris.oxl" "/home/data/knetminer/pub/endpoints/cereals-dummy/1/knowledge-graph-annotated.oxl" "/home/data/knetminer/pub/endpoints/cereals-dummy/1/rdf/knowledge-graph-metadata.ttl"
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
cluster_jobid: Submitted batch job 328728

Error executing rule dataset_metadata on cluster (jobid: 1, external: Submitted batch job 328728, jobscript: /home/data/knetminer/software/knetminer-backend/rres-endpoints/.snakemake/tmp.pg985byu/snakejob.dataset_metadata.1.sh). For error details see the cluster log and the log files of the involved rule(s).
[Wed Apr 24 17:29:04 2024]
Finished job 3.
2 of 9 steps (22%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/data/knetminer/software/knetminer-backend/rres-endpoints/.snakemake/log/2024-04-24T172725.097077.snakemake.log
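
Reviewer note: in a failed run like the one above, the lines that matter are "Error in rule ..." and the `cluster_jobid` that points to the batch job's own log. The two functions below are a sketch (both names are hypothetical) for pulling those values out of a Snakemake log with plain `sed`.

```shell
#!/usr/bin/env bash
# Prints the name of every rule reported as failed.
failed_rules() {
  sed -n 's/^Error in rule \(.*\):$/\1/p' "$@"
}

# Prints the batch job number of every failed cluster job.
failed_cluster_jobs() {
  sed -n 's/^[[:space:]]*cluster_jobid: Submitted batch job \([0-9][0-9]*\)$/\1/p' "$@"
}
```

Both read the files given as arguments, or stdin when none is given. On the log above they would print `dataset_metadata` and `328728` respectively.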