Datalad-Registry is now live at registry.datalad.org.
We fully test Datalad-Registry with Podman. Nevertheless, you should be able to launch a Datalad-Registry instance using Docker with little to no deviation from this guide.
-
The following dependencies are needed in a system in order to test and develop Datalad-Registry:
-
We strongly recommend installing podman-compose and other Python dependencies in a Python virtual environment for this project.
-
Datalad-Registry's package version is determined by versioningit using Git tags in the repo the package is built from. Please ensure your clone of the original Datalad-Registry repo includes all the tags for a successful build. (Direct clones of the original likely have all those tags. For clones of a fork of the original, you may need to add the original as a remote and fetch all the tags from it.)
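For a clone of a fork, fetching the tags could look like the following minimal sketch. The helper name fetch_upstream_tags and the remote name upstream are illustrative choices, not part of the project, and the URL in the usage comment is an assumption about where the original repo lives.

```shell
# Hypothetical helper: add the original repo as a remote named "upstream"
# (unless a remote with that name already exists) and fetch all of its tags.
# Run it from inside your clone.
#
# Usage (the URL is an assumption, adjust it to the actual original repo):
#   fetch_upstream_tags https://github.com/datalad/datalad-registry.git
fetch_upstream_tags() {
    upstream_url="$1"
    if ! git remote get-url upstream >/dev/null 2>&1; then
        git remote add upstream "$upstream_url"
    fi
    git fetch upstream --tags
}
```

Afterwards, git tag -l should list the release tags that versioningit needs.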
-
Setup
-
On Debian systems, install the necessary dependencies for Python PostgreSQL libs:
sudo apt-get install libpq-dev python3-dev
-
Install Datalad-Registry for testing in the Python virtual environment for this project (as mentioned in the Prerequisites section):
pip install -e .[test]
-
Launch the needed components of Datalad-Registry for testing from a subshell with the needed environment variables loaded from env.test. (Note: using a subshell avoids polluting the current shell with the environment variables from env.test.)
(set -a && . ./env.test && set +a && podman-compose -f docker-compose.test.yml up -d)
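The subshell pattern used in this guide can be wrapped in a small helper, sketched below. The name with_env is hypothetical and not part of the project; the body only restates the set -a / source / set +a sequence from the command above.

```shell
# Hypothetical helper: run a command with the variables from an env file
# exported, inside a subshell so the calling shell stays untouched.
# Pass a path that contains a slash (e.g. ./env.test) so that `.` does not
# search PATH for the file.
with_env() (
    env_file="$1"
    shift
    set -a                    # export every variable assigned from here on
    . "$env_file" || exit 1   # load the variables from the env file
    set +a
    "$@"                      # run the requested command with them
)
```

With this helper, the launch command above would read: with_env ./env.test podman-compose -f docker-compose.test.yml up -d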
-
-
Test execution
-
Launch the tests from a subshell with the needed environment variables loaded from env.test:
(set -a && . ./env.test && set +a && python -m pytest -s -v)
-
-
Teardown
When testing is done, you can bring down the Datalad-Registry components that were launched.
-
Bring down the launched components of Datalad-Registry from a subshell with the needed environment variables loaded from env.test:
(set -a && . ./env.test && set +a && podman-compose -f docker-compose.test.yml down)
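The setup, test execution, and teardown commands above can be chained so that teardown always runs, even when tests fail. The following is a minimal sketch, not a script shipped with the project: run_test_cycle, COMPOSE, and TEST_CMD are hypothetical names, and setting COMPOSE to a Docker-based compose command is an untested assumption based on the Docker note at the top of this guide. The sketch also does not wait for the services to become ready before starting the tests.

```shell
# Assumption: both commands are overridable through these variables.
COMPOSE="${COMPOSE:-podman-compose}"
TEST_CMD="${TEST_CMD:-python -m pytest -s -v}"

# Hypothetical helper: launch the test components, run the tests, and always
# bring the components down again. Runs in a subshell (note the parentheses)
# so the variables from env.test do not leak into the calling shell.
run_test_cycle() (
    set -a && . ./env.test && set +a || exit 1
    $COMPOSE -f docker-compose.test.yml up -d || exit 1
    status=0
    $TEST_CMD || status=$?                      # remember the test result
    $COMPOSE -f docker-compose.test.yml down    # teardown regardless of result
    exit "$status"
)
```

Calling run_test_cycle from the repo root returns the exit status of the test command.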
-
-
Setup
-
On Debian systems, install the necessary dependencies for Python PostgreSQL libs:
sudo apt-get install libpq-dev python3-dev
-
Install Datalad-Registry for development in the Python virtual environment for this project (as mentioned in the Prerequisites section):
pip install -e .[dev]
-
Set values for the needed environment variables by creating a .env.dev file.
template.env is a template for creating the .env.dev file. It lists all the needed environment variables with defaults. We will use it to create the .env.dev file.
- Create the .env.dev file by copying the template.env file to .env.dev:
cp template.env .env.dev
-
Modify the .env.dev file according to your needs by adjusting the values for usernames, passwords, etc.
Note: all .env files are git-ignored.
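After editing .env.dev, it can be useful to confirm that no variable from template.env was dropped by accident. The sketch below assumes that both files consist of plain KEY=VALUE lines (comments and blank lines are skipped); env_keys and missing_env_keys are hypothetical helper names.

```shell
# Print the variable names defined in a KEY=VALUE env file, one per line.
env_keys() {
    grep -E '^[A-Za-z_][A-Za-z0-9_]*=' "$1" | cut -d= -f1 | sort -u
}

# Print the names that appear in the template (first argument) but not in
# the env file being checked (second argument).
missing_env_keys() {
    template_keys="$(mktemp)"
    target_keys="$(mktemp)"
    env_keys "$1" > "$template_keys"
    env_keys "$2" > "$target_keys"
    comm -23 "$template_keys" "$target_keys"   # lines only in the template
    rm -f "$template_keys" "$target_keys"
}
```

Running missing_env_keys template.env .env.dev prints nothing when every templated variable is still present.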
-
Launch the needed components of Datalad-Registry for development from a subshell with the needed environment variables loaded from .env.dev. (Note: using a subshell avoids polluting the current shell with the environment variables from .env.dev.)
(set -a && . ./.env.dev && set +a && podman-compose -f docker-compose.yml -f docker-compose.dev.override.yml up -d --build)
-
-
Development
At this point, the proper development environment is set up. However, please note the following characteristics of this development environment:
- The current directory at the host is bind-mounted to the /app directory within the web service container.
- The subdirectory ./instance at the host is bind-mounted to the /app/instance directory within the web service container to serve as the instance folder for the Flask application run by the web service container.
- The web service container runs the Flask application in debug mode and reacts to changes in the current directory, the codebase, at the host machine.
- All other component services of Datalad-Registry, as defined in docker-compose.yml, do not react to changes in the codebase at the host machine. (Note: This behavior is the result of a design choice. The worker service, a Celery worker, for example, should not react to changes in the codebase at the host machine because the tasks it executes may not always be idempotent.)
  - To realize the changes in the codebase at the host machine in other component services, you need to bring down all the components of Datalad-Registry as specified in the following Teardown section and relaunch them according to the above Setup section.
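Bringing the components down and relaunching them, as described in the last point, can be sketched as a single helper that reuses the compose invocations from the Setup and Teardown sections. relaunch_dev and COMPOSE are hypothetical names, not part of the project.

```shell
# Assumption: the compose command is overridable through this variable.
COMPOSE="${COMPOSE:-podman-compose}"

# Hypothetical helper: bring down all development components and relaunch
# them with a rebuild, so that services other than the web service pick up
# codebase changes. Runs in a subshell (note the parentheses) so variables
# from .env.dev do not leak into the calling shell.
relaunch_dev() (
    set -a && . ./.env.dev && set +a || exit 1
    $COMPOSE -f docker-compose.yml -f docker-compose.dev.override.yml down &&
        $COMPOSE -f docker-compose.yml -f docker-compose.dev.override.yml up -d --build
)
```

Run relaunch_dev from the repo root, where .env.dev and the compose files live.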
-
Teardown
When done with developing, you can bring down the components of Datalad-Registry launched.
-
Bring down the launched components of Datalad-Registry from a subshell with the needed environment variables loaded from .env.dev:
(set -a && . ./.env.dev && set +a && podman-compose -f docker-compose.yml -f docker-compose.dev.override.yml down)
-
Datalad-Registry can operate in a read-only mode. In this mode, Datalad-Registry
consists of two services, a web service that accepts only read-only requests and
a read-only database service, as defined in docker-compose.read-only.yml.
The read-only database service is a replica of the database service in an instance of
Datalad-Registry that allows both reads and writes,
operating in PRODUCTION or DEVELOPMENT mode.
Setting up Datalad-Registry to run in read-only mode involves the following steps:
-
Configure the database service of an existing instance of Datalad-Registry that allows both reads and writes, operating in PRODUCTION or DEVELOPMENT mode, to be the primary database service that the database service in the read-only instance of Datalad-Registry will replicate from.
- Create a role in the primary database service for replication by executing the following SQL command via psql or any other PostgreSQL client.
  CREATE ROLE <replica_user> WITH REPLICATION LOGIN ENCRYPTED PASSWORD '<password>';
  where <replica_user> is the name of the role and <password> is the password for the role to access the primary database service.
- Modify the postgresql.conf configuration file of the primary database service.
  - Enable the wal_level configuration parameter and set its value to replica.
  - Enable the wal_log_hints configuration parameter and set its value to on.
  - Enable the wal_keep_size configuration parameter and set its value to 1024.
- Modify the pg_hba.conf configuration file of the primary database service.
  - Add the following line to the end of the file.
    host replication <replica_user> <replica_source_ip>/32 md5
    where <replica_user> is the name of the role created two steps before, and <replica_source_ip> is the IP address of the replica database service in the read-only instance of Datalad-Registry. (Note: <replica_source_ip>/32 as a whole specifies a range of IP addresses that a replica database service can connect from.)
The location of the postgresql.conf and pg_hba.conf configuration files depends on the individual setup of PostgreSQL. All PostgreSQL setups defined in all the Docker Compose files in this project store the postgresql.conf and pg_hba.conf configuration files in /var/lib/postgresql/data. The application of any changes in the postgresql.conf file requires a restart of the PostgreSQL service. The application of any changes in the pg_hba.conf file can be accomplished by either restarting the PostgreSQL service or executing the following SQL command via psql or any other PostgreSQL client.
SELECT pg_reload_conf();
-
Set up the database service of the read-only instance of Datalad-Registry to be a read-only replica of the primary database service.
- Take a base backup of the primary database service at a node that is to serve as the read-only replica.
  - Start this node with an empty data directory.
    - Uncomment the line command: ["tail", "-f", "/dev/null"] in docker-compose.read-only.yml. (This allows starting of the node without starting the PostgreSQL service and populating the data directory.)
    - Start the node by executing the following command.
      (set -a && . ./.env.read-only && set +a && podman-compose -f docker-compose.read-only.yml up read-only-db -d)
      where .env.read-only is a file containing the needed environment variables. You can use the template.env.read-only file as a template to construct this file.
  - Run the base backup command inside the node.
    - Get into the Bash shell of the node by running the following command.
      podman exec -it <name_of_the_node> /bin/bash
      where <name_of_the_node> is the name of the container that is the node.
    - Run the following pg_basebackup command.
      pg_basebackup -h <primary_ip> -p <port_number> -U <replica_user> -X stream -C -S replica_1 -v -R -W -D /var/lib/postgresql/data
      where <primary_ip> is the IP address of the primary database service, <port_number> is the port number of the primary database service, and <replica_user> is the name of the role created in the primary database in step 1.
    - Once the backup is complete, exit the Bash shell of the node by running the following command.
      exit
  - Stop the node by running the following command.
    (set -a && . ./.env.read-only && set +a && podman-compose -f docker-compose.read-only.yml down)
  - Restore the docker-compose.read-only.yml file to its original state by commenting out the line command: ["tail", "-f", "/dev/null"].
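The postgresql.conf changes from step 1 can also be scripted rather than made by hand. The following is a minimal sketch under stated assumptions: set_pg_param is a hypothetical helper, it assumes each parameter sits on its own line of the form name = value (possibly commented out with a leading #), and it must run where the configuration file of the primary database service is accessible. A restart of the PostgreSQL service is still required afterwards.

```shell
# Hypothetical helper: enable a parameter in a postgresql.conf-style file and
# set its value. An existing line for the parameter (commented out or not) is
# rewritten in place; otherwise a new line is appended. sed keeps a backup of
# the original file next to it with a .bak suffix.
set_pg_param() {
    conf_file="$1"
    name="$2"
    value="$3"
    if grep -Eq "^[[:space:]]*#?[[:space:]]*${name}[[:space:]]*=" "$conf_file"; then
        sed -i.bak -E \
            "s|^[[:space:]]*#?[[:space:]]*${name}[[:space:]]*=.*|${name} = ${value}|" \
            "$conf_file"
    else
        printf '%s = %s\n' "$name" "$value" >> "$conf_file"
    fi
}

# Usage, with the three settings from step 1 (the path is where the Docker
# Compose setups in this project keep the file):
#   conf=/var/lib/postgresql/data/postgresql.conf
#   set_pg_param "$conf" wal_level replica
#   set_pg_param "$conf" wal_log_hints on
#   set_pg_param "$conf" wal_keep_size 1024
```

Note that rewriting a line drops any trailing comment that was on it.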
After going through the above steps, the initial setup for running Datalad-Registry in read-only mode is complete. To start the read-only instance of Datalad-Registry, just run the following command.
(set -a && . ./.env.read-only && set +a && podman-compose -f docker-compose.read-only.yml up -d --build)