[ENH] Provide instructions for federation API #133

Merged
merged 19 commits into from
Nov 28, 2023
140 changes: 140 additions & 0 deletions docs/federate.md
@@ -0,0 +1,140 @@
## When to use local query federation
There are two main reasons to deploy local query federation:

- **Case 1**: one-way federation. You have (at least) one [local neurobagel
node](infrastructure.md) and you want your users to be able to search
the data in the local node alongside all the publicly
visible data in the neurobagel network.
- **Case 2**: internal federation. You have two or more local neurobagel
nodes (e.g. for data from different groups in your institute)
and you want your local users to search across all of them.

![Local federation scenarios](imgs/local_federation_architecture.jpg)

Note that these cases are not mutually exclusive.
Any local neurobagel nodes you deploy will only be visible to users
inside of your local network (internal federation).

## When not to use local query federation
Query federation is not necessary if you:

- **only want to query public neurobagel nodes**:
Existing public nodes in the neurobagel network are accessible
to everyone via our public query tool (e.g. on [query.neurobagel.org](https://query.neurobagel.org/)),
meaning you can run federated queries over these graph databases without any additional local setup.
- **only want to search a single neurobagel node**:
If you only have one local node that you want to query,
it is easier to directly query the node-API of this node.
In that case, all you have to do is follow the [deployment instructions
for a neurobagel node](infrastructure.md) and you are good to go.

## Setting up for local federation
Federated graph queries in neurobagel are provided by the federation API (`f-API`) service.
The neurobagel `f-API` takes a single user query and then sends it to every
neurobagel node API (`n-API`) it is aware of, collects and combines the responses,
and sends them back to the user as a single answer.
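
The fan-out behaviour described above can be sketched in a few lines of Python. This is only an illustration and not the actual `f-API` implementation: the `federate` function, the stand-in node callables, the `node_name` key and the "skip a failing node" policy are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical stand-in for an n-API: a callable that takes a query
# and returns a list of matching records.
NodeQuery = Callable[[dict], List[dict]]


def federate(query: dict, nodes: Dict[str, NodeQuery]) -> List[dict]:
    """Send one query to every known node and merge the responses."""

    def ask(item):
        name, run_query = item
        try:
            results = run_query(query)
        except Exception:
            # Assumed policy for this sketch: an unreachable node is skipped
            return []
        # Tag each record with the node it came from
        return [{**record, "node_name": name} for record in results]

    with ThreadPoolExecutor() as pool:
        per_node = list(pool.map(ask, nodes.items()))
    # Flatten the per-node lists into a single combined answer
    return [record for node_results in per_node for record in node_results]


nodes = {
    "node_archive": lambda q: [{"dataset": "A"}],
    "node_recruitment": lambda q: [{"dataset": "B"}, {"dataset": "C"}],
}
combined = federate({"min_age": 18}, nodes)
```

Here every node receives the same query, and the caller only ever sees the single merged list in `combined`.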

!!! note

    Make sure you have at least one [local `n-API` configured and running](infrastructure.md)
    before you set up local federation. If you do not have any local
    `n-APIs` to federate over, you can just use our public query tool directly at [query.neurobagel.org](https://query.neurobagel.org/).

In your command line, create and navigate to a new directory where you will keep the configuration
files for your new `f-API`. In this directory, create two files:

### `fed.env` environment file

Create a text file called `fed.env` to hold environment variables needed for the `f-API` deployment.
Let's assume there are two local nodes already running on different servers of your institutional network, and you want to set up federation across both nodes:

- a node named `"node_archive"` running on your local computer on port `8000` and
- a node named `"node_recruitment"` running on a different computer with the local IP `192.168.0.1`, listening on the default HTTP port `80`.

In your `fed.env` file you would configure this as follows:

``` {.bash .annotate title="fed.env"}
# Configuration for f-API
# List of known local node APIs: (node_URL, node_NAME)
LOCAL_NB_NODES=(http://localhost:8000, node_archive) (http://192.168.0.1, node_recruitment)
# Define the port that the f-API will run on INSIDE the docker container (default 8000)
NB_API_PORT=8000
# Define the port that the f-API will be exposed on to the host computer (and likely the outside network)
NB_API_PORT_HOST=8080
# Choose the docker image tag of the f-API (default latest)
NB_API_TAG=latest

# Configuration for query tool
# Define the URL of the f-API as it will appear to a user
API_QUERY_URL=http://localhost:8080 # (1)!
# Choose the docker image tag of the query tool (default latest)
NB_QUERY_TAG=latest
# Choose the port that the query tool will be exposed on the host and likely the network (default 3000)
NB_QUERY_PORT_HOST=3000
```

1.  When a user uses the graphical query tool to query your
    f-API, these requests will be sent from the user's machine,
    not from the machine hosting the query tool.

    Make sure you set the `API_QUERY_URL` in your `fed.env`
    as it will appear to a user on their own machine;
    otherwise the request will fail.
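
Because the requests originate in the user's browser, a `localhost` value for `API_QUERY_URL` only works for users on the machine that hosts the `f-API`. A small helper like the one below could flag such values before deployment. It is a sketch only: `points_to_loopback` is a hypothetical name and is not part of neurobagel.

```python
import ipaddress
from urllib.parse import urlparse


def points_to_loopback(url: str) -> bool:
    """Return True if the URL's host is localhost or a loopback IP address."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, so treat it as a regular domain name
        return False


local_only = points_to_loopback("http://localhost:8080")
reachable_on_lan = not points_to_loopback("http://192.168.0.1")
```

If the helper returns `True`, users on other machines would need a URL with the host's network IP or domain name instead.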

Each node to be federated over is described in the variable `LOCAL_NB_NODES` by a comma-delimited tuple of the form `(node_URL, node_NAME)`.

You can add one or more local nodes to the list of nodes known to your `f-API` in this way.
Just adjust the above code snippet according to your own deployment, and store it in a file called `fed.env`.
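
To make the `(node_URL, node_NAME)` format concrete, here is a minimal Python sketch that splits such a value into pairs. The regular expression and the `parse_local_nodes` function are assumptions made for illustration; the `f-API` may parse the variable differently.

```python
import re
from typing import List, Tuple

# One "(url, name)" tuple: a URL without spaces, a comma, then the node name
NODE_PATTERN = re.compile(r"\(\s*([^,()\s]+)\s*,\s*([^()]+?)\s*\)")


def parse_local_nodes(value: str) -> List[Tuple[str, str]]:
    """Extract (node_URL, node_NAME) pairs from a LOCAL_NB_NODES-style string."""
    return [(url, name) for url, name in NODE_PATTERN.findall(value)]


example = "(http://localhost:8000, node_archive) (http://192.168.0.1, node_recruitment)"
local_nodes = parse_local_nodes(example)
```

With the example value from `fed.env`, `local_nodes` holds one `(URL, name)` pair per configured node, in the order they were listed.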


### `docker-compose.yml` docker config file

Create a second file called `docker-compose.yml`.
This file describes the required services, ports and paths
to launch the `f-API` together with a connected query tool.

!!! danger "Make sure you have a recent version of docker compose installed"

    Some Linux distributions come with outdated versions of `docker` and
    `docker compose` installed. Please make sure you install `docker`
    as described in the [official documentation](https://docs.docker.com/engine/install/).

Copy the following snippet into your `docker-compose.yml` file.
You should not have to change anything about this file.
All local configuration changes are done in the `fed.env` file.

``` {.yaml .annotate title="docker-compose.yml"}
version: "3.8"

services:
  federation:
    image: "neurobagel/federation_api:${NB_API_TAG:-latest}"
    ports:
      - "${NB_API_PORT_HOST:-8000}:${NB_API_PORT:-8000}"
    environment:
      - LOCAL_NB_NODES=${LOCAL_NB_NODES} # (1)!
      - NB_API_PORT=${NB_API_PORT:-8000}
  query:
    image: "neurobagel/query_tool:${NB_QUERY_TAG:-latest}"
    ports:
      - "${NB_QUERY_PORT_HOST:-3000}:3000"
    environment:
      - API_QUERY_URL=${API_QUERY_URL:-http://localhost:8000/}
```

1.  We maintain a list of public neurobagel nodes
    [here](https://github.com/neurobagel/menu/blob/main/node_directory/neurobagel_public_nodes.json).
    By default, every new `f-API` will look up this list
    on startup and include it in the list of nodes to
    federate over.
    This also means that you do not have to manually
    configure public nodes, i.e. you **do not have to explicitly add them** to the `LOCAL_NB_NODES` variable in your `fed.env` file.
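
The `${NAME:-default}` expressions in `docker-compose.yml` are what let `fed.env` drive the whole deployment: Docker Compose substitutes the value from the environment file, and falls back to the default when the variable is unset or empty. The Python sketch below mimics that substitution rule to show the resulting values. It is a simplified illustration (the `interpolate` helper is hypothetical and covers only the `${NAME}` and `${NAME:-default}` forms), not Docker Compose's own implementation.

```python
import re
from typing import Mapping

# Matches ${NAME} and ${NAME:-default}
_VAR = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")


def interpolate(template: str, env: Mapping[str, str]) -> str:
    """Replace ${NAME:-default} placeholders using values from env."""

    def substitute(match):
        name, default = match.group(1), match.group(2)
        value = env.get(name, "")
        # ":-" falls back to the default when the variable is unset or empty
        if value == "" and default is not None:
            return default
        return value

    return _VAR.sub(substitute, template)


fed_env = {"NB_API_PORT_HOST": "8080", "NB_API_PORT": "8000"}
port_mapping = interpolate("${NB_API_PORT_HOST:-8000}:${NB_API_PORT:-8000}", fed_env)
query_url = interpolate("${API_QUERY_URL:-http://localhost:8000/}", {})
```

With the example `fed.env` values, the `f-API` port mapping resolves to `8080:8000`, while an unset `API_QUERY_URL` falls back to the default written in the compose file.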


## Launch f-API and query tool
Once you have created your `fed.env` and `docker-compose.yml` files
as described above, you can simply launch the services by running

`docker compose --env-file fed.env up -d`

from the same directory.
Binary file added docs/imgs/local_federation_architecture.jpg
132 changes: 93 additions & 39 deletions docs/infrastructure.md
@@ -1,10 +1,11 @@
# SysAdmin
These instructions are for a sysadmin looking to
deploy a new Neurobagel node locally in an institute or lab.
A local **neurobagel node** includes the **neurobagel API** and
a **graph backend** to store the harmonized metadata.

## Introduction
These instructions are for a sysadmin looking to deploy Neurobagel locally in an institute or lab.
A local neurobagel deployment includes the neurobagel API,
a graph backend to store the harmonized metadata,
and optionally a locally hosted graphical query interface.
To make searching the neurobagel node easier,
you can optionally also set up
a **[locally hosted graphical query interface](#deploy-a-graphical-query-tool).**

![The neurobagel API and graph backend](imgs/nb_architecture.jpg)

@@ -119,7 +120,7 @@ Below are all the possible Neurobagel environment variables that can be set in `

_** `NB_GRAPH_ADDRESS` should not be changed from its default value (`graph`) when using docker compose as this corresponds to the preset container name of the graph database server within the docker compose network._

_‡ See section [Using a graphical query tool to send API requests](#a-note-on-using-a-graphical-query-tool-to-send-api-requests)_
_‡ See section [Deploy a graphical query tool](#deploy-a-graphical-query-tool)_


For a local deployment, we recommend to **explicitly set** at least the following variables in `.env`
@@ -142,35 +143,6 @@ For a local deployment, we recommend to **explicitly set** at least the followin

For more information, see [Docker's environment variable precedence](https://docs.docker.com/compose/environment-variables/envvars-precedence/).

### A note on using a graphical query tool to send API requests
The `NB_API_ALLOWED_ORIGINS` variable defaults to an empty string (`""`) when unset, meaning that your deployed API will only be accessible via direct `curl` requests to the URL where the API is hosted (see [this section](#test-the-new-deployment) for an example `curl` request).

However, in many cases you may want to make the API accessible by a frontend tool such as our [browser query tool](https://github.com/neurobagel/query-tool).
To do so, you must explicitly specify the origin(s) for the frontend using `NB_API_ALLOWED_ORIGINS` in `.env`.
For detailed instructions regarding the query tool see [Running cohort queries](query_tool.md).

For example, the [`.template-env`](https://github.com/neurobagel/api/blob/main/.template-env) file in the Neurobagel API repo assumes you want to allow API requests from a query tool hosted at a specific port on `localhost` (see the [Docker Compose section](#docker-compose)).

??? example "More examples of `NB_API_ALLOWED_ORIGINS`"
``` bash title=".env"
# do not allow requests from any frontend origins
NB_API_ALLOWED_ORIGINS="" # this is the default value that will also be set if the variable is excluded from the .env file

# allow requests from only one origin
NB_API_ALLOWED_ORIGINS="https://query.neurobagel.org"

# allow requests from 3 different origins
NB_API_ALLOWED_ORIGINS="https://query.neurobagel.org https://localhost:3000 http://localhost:3000"

# allow requests from any origin - use with caution
NB_API_ALLOWED_ORIGINS="*"
```

??? note "For more technical deployments using NGINX"

If you have configured an NGINX reverse proxy (or proxy requests to the remote origin) to serve both the API and the query tool from the same origin, you can skip the step of enabling CORS for the API.
For an example, see https://docs.nginx.com/nginx/admin-guide/web-server/reverse-proxy/.

### Docker Compose

To spin up the API and graph backend containers using Docker Compose,
@@ -189,9 +161,6 @@ Or, if you want to ensure you always pull the latest Docker images first:
docker compose pull && docker compose up -d
```

By default, this will also deploy a local version of the [Neurobagel graphical query tool](https://github.com/neurobagel/query-tool).
If using the default port mappings, you can reach your local query tool at [http://localhost:3000](http://localhost:3000) once it is running.

## Setup for the first run

When you launch the graph backend for the first time,
@@ -611,3 +580,88 @@ and click "Try it out" and then "Execute" to execute a query.
!!! note
For very large databases, requests to the API using the interactive docs UI may be very slow or time out.
If this prevents test queries from succeeding, try setting more parameters to enable an example response from the graph, or use a `curl` request instead.


## Deploy a graphical query tool
To give your users an easy, graphical way to
query your new local neurobagel node,
you have two options:

### As part of local federation
Use this option if any of the following apply! You:

- already have deployed other local neurobagel nodes
that you want your users to query alongside the new node
- want your users to be able to query
all public neurobagel nodes together with your new node
- plan on adding more local neurobagel nodes in the
near future that you will want to query alongside your newly created node

In this case, skip directly to the page on
setting up [local query federation](federate.md).

### As a standalone service
Use this option if you:

- plan on only deploying a single node
- want your users to only search data
in the new node you deployed

In this case, you need to deploy the query tool
as a standalone docker container.


```bash
docker run -d -p 3000:3000 --env API_QUERY_URL=http://localhost:8000/ --name query_tool neurobagel/query_tool:latest
```

??? todo

    Update docker example to use a specific version
    once https://github.com/neurobagel/planning/issues/64
    is closed.

Make sure to replace the value of `API_QUERY_URL` with the `IP:PORT` or domain name of the
new neurobagel node-API you just deployed!

If using the default port mappings for the query tool (`-p 3000:3000` in the above command),
you can reach your local query tool at [http://localhost:3000](http://localhost:3000) once it is running.

To verify the exact configuration that your new docker
container is running with (e.g. for debugging),
you can run

```bash
docker inspect query_tool
```

### Updating your API configuration
If deploying the query tool as a standalone service for the local node you have just created, you must ensure the `NB_API_ALLOWED_ORIGINS` variable is correctly set in the [`.env` file configuration for your node API](#set-the-environment-variables).
The `NB_API_ALLOWED_ORIGINS` variable defaults to an empty string (`""`) when unset, meaning that your deployed API will only be accessible via direct `curl` requests to the URL where the API is hosted (see [this section](#test-the-new-deployment) for an example `curl` request).

To make the API accessible by a frontend tool such as our [browser query tool](https://github.com/neurobagel/query-tool),
you must explicitly specify the origin(s) for the frontend using `NB_API_ALLOWED_ORIGINS` in `.env`.
For detailed instructions regarding the query tool see [Running cohort queries](query_tool.md).

For example, the [`.template-env`](https://github.com/neurobagel/api/blob/main/.template-env) file in the Neurobagel API repo assumes you want to allow API requests from a query tool hosted at a specific port on `localhost` (see the [Docker Compose section](#docker-compose)).

!!! example "More examples of `NB_API_ALLOWED_ORIGINS`"

    ``` bash title=".env"
    # do not allow requests from any frontend origins
    NB_API_ALLOWED_ORIGINS="" # this is the default value that will also be set if the variable is excluded from the .env file

    # allow requests from only one origin
    NB_API_ALLOWED_ORIGINS="https://query.neurobagel.org"

    # allow requests from 3 different origins
    NB_API_ALLOWED_ORIGINS="https://query.neurobagel.org https://localhost:3000 http://localhost:3000"

    # allow requests from any origin - use with caution
    NB_API_ALLOWED_ORIGINS="*"
    ```

??? note "For more technical deployments using NGINX"

    If you have configured an NGINX reverse proxy (or proxy requests to the remote origin) to serve both the API and the query tool from the same origin, you can skip the step of enabling CORS for the API.
    For an example, see https://docs.nginx.com/nginx/admin-guide/web-server/reverse-proxy/.
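
To illustrate how a space-separated `NB_API_ALLOWED_ORIGINS` value relates to the origin of a browser request, here is a minimal Python sketch. It is not the Neurobagel API's actual CORS handling; `parse_allowed_origins` and `is_origin_allowed` are hypothetical helpers that only mirror the examples listed above.

```python
from typing import List


def parse_allowed_origins(value: str) -> List[str]:
    """Split a space-separated origins string; an empty string yields no origins."""
    return value.split()


def is_origin_allowed(origin: str, allowed: List[str]) -> bool:
    """An origin passes if it is listed explicitly or if the wildcard '*' is set."""
    return "*" in allowed or origin in allowed


allowed = parse_allowed_origins("https://query.neurobagel.org http://localhost:3000")
local_tool_ok = is_origin_allowed("http://localhost:3000", allowed)
other_site_ok = is_origin_allowed("https://example.org", allowed)
```

Note that an origin includes the scheme and port, so `http://localhost:3000` and `https://localhost:3000` count as different origins.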
3 changes: 2 additions & 1 deletion mkdocs.yml
@@ -27,7 +27,8 @@ nav:
- Preparing data for annotation: "data_prep.md"
- Annotating a dataset: "annotation_tool.md"
- Generating harmonized subject-level metadata: "cli.md"
- Setting up a graph: "infrastructure.md"
- Set up a neurobagel node: "infrastructure.md"
- Set up local federation: "federate.md"
- Updating a harmonized dataset: "updating_dataset.md"
- Using the API: "api.md"
- Running cohort queries: "query_tool.md"