Configuring a node

Neurobagel is designed to be easily deployed with a single command without deep configuration. In many cases, however, you will want to customize your deployment to fit your needs.

If you already have a running Neurobagel node, after making any configuration changes (including changing the data you want to be available in the graph database), follow the instructions to restart your services for the changes to take effect.

Deployment

Available services

The Neurobagel Docker Compose recipe includes several services and coordinates them to work together:

(In parentheses are the names of services within the Docker Compose stack)

Neurobagel node API/n-API (api): The API that communicates with a single graph store and determines how detailed the response to a query should be from that graph.
Graph store (graph): A third-party RDF store that stores Neurobagel-harmonized data to be queried. At the moment our recipe uses the free tier of GraphDB for this.
Neurobagel federation/f-API (federation): A special API that can federate over one or more Neurobagel nodes to provide a single point of access to multiple distributed databases. By default it will federate over all public nodes and any local nodes you specify.
Neurobagel query tool (query_federation): A web app that provides a graphical interface for users to query a federation API and view the results from one or more nodes. Because the query tool is a static app and is run locally in the user's browser, this service simply hosts the app.

Available profiles

Neurobagel offers different deployment profiles that allow you to spin up specific combinations of services (listed below), depending on your use case.

full_stack: Best profile to get started with Neurobagel. It includes all services you need to run a local Neurobagel node and have the ability to query public nodes, along with a graphical query tool.
- api
- graph
- federation
- query_tool
Info

This is the default profile if you don't specify one.

By default, this profile will also federate over all publicly accessible Neurobagel nodes, although this behaviour can be disabled in the f-API using the environment variable NB_FEDERATE_REMOTE_PUBLIC_NODES.
local_node: Best profile if you want to run a standalone Neurobagel node but rely on a separate deployment for providing federation and a graphical query tool (such as Neurobagel's own hosted public instances).
- api
- graph
local_federation: Best profile if you already have multiple standalone (local or non-publicly-accessible) Neurobagel node deployments running and you now want to provide federation over them.
- federation
- query_tool
Info

If you only want to federate over a single local node and all public Neurobagel nodes, we recommend using the full_stack profile to set up your node and federation in one step. If you choose to use the local_federation profile, you will have to manually configure your local_nb_nodes.json file.

Launching a profile

You can then launch a specific profile using the --profile or -p flag with docker compose, e.g.:

docker compose --profile full_stack up -d

If no profile is specified, docker compose up -d will start the services for the default profile, full_stack.

Take a look at the getting started guide for more information on setting up for a first launch.

Environment variables

Below are all the possible Neurobagel environment variables that can be set in .env.

Environment variable	Default needs change?	Description	Default value if not set	Used in these installation modes
`NB_GRAPH_USERNAME`	Yes	Username to set for the graph database user.	-	Docker, Python
`NB_GRAPH_SECRETS_PATH`	Yes	Path to files containing the secure passwords to set for the admin user (NB_GRAPH_ADMIN_PASSWORD.txt) and graph database user (NB_GRAPH_PASSWORD.txt).	`./secrets`	Docker
`NB_GRAPH_DB`	Yes	Name to give your graph database (e.g., for a GraphDB database, use the format `repositories/{database_name}`)	`repositories/my_db`	Docker, Python
`NB_GRAPH_MEMORY`	No	The maximum amount of memory that can be used by graph. Equivalent to setting the `-Xmx` parameter on the JVM. Value should be a number followed directly by a letter denoting the size. E.g. `264m` for 264 MB, `2g` for 2 GB. (For more info, see https://graphdb.ontotext.com/documentation/10.8/requirements.html#hardware-sizing.)	`2g`	Docker
`LOCAL_GRAPH_DATA`	Yes	Path on your filesystem to the JSONLD files you want to upload to the graph database	`./data`	Docker
`NB_GRAPH_PORT_HOST`	No	Port number on the host machine to map the graph server container port to	`7200`	Docker
`NB_NAPI_ALLOWED_ORIGINS`	No	Origins allowed to make cross-origin resource sharing requests. Multiple origins must be separated with spaces in a single string enclosed in quotes.	`""`	Docker, Python
`NB_RETURN_AGG`	No	Whether to return only aggregate, dataset-level query results (excluding subject/session-level attributes). One of [true, false]	`true`	Docker, Python
`NB_MIN_CELL_SIZE`	No	Minimum number of matching subjects required for a dataset to be returned as a query match. Datasets with matching subjects <= this number will be excluded from query results.	`0`	Docker, Python
`NB_NAPI_TAG`	No	Docker image tag for the Neurobagel node API	`latest`	Docker
`NB_NAPI_PORT_HOST`	No	Port number on the host machine to map the Neurobagel node API container port to	`8000`	Docker
`NB_NAPI_BASE_PATH`	No	(If using reverse proxy) The URL path where the node API is served from. Do not include a trailing slash.	`""`	Docker
`NB_FAPI_TAG`	No	Docker image tag for the Neurobagel federation API	`latest`	Docker
`NB_FAPI_PORT_HOST`	No	Port number on the host machine to map the Neurobagel federation API container port to	`8080`	Docker
`NB_FEDERATE_REMOTE_PUBLIC_NODES`	No	If "True", include public nodes in federation. If "False", only locally specified nodes in `local_nb_nodes.json` are queried.	`true`	Docker, Python
`NB_FAPI_BASE_PATH`	No	(If using reverse proxy) The URL path where the federation API is served from. Do not include a trailing slash.	`""`	Docker
`NB_QUERY_TAG`	No	Docker image tag for the query tool	`latest`	Docker
`NB_QUERY_PORT_HOST`	No	Port number used by the `query_tool` on the host machine	`3000`	Docker
`NB_API_QUERY_URL`	Yes	URL (and port number, if needed) of the Neurobagel API that the query tool will send its requests to. The query tool sends requests from a user's machine, so ensure the API URL is provided as a user would access it from their own machine. See also the query tool README.	-	Docker
`NB_QUERY_APP_BASE_PATH`	No	(If using reverse proxy) The URL path for the query tool, determines the specific URL at which the app should be rendered for users to access it	`/`	Docker
`NB_QUERY_HEADER_SCRIPT`	No	(Experimental, for development environments only) Custom script to add to the header section of the query tool site, such as for a GDPR-aware analytics tool.	`""`	Docker
`NB_ENABLE_AUTH`	No	(Experimental, for development environments only) Whether to enable authentication for cohort queries. One of [true, false]	`false`	Docker, Python
`NB_QUERY_CLIENT_ID`	No	(Experimental, for development environments only) OAuth client ID for the query tool. Required if NB_ENABLE_AUTH is set to true.	-	Docker, Python
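
As a rough illustration of how a few of the variables in the table fit together, a .env might contain entries like the following. All values are placeholders chosen for this sketch (the username, database name, origins, and address are hypothetical), not recommended settings.

```shell
# .env (illustrative values only; replace with your own)

# Graph store user and database name (both should be changed from the defaults)
NB_GRAPH_USERNAME=my_graph_user
NB_GRAPH_DB=repositories/my_study_db

# Return only aggregate, dataset-level results
NB_RETURN_AGG=true
# Datasets with 5 or fewer matching subjects are excluded from query results
NB_MIN_CELL_SIZE=5

# Multiple origins: one quoted string, separated by spaces
NB_NAPI_ALLOWED_ORIGINS="https://query.myinstitute.org https://myinstitute.org"

# Federation API address as a user would reach it from their own machine
NB_API_QUERY_URL=http://192.168.0.1:8080
```

Variables left out of .env fall back to the defaults listed in the table.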

Ensure that shell variables do not clash with .env file

If the shell you run docker compose from already has any shell variable of the same name set, the shell variable will take precedence over the configuration of .env! In this case, make sure to unset the local variable first.

For more information, see Docker's environment variable precedence.
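
One way to spot such clashes before launching is sketched below. The helper name find_env_clashes is made up for this example, and it assumes your .env contains only simple KEY=value lines and comments.

```shell
# find_env_clashes FILE
# Print the name of every variable assigned in FILE (simple KEY=value lines)
# that is also exported in the current shell environment.
find_env_clashes() {
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=' "$1" | cut -d= -f1 |
    while read -r name; do
      # printenv succeeds only if the variable is present in the environment
      if printenv "$name" > /dev/null; then
        echo "$name"
      fi
    done
}
```

Running find_env_clashes .env prints one variable name per line; each of them can then be cleared with unset before you run docker compose.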

Tip

Double check that any environment variables you have customized in .env are resolved with your expected values using the command docker compose config.

Change security relevant variables

The graph store (GraphDB instance) in a Neurobagel node is secured with password-based access and includes two users: an admin superuser and a regular database user, both of which are automatically configured by the Neurobagel deployment recipe. Passwords for both users are defined via files in the ./secrets directory of the recipes repository, while the regular database username is set through an environment variable in the .env file.

For security and best practice purposes, we recommend changing the following values from their defaults if you are using a deployment profile that includes a graph store:

In your .env, set a custom username and database name for your graph store by editing the following variables:
- NB_GRAPH_USERNAME
- NB_GRAPH_DB
In the ./secrets directory, change the default passwords by replacing the contents of the file NB_GRAPH_ADMIN_PASSWORD.txt for the admin superuser, and the file NB_GRAPH_PASSWORD.txt for the graph database user (corresponding to NB_GRAPH_USERNAME).
- To generate a random password in the terminal, you can use:
```
openssl rand -hex 16
```
- (Optional) You can change the directory where your password files are stored by editing the NB_GRAPH_SECRETS_PATH variable in .env.
Graph store passwords are not meant for node users!

The admin user and graph database user credentials are intended solely for internal use by the deployment recipe scripts that automatically set up and update the graph store, or for a node administrator to interact directly with the graph store. These credentials also secure internal communication between your graph store and its node API, ensuring that node users cannot query your graph directly. GraphDB user credentials are not intended for use by a general node query user.

Passwords are handled as Docker secrets

The contents of NB_GRAPH_ADMIN_PASSWORD.txt and NB_GRAPH_PASSWORD.txt are passed to Neurobagel containers as Docker secrets. This ensures that your passwords are not exposed in the container logs or in the docker-compose.yml file.

Do not share your password files with others.
Review and change as needed the following variables in .env based on your data sharing requirements:
- NB_RETURN_AGG
- NB_MIN_CELL_SIZE
Info

These variables are modifiable after node initialization; you can change their values at any time.
If you've previously launched a Neurobagel Docker Compose stack following the Getting started instructions, you'll need to reset your graph store for any changes to the graph store username, database name, or passwords (described above) to take effect. Don't worry, any other configuration changes you've already made will be applied when you re-launch your node.
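
The two password files described above could be regenerated in one go with a small helper like this sketch. The function name write_graph_passwords is hypothetical, and the directory you pass should match NB_GRAPH_SECRETS_PATH (./secrets by default).

```shell
# write_graph_passwords DIR
# Write a fresh random password into each of the two password files
# that the deployment recipe reads from the secrets directory.
write_graph_passwords() {
  dir="$1"
  mkdir -p "$dir"
  for name in NB_GRAPH_ADMIN_PASSWORD.txt NB_GRAPH_PASSWORD.txt; do
    # 16 random bytes, hex-encoded (32 characters), as in the command above
    openssl rand -hex 16 > "$dir/$name"
  done
}
```

For the default layout, this would be called as write_graph_passwords ./secrets from the root of the recipes repository.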

Configuring local node names and URLs for federation

When using a deployment profile that provides federation (i.e., includes the federation API), you can define the URLs and display names of the node APIs of any local nodes you wish to federate over in a file called local_nb_nodes.json. This file is used by the f-API.

Each node to be federated over is defined using a dictionary with two key-value pairs:

{
  "NodeName": "<display name for the node>",
  "ApiURL": "<URL of the node API exposed for that node>"
}

Values of NodeName are arbitrary. Multiple nodes must be wrapped in a list [].

Nodes that do not need to be manually configured

We maintain a list of publicly accessible Neurobagel nodes here. By default, every new f-API will look up this list on startup and include it in its internal list of nodes to federate over (this can be disabled using the environment variable NB_FEDERATE_REMOTE_PUBLIC_NODES). This also means that you do not have to explicitly add these public nodes to your local_nb_nodes.json file.

Example: Assume there are two local nodes already running on different servers of your institutional network, and you want to set up federation across both nodes:

a node named "My Institute" running on your local computer (localhost), on port 8000 and
a node named "Node Recruitment" running on a different computer with the local IP 192.168.0.1, listening on the default http port 80.

You would configure your local_nb_nodes.json as follows:

local_nb_nodes.json

[
  {
    "NodeName": "My Institute",
    "ApiURL": "https://neurobagel.myinstitute.edu"
  },
  {
    "NodeName": "Node Recruitment",
    "ApiURL": "http://192.168.0.1"
  }
]

Do not use localhost/127.0.0.1 in local_nb_nodes.json

Even if the local node API(s) you are federating over are running on the same host machine as your federation API, you cannot use localhost for the "ApiURL" and must instead provide a network-accessible URL, IP address, or container name. For an example, see the configuration for the node called "My Institute" above.

Ensure that you do not accidentally provide the address of your actual federation API for "ApiURL"! This will cause an infinite request loop that will likely overload your service (as an f-API will be repeatedly making requests to itself).

To add one or more local nodes to the list of nodes known to your f-API, simply add more dictionaries to local_nb_nodes.json.
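
Before launching, the localhost rule above can be checked with a quick text search. This is only a sketch: check_node_urls is a hypothetical helper, and it looks at the raw text of the file rather than parsing the JSON.

```shell
# check_node_urls FILE
# Return non-zero (and print the offending lines) if any "ApiURL" value in
# FILE uses localhost or 127.0.0.1 as its host.
check_node_urls() {
  pattern='"ApiURL"[[:space:]]*:[[:space:]]*"[a-z]+://(localhost|127\.0\.0\.1)[:/"]'
  if grep -nE "$pattern" "$1"; then
    echo "error: ApiURL must not point at localhost or 127.0.0.1" >&2
    return 1
  fi
  return 0
}
```

Calling check_node_urls local_nb_nodes.json exits with status 0 when no such URL is found. It does not detect the case where an "ApiURL" points at your own federation API, so that still needs a manual check.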

Behind a reverse proxy

These steps are for advanced users and production deployments

To make your Neurobagel node services (node API, query tool, etc.) accessible via custom URLs (e.g. https://www.myfirstnode.org/query) rather than a server IP address and port (e.g. http://192.168.0.1:3000) as shown in the getting started guide, you will need to set up a reverse proxy such as NGINX or Caddy. This will route incoming requests for custom URLs to the Neurobagel services deployed on your server.

The Neurobagel recipes repository includes pre-configured Docker Compose files for both NGINX and Caddy, each of which can be used to launch a reverse proxy server alongside the services in your Neurobagel node. The reverse proxy setup will then automatically handle routing and also manage and renew SSL certificates (providing secure HTTPS connections) for node services.

If you haven't already, follow the steps to clone and minimally configure the services in the Neurobagel deployment recipe.
Ensure you have already registered your desired domain(s) with a DNS provider and configured the DNS settings to resolve correctly to your host machine.
Make sure that ports 80 and 443 are open on the host machine where your Docker Compose stack is running because these are the ports your reverse proxy will listen on for incoming HTTP and HTTPS traffic.

NGINX

In your local docker-compose-nginx.yml file, change the default value for the following variables in the environment section of each service (i.e. api, federation, and query_federation) to the custom domain that that service will use:
- VIRTUAL_HOST
- LETSENCRYPT_HOST (both variables should have the same value)
Do not include subpaths in the _HOST variables

If you intend to host services on different subpaths (e.g., myinstitute.org/service1) instead of different subdomains (e.g., service1.myinstitute.org), do not include the subpath in the VIRTUAL_HOST or LETSENCRYPT_HOST value. Instead, update the *_BASE_PATH variables in the .env file for the respective services, e.g., NB_NAPI_BASE_PATH for the node API (see the .env docs for more details).

Do not change the VIRTUAL_PATH and VIRTUAL_PORT variables

You can look at the NGINX-Proxy documentation to learn more about how these variables work.
In your .env file, set the value of NB_API_QUERY_URL to the new URL of the federation API, including the subpath if applicable (e.g. myinstitute.org/federate).
Finally, launch your node by explicitly referencing the custom Docker Compose file:
```
docker compose -f docker-compose-nginx.yml up -d
```
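
For a subpath-based setup like the one described in the NGINX steps above, the relevant .env entries might look like this sketch. The /node path and the https:// scheme are assumptions made for illustration; only myinstitute.org/federate comes from the example above.

```shell
# .env entries for a subpath-based setup (illustrative values only)

# Node API served at myinstitute.org/node (no trailing slash)
NB_NAPI_BASE_PATH=/node
# Federation API served at myinstitute.org/federate (no trailing slash)
NB_FAPI_BASE_PATH=/federate
# The query tool sends its requests to the federation API at this URL
NB_API_QUERY_URL=https://myinstitute.org/federate
```

In this layout, VIRTUAL_HOST and LETSENCRYPT_HOST would still be set to the bare domain (myinstitute.org) for each service.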

Caddy

You do not need to edit the docker-compose-caddy.yml file directly.

In your local recipes/config/caddy/Caddyfile, change the default URL for each service to the URL you want to use for that service. Follow the comments in the file for guidance.

For more complex reverse proxy setups, refer to the Caddy documentation

The Caddy documentation has more detailed information on subdirectory routing and other configuration options.

Finally, launch your node by explicitly referencing the custom Docker Compose file:
```
docker compose -f docker-compose-caddy.yml up -d
```