The Neurobagel CLI
The bagel-cli is a simple Python command-line tool to automatically parse and describe subject-level phenotypic and imaging attributes in an annotated dataset for integration into the Neurobagel graph.
Installation
Option 1 (RECOMMENDED): Pull the Docker image for the CLI from DockerHub:
docker pull neurobagel/bagelcli
Option 2: Clone the repository and build the Docker image locally:
git clone https://github.com/neurobagel/bagel-cli.git
cd bagel-cli
docker build -t bagel .
For Singularity users: build a Singularity image for the bagel-cli using the DockerHub image:
singularity pull bagel.sif docker://neurobagel/bagelcli
Running the CLI
CLI commands can be accessed using the Docker/Singularity image.
Note
The Docker examples below assume that you are using the official Neurobagel Docker Hub image for the CLI.
If you have instead built an image locally, replace neurobagel/bagelcli in the commands with your local image tag.
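For example, if you built the image locally with the tag bagel (as in Option 2 above), the general help command would instead be:
docker run --rm bagel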
Input files
The Neurobagel CLI can compile information from several different data sources to create a single harmonized representation of subject data. To run the CLI on a dataset, you will need:
- A phenotypic TSV
- A Neurobagel JSON data dictionary for the TSV
- (Optional) The imaging dataset in BIDS format, if subjects have imaging data available (1)
- (Optional) A TSV containing subject statuses for any image processing pipelines that have been run, following the Nipoppy processing status file schema (2)

(1) The CLI will use a valid BIDS dataset to generate harmonized raw imaging metadata for subjects.
(2) The CLI will use this file to generate harmonized processing pipeline and derivative metadata for subjects. It is compatible with the Nipoppy workflow and can be automatically generated using the Nipoppy pipeline trackers.
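As an illustration only, a minimal phenotypic TSV might look like the following (the column names and values here are hypothetical; your columns can be named anything, as long as they are described in the accompanying data dictionary):
participant_id	age	sex
sub-01	34	M
sub-02	29	F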
Viewing CLI commands and options
The bagel-cli has different commands, each generating a different type of subject (meta)data:
- pheno
- bids
- derivatives
The pheno command must be run first on a dataset (each subject in a Neurobagel graph must have at least phenotypic information); other metadata are optional and can be added in any order.
To view the general CLI help and information about the commands:
# This is a shorthand for `docker run --rm neurobagel/bagelcli --help`
docker run --rm neurobagel/bagelcli
# This is a shorthand for `singularity run bagel.sif --help`
singularity run bagel.sif
To view the command-line arguments for a specific command (e.g., pheno):
docker run --rm neurobagel/bagelcli pheno -h
singularity run bagel.sif pheno -h
Running the CLI on your data
1. cd into your local directory containing your CLI input files (at minimum, a phenotypic TSV and corresponding Neurobagel annotated JSON data dictionary).
2. Run a bagel-cli container and include your CLI command and arguments at the end in the following format:
docker run --rm --volume=$PWD:$PWD -w $PWD neurobagel/bagelcli <CLI command here>
What is this command doing?
The combination of options --volume=$PWD:$PWD -w $PWD
mounts your current working directory (containing all inputs for the CLI) at the same path inside the container, and also sets the container's working directory to the mounted path (so it matches your location on your host machine).
This allows you to pass paths to the containerized CLI that are composed the same way as on your host machine. (Both absolute paths and relative paths descending from your working directory will work!)
singularity run --no-home --bind $PWD --pwd $PWD /path/to/bagel.sif <CLI command here>
What is this command doing?
The combination of options --bind $PWD --pwd $PWD
mounts your current working directory (containing all inputs for the CLI) at the same path inside the container, and also sets the container's working directory to the mounted path (so it matches your location on your host machine).
This allows you to pass paths to the containerized CLI that are composed the same way as on your host machine. (Both absolute paths and relative paths descending from your working directory will work!)
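For example, if your working directory is /home/data/Dataset1 (as in the example below), these two paths refer to the same file inside the container:
tabular/Dataset1_pheno.tsv
/home/data/Dataset1/tabular/Dataset1_pheno.tsv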
Example
If your dataset lives in /home/data/Dataset1:
home/
└── data/
└── Dataset1/
├── tabular/
│ ├── Dataset1_pheno.tsv
│ ├── Dataset1_pheno.json
│ └── ...
├── bids/
│ ├── sub-01/
│ ├── sub-02/
│ └── ...
├── derivatives/
│ ├── Dataset1_proc_status.tsv
│ └── ...
└── ...
Note
This is an example directory structure following the Nipoppy specification for dataset organization. Your input data may be organized differently.
To generate a single, graph-ready JSONLD file (Dataset1.jsonld) incorporating all subject data sources recognized by Neurobagel, you could run the CLI as follows:
cd /home/data/Dataset1
# 1. Generate harmonized phenotypic data at the subject level
docker run --rm --volume=$PWD:$PWD -w $PWD neurobagel/bagelcli pheno \
--pheno "tabular/Dataset1_pheno.tsv" \
--dictionary "tabular/Dataset1_pheno.json" \
--name "My dataset 1" \
--output "Dataset1.jsonld"
# 2. Add subjects' BIDS data to the existing .jsonld
docker run --rm --volume=$PWD:$PWD -w $PWD neurobagel/bagelcli bids \
--jsonld-path "Dataset1.jsonld" \
--bids-dir "bids" \
--output "Dataset1.jsonld" \
--overwrite # (1)
# 3. Add subjects' processing pipeline metadata to the existing .jsonld
docker run --rm --volume=$PWD:$PWD -w $PWD neurobagel/bagelcli derivatives \
--tabular "derivatives/Dataset1_proc_status.tsv" \
--jsonld-path "Dataset1.jsonld" \
--output "Dataset1.jsonld" \
--overwrite
(1) To keep outputs of different CLI commands as separate files, omit the --overwrite flag.
Tip
Short forms for a CLI command's options can be found by running:
docker run --rm neurobagel/bagelcli pheno --help
cd /home/data/Dataset1
# 1. Generate harmonized phenotypic data at the subject level
singularity run --no-home --bind $PWD --pwd $PWD bagel.sif pheno \
--pheno "tabular/Dataset1_pheno.tsv" \
--dictionary "tabular/Dataset1_pheno.json" \
--name "My dataset 1" \
--output "Dataset1.jsonld"
# 2. Add subjects' BIDS data to the existing .jsonld
singularity run --no-home --bind $PWD --pwd $PWD bagel.sif bids \
--jsonld-path "Dataset1.jsonld" \
--bids-dir "bids" \
--output "Dataset1.jsonld" \
--overwrite # (1)
# 3. Add subjects' processing pipeline metadata to the existing .jsonld
singularity run --no-home --bind $PWD --pwd $PWD bagel.sif derivatives \
--tabular "derivatives/Dataset1_proc_status.tsv" \
--jsonld-path "Dataset1.jsonld" \
--output "Dataset1.jsonld" \
--overwrite
(1) To keep outputs of different CLI commands as separate files, omit the --overwrite flag.
Tip
Short forms for a CLI command's options can be found by running:
singularity run bagel.sif pheno --help
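Whichever container runtime you use, you can sanity-check the file produced at each step. As a minimal check (assuming Python is available on your host; jq would work equally well), confirm that the output is valid JSON:
python -m json.tool Dataset1.jsonld > /dev/null && echo "Dataset1.jsonld is valid JSON"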
Speed of the bids command
The bids command of the bagel-cli (step 2) can currently take upwards of several minutes for datasets with more than a few hundred subjects, due to the time needed for pyBIDS to read the dataset structure.
Once the slow initial dataset reading step is complete, you should see the message:
BIDS parsing completed.
...
Upgrading to a newer version of the CLI
Neurobagel is under active, early development, and future releases of the CLI may introduce breaking changes to the data model for subject-level information in a .jsonld graph file. Breaking changes will be highlighted in the release notes!
If you have already created .jsonld files for your Neurobagel graph database using the CLI, you can quickly regenerate them under the new data model by following the instructions here, so that they will not conflict with dataset .jsonld files generated using the latest CLI version.
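As a minimal sketch (assuming the same inputs as the Docker example above), regenerating a dataset's .jsonld would look like this:
# Pull the latest CLI image
docker pull neurobagel/bagelcli
# Re-run the CLI on the dataset with the updated image
cd /home/data/Dataset1
docker run --rm --volume=$PWD:$PWD -w $PWD neurobagel/bagelcli pheno \
    --pheno "tabular/Dataset1_pheno.tsv" \
    --dictionary "tabular/Dataset1_pheno.json" \
    --name "My dataset 1" \
    --output "Dataset1.jsonld"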
Development environment
To ensure that our Docker images are built in a predictable way, we use requirements.txt as a lock-file. That is, requirements.txt includes the entire dependency tree of our tool, with pinned versions for every dependency (see here for more information).
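For illustration only (the package names and versions below are hypothetical), pinned entries in requirements.txt look like:
click==8.1.7
    # via typer
typer==0.9.0
    # via bagel (setup.cfg)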
Setting up a local development environment
To work on the CLI, we suggest that you create a development environment that is as close as possible to the environment we run in production.
1. Install the dependencies from the lock-file (dev_requirements.txt):
   pip install -r dev_requirements.txt
2. Install the CLI without touching the dependencies:
   pip install --no-deps -e .
3. Install the bids-examples and neurobagel_examples submodules needed to run the test suite:
   git submodule init
   git submodule update
Confirm that everything works well by running the tests:
pytest .
Setting up code formatting and linting (recommended)
pre-commit is configured in the development environment for this repository, and can be set up to automatically run a number of code linters and formatters on any commit you make according to the consistent code style set for this project.
Run the following from the repository root to install the configured pre-commit "hooks" for your local clone of the repo:
pre-commit install
pre-commit will now run automatically whenever you run git commit.
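You can also run all of the configured hooks against the entire codebase at any time (for example, right after installing them):
pre-commit run --all-files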
Updating the Python lock-files
The requirements.txt file is automatically generated from the setup.cfg constraints. To update it, we use pip-compile from the pip-tools package. Here is how you can use these tools to update the requirements.txt file.
Note: pip-compile will update dependencies based on the Python version of the environment it's running in.
1. Ensure pip-tools is installed:
   pip install pip-tools
2. Update the runtime dependencies in requirements.txt:
   pip-compile -o requirements.txt --upgrade
3. The above command only updates the runtime dependencies. Now, update the developer dependencies in dev_requirements.txt:
   pip-compile -o dev_requirements.txt --extra all --upgrade
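One way to confirm that the updated lock-files still produce a working environment is to install them into a fresh virtual environment and re-run the test suite (a sketch, assuming a Unix shell):
python -m venv /tmp/bagel-env
source /tmp/bagel-env/bin/activate
pip install -r dev_requirements.txt
pip install --no-deps -e .
pytest .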