The Neurobagel CLI
The Neurobagel CLI is a command-line tool that processes a Neurobagel-annotated dataset and produces harmonized subject-level phenotypic and imaging attributes. The resulting harmonized data can be directly integrated into a Neurobagel graph store.
Installation
The Neurobagel CLI can be installed from PyPI using pip, or run as a Docker or Singularity container.

pip:
- Before installing the Python package, we recommend first creating and activating a Python virtual environment (using a tool such as venv).
- Install the bagel package into your virtual environment:
pip install bagel
Pull the Docker image for the Neurobagel CLI from Docker Hub:
docker pull neurobagel/bagelcli
Build a Singularity image for the Neurobagel CLI using the Docker Hub image:
singularity pull bagel.sif docker://neurobagel/bagelcli
Input files
The Neurobagel CLI creates a single harmonized view of each subject's data in a dataset, and can integrate information from several data sources (phenotypic, raw neuroimaging, processed neuroimaging).
To run the CLI on a dataset, you will need the following files:
- A phenotypic TSV
- A Neurobagel JSON data dictionary for the TSV
- (Optional) A valid BIDS metadata table, if subjects have neuroimaging data available
    - This table can be generated automatically using the CLI's bids2tsv command, and will be used to generate harmonized subject imaging data availability.
- (Optional) A TSV of subject statuses for any image processing pipelines that have been run, following the Nipoppy processing status file schema
    - This file is adapted from the Nipoppy workflow and can be automatically generated using Nipoppy pipeline trackers. It will be used to generate harmonized subject processed imaging data availability.
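To make the first two inputs concrete, the sketch below writes a tiny phenotypic TSV alongside a placeholder JSON data dictionary. The column names and dictionary fields here are invented for illustration only and do not reproduce the exact Neurobagel data dictionary schema; refer to the Neurobagel documentation for the real format.

```python
# Illustrative sketch only: writes a minimal phenotypic TSV and a placeholder
# JSON data dictionary. Column names and dictionary contents are hypothetical
# and do NOT follow the exact Neurobagel data dictionary schema.
import csv
import json
from pathlib import Path

out_dir = Path("Dataset1_inputs")
out_dir.mkdir(exist_ok=True)

# A minimal phenotypic table: one row per subject (columns are examples)
rows = [
    {"participant_id": "sub-01", "age": "34", "sex": "F"},
    {"participant_id": "sub-02", "age": "41", "sex": "M"},
]
with open(out_dir / "Dataset1_pheno.tsv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0]), delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)

# A placeholder dictionary describing each TSV column (structure illustrative)
dictionary = {col: {"Description": f"Placeholder description for '{col}'"}
              for col in rows[0]}
with open(out_dir / "Dataset1_pheno.json", "w") as f:
    json.dump(dictionary, f, indent=2)
```

The important point is the pairing: every column in the phenotypic TSV should have a corresponding entry in the JSON data dictionary.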
Running the CLI
To view the general CLI help and information about the available commands:
bagel -h

docker run --rm neurobagel/bagelcli
# shorthand for: docker run --rm neurobagel/bagelcli --help

singularity run bagel.sif
# shorthand for: singularity run bagel.sif --help
Generate a BIDS metadata table
Info
- If your dataset does not have imaging data, skip this step.
- If your dataset's imaging data are not in BIDS format, you must manually create a BIDS metadata table.
To include BIDS imaging data as part of the harmonized subject data, you must first convert the BIDS metadata into a table.
You can do this automatically using the CLI's bids2tsv command.1
Example:
If your BIDS directory is located at /data/public/Dataset1_bids and you want the table output to be saved to /home/Neurobagel:
bagel bids2tsv \
    --bids-dir "/data/public/Dataset1_bids" \
    --output "/home/Neurobagel/Dataset1_bids.tsv"
docker run --rm \
-v "/data/public:/data/public" \
-v "/home/Neurobagel:/home/Neurobagel" \
neurobagel/bagelcli bids2tsv \
--bids-dir "/data/public/Dataset1_bids" \
--output "/home/Neurobagel/Dataset1_bids.tsv"
Mounting input paths using -v/--volume
When running the CLI in a container, you must mount any input or output directories to directory paths within the container so that the app can access them. In your CLI options, always refer to the container paths. In the example above, container paths are set to match the host paths for simplicity.
singularity run --no-home \
-B "/data/public,/home/Neurobagel" \
bagel.sif bids2tsv \
--bids-dir "/data/public/Dataset1_bids" \
--output "/home/Neurobagel/Dataset1_bids.tsv"
Mounting input paths using -B/--bind
When running the CLI in a container, you must mount any input or output directories to directory paths within the container so that the app can access them. In your CLI options, always refer to the container paths. In the example above, the container paths are set to match the host paths for simplicity.
This command may be slow on large datasets
On datasets with more than a few hundred subjects, bids2tsv can take upwards of several minutes due to the time needed for PyBIDS to read the dataset structure.
This will produce a BIDS metadata table named Dataset1_bids.tsv, which can then be provided as input to the bids command below.
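To give a rough sense of what a BIDS metadata table captures, the sketch below walks a BIDS-like directory and records one row per imaging file, with subject, session, and suffix parsed from the filename. This is not how bids2tsv is implemented, and the column names are illustrative only.

```python
# Rough illustration of what a BIDS metadata table contains: one row per
# imaging file, with BIDS entities parsed from the filename. This is NOT
# the bids2tsv implementation; the column names here are illustrative.
import csv
import re
from pathlib import Path

# Matches names like sub-01_T1w.nii.gz or sub-01_ses-1_bold.nii.gz
_PATTERN = re.compile(
    r"sub-(?P<sub>[^_]+)(?:_ses-(?P<ses>[^_]+))?.*_(?P<suffix>[^_.]+)\.nii(\.gz)?$"
)

def bids_rows(bids_dir: str):
    """Yield one dict per NIfTI file found in a BIDS-like tree."""
    for path in sorted(Path(bids_dir).rglob("*.nii*")):
        match = _PATTERN.match(path.name)
        if match:
            yield {"sub": match["sub"], "ses": match["ses"] or "",
                   "suffix": match["suffix"], "path": str(path)}

def write_table(bids_dir: str, out_tsv: str):
    """Write the parsed rows to a TSV with a fixed header."""
    with open(out_tsv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["sub", "ses", "suffix", "path"],
                                delimiter="\t")
        writer.writeheader()
        writer.writerows(bids_rows(bids_dir))
```

In practice you should prefer the bids2tsv command, which handles the full BIDS specification rather than a simplified filename pattern.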
Generate graph-ready data (JSONLD files)
The Neurobagel CLI provides different commands for generating different types of harmonized subject (meta)data:
- pheno (must be run first): Each subject in a Neurobagel graph requires at least phenotypic data. The other metadata are optional and can be added afterward via the bids and/or derivatives commands, in any order.
- bids
- derivatives
If you are using Docker or Singularity, we strongly recommend placing all the input files for your dataset into a single directory. This avoids the need to mount multiple paths into the container when running CLI commands.
Viewing help for a command
To view the command-line options for a specific command, such as pheno:
bagel pheno -h
docker run --rm neurobagel/bagelcli pheno -h
singularity run bagel.sif pheno -h
Example:
The following example assumes that the input files for your dataset are located in /home/Dataset1/Neurobagel:
home/
└── Dataset1/
├── Neurobagel/
    │   ├── Dataset1_pheno.tsv        # The phenotypic TSV
    │   ├── Dataset1_pheno.json       # The phenotypic data dictionary
    │   ├── Dataset1_bids.tsv         # The BIDS metadata table
    │   ├── Dataset1_proc_status.tsv  # The processing status file
    │   └── ...
    └── ...
Navigate to the directory containing your input files, e.g.:
cd /home/Dataset1/Neurobagel
Info
In the example commands below, replace the Dataset1 files with the actual input files for your dataset.
1. Process phenotypic data using the pheno command (required)
Run the command below to generate harmonized subject-level phenotypic data for your dataset as a JSONLD file:
bagel pheno \
--pheno "Dataset1_pheno.tsv" \
--dictionary "Dataset1_pheno.json" \
--name "Dataset 1" \
--output "Dataset1.jsonld"
docker run --rm -v $PWD:$PWD neurobagel/bagelcli pheno \
--pheno "$PWD/Dataset1_pheno.tsv" \
--dictionary "$PWD/Dataset1_pheno.json" \
--name "Dataset 1" \
--output "$PWD/Dataset1.jsonld"
singularity run --no-home -B $PWD bagel.sif pheno \
--pheno "$PWD/Dataset1_pheno.tsv" \
--dictionary "$PWD/Dataset1_pheno.json" \
--name "Dataset 1" \
--output "$PWD/Dataset1.jsonld"
2. Process raw imaging metadata using the bids command (optional)
If you have a BIDS metadata table, run this command to add subjects' imaging data availability to your dataset JSONLD file:
bagel bids \
--jsonld-path "Dataset1.jsonld" \
--bids-table "Dataset1_bids.tsv" \
--output "Dataset1.jsonld" \
--overwrite
docker run --rm -v $PWD:$PWD neurobagel/bagelcli bids \
--jsonld-path "$PWD/Dataset1.jsonld" \
--bids-table "$PWD/Dataset1_bids.tsv" \
--output "$PWD/Dataset1.jsonld" \
--overwrite
singularity run --no-home -B $PWD bagel.sif bids \
--jsonld-path "$PWD/Dataset1.jsonld" \
--bids-table "$PWD/Dataset1_bids.tsv" \
--output "$PWD/Dataset1.jsonld" \
--overwrite
3. Process derived imaging metadata using the derivatives command (optional)
If you have a processing status file from Nipoppy, run this command to add subjects' processing pipeline data availability to the dataset JSONLD:
bagel derivatives \
--jsonld-path "Dataset1.jsonld" \
--tabular "Dataset1_proc_status.tsv" \
--output "Dataset1.jsonld" \
--overwrite
docker run --rm -v $PWD:$PWD neurobagel/bagelcli derivatives \
--jsonld-path "$PWD/Dataset1.jsonld" \
--tabular "$PWD/Dataset1_proc_status.tsv" \
--output "$PWD/Dataset1.jsonld" \
--overwrite
singularity run --no-home -B $PWD bagel.sif derivatives \
--jsonld-path "$PWD/Dataset1.jsonld" \
--tabular "$PWD/Dataset1_proc_status.tsv" \
--output "$PWD/Dataset1.jsonld" \
--overwrite
Tip
To see all options for a CLI command, including short forms and optional parameters, refer to the command's help.
When to use -f/--overwrite
If you're only interested in the final JSONLD with all metadata added (i.e., after all relevant commands have been run), you can safely overwrite intermediate output files by specifying the same output file path each time.
These steps have generated a graph-ready JSONLD file for Dataset1 (Dataset1.jsonld) that incorporates all the available subject data sources.
The resulting JSONLD is ready to upload to a Neurobagel graph database.
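Before uploading, it can be useful to sanity-check the final file. The snippet below only verifies that the output parses as JSON and reports its top-level keys; it makes no assumptions about the field names in the Neurobagel data model.

```python
# Minimal pre-upload sanity check: confirm the output parses as JSON and
# report its top-level keys. No Neurobagel-specific field names are assumed.
import json
from pathlib import Path

def summarize_jsonld(path: str) -> list[str]:
    """Return the sorted top-level keys of a JSONLD file (empty if not a dict)."""
    with open(path) as f:
        doc = json.load(f)
    keys = sorted(doc) if isinstance(doc, dict) else []
    print(f"{Path(path).name}: {len(keys)} top-level keys -> {keys}")
    return keys
```

If json.load raises an error here, the file is malformed and should be regenerated before any upload attempt.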
Troubleshooting
File or directory does not exist error when using Docker/Singularity
This error usually means the container cannot access your input files because the directories were not mounted correctly.
The examples assume you are running the CLI from inside the directory containing your inputs. Thus, for convenience, they mount the current working directory $PWD to the same path inside the container using the syntax:
docker run --rm -v $PWD:$PWD neurobagel/bagelcli ...
However, if your inputs are located in a different directory or spread across multiple directories, you must mount each directory explicitly using the Docker option -v /path/on/host:/path/in/container.
When passing file paths to the CLI, always use the absolute path inside the container to avoid confusion.
singularity run --no-home -B $PWD bagel.sif ...
However, if your inputs are located in a different directory or spread across multiple directories, you must mount each directory explicitly using the Singularity option -B /path/on/host:/path/in/container.
When passing file paths to the CLI, always use the absolute path inside the container to avoid confusion.
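The host-to-container path translation described above can be sketched as a small helper. The mount table here is hypothetical and simply mirrors a "-v /srv/study:/data" (Docker) or "-B /srv/study:/data" (Singularity) option; it is not part of the CLI.

```python
# Sketch of the host -> container path mapping that -v (Docker) / -B
# (Singularity) establishes. MOUNTS below is hypothetical, mirroring a
# "-v /srv/study:/data" option; it is not part of the CLI itself.
MOUNTS = {"/srv/study": "/data"}  # host prefix -> container prefix

def to_container_path(host_path: str) -> str:
    """Translate a host path into the path the containerized CLI must be given."""
    for host_prefix, container_prefix in MOUNTS.items():
        if host_path == host_prefix:
            return container_prefix
        if host_path.startswith(host_prefix + "/"):
            return container_prefix + host_path[len(host_prefix):]
    raise ValueError(f"{host_path} is not under any mounted directory")
```

For example, with -v /srv/study:/data, the host file /srv/study/Dataset1_pheno.tsv must be passed to the CLI as /data/Dataset1_pheno.tsv; a path outside every mount is simply invisible to the container.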
Upgrading data to a newer version of the CLI
Neurobagel is under active development and future CLI releases may introduce breaking changes to the data model used in subject-level .jsonld graph files.
Breaking changes are highlighted in the release notes.
To upgrade to the latest version of the data model:
- Upgrade to the latest CLI version:
  pip install --upgrade bagel
  docker pull neurobagel/bagelcli
  singularity pull bagel.sif docker://neurobagel/bagelcli
- If you have an existing Neurobagel graph database, we recommend regenerating and reuploading all existing .jsonld files in your database using the latest CLI version. This keeps the database internally consistent and avoids conflicts with dataset .jsonld files generated using older CLI versions.
1. bids2tsv internally uses bids2table.