The Neurobagel CLI
The bagel-cli is a simple Python command-line tool to automatically parse and describe subject-level phenotypic and imaging attributes in an annotated dataset for integration into the Neurobagel graph.
Installation
Option 1 (RECOMMENDED): Pull the Docker image for the CLI from DockerHub:
docker pull neurobagel/bagelcli
Option 2: Clone the repository and build the Docker image locally:
git clone https://github.com/neurobagel/bagel-cli.git
cd bagel-cli
docker build -t bagel .
For Singularity users: build a Singularity image for the bagel-cli using the DockerHub image:
singularity pull bagel.sif docker://neurobagel/bagelcli
Running the CLI
CLI commands can be accessed using the Docker/Singularity image.
Note
The Docker examples below assume that you are using the official Neurobagel Docker Hub image for the CLI.
If you have instead built an image locally, replace neurobagel/bagelcli in the commands with your local image tag.
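For example, if you built the image locally with the tag bagel (as in Option 2 above), the general help command would instead be:
docker run --rm bagel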
Input files
The Neurobagel CLI can compile information from several different data sources to create a single harmonized representation of subject data. To run the CLI on a dataset, you will need:
- A phenotypic TSV
- A Neurobagel JSON data dictionary for the TSV
- (Optional) The imaging dataset in BIDS format, if subjects have imaging data available (1)
- (Optional) A TSV containing subject statuses for any image processing pipelines that have been run, following the Nipoppy processing status file schema (2)

(1) The CLI will use a valid BIDS dataset to generate harmonized raw imaging metadata for subjects.
(2) The CLI will use this file to generate harmonized processing pipeline and derivative metadata for subjects. It is compatible with the Nipoppy workflow and can be automatically generated using the Nipoppy pipeline trackers.
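As an illustration only, a minimal phenotypic TSV might look like the following (the column names and values here are hypothetical; your columns can be named anything, as long as they are described in the accompanying data dictionary):
participant_id	age	sex
sub-01	34	M
sub-02	29	F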
Viewing CLI commands and options
The bagel-cli has different commands, each generating a different type of subject (meta)data:
- pheno
- bids
- derivatives
The pheno command must be run first on a dataset (each subject in a Neurobagel graph must have at least phenotypic information); other metadata are optional and can be added in any order.
To view the general CLI help and information about the commands:
# This is a shorthand for `docker run --rm neurobagel/bagelcli --help`
docker run --rm neurobagel/bagelcli
# This is a shorthand for `singularity run bagel.sif --help`
singularity run bagel.sif
To view the command-line arguments for a specific command (e.g., pheno):
docker run --rm neurobagel/bagelcli pheno -h
singularity run bagel.sif pheno -h
Running the CLI on your data
1. cd into your local directory containing your CLI input files (at minimum, a phenotypic TSV and corresponding Neurobagel annotated JSON data dictionary).
2. Run a bagel-cli container and include your CLI command and arguments at the end in the following format:
docker run --rm --volume=$PWD:$PWD -w $PWD neurobagel/bagelcli <CLI command here>
What is this command doing?
The combination of options --volume=$PWD:$PWD -w $PWD
mounts your current working directory (containing all inputs for the CLI) at the same path inside the container, and also sets the container's working directory to the mounted path (so it matches your location on your host machine).
This allows you to pass paths to the containerized CLI that are composed the same way as on your host machine. (Both absolute paths and relative paths descending from your working directory will work!)
singularity run --no-home --bind $PWD --pwd $PWD /path/to/bagel.sif <CLI command here>
What is this command doing?
The combination of options --bind $PWD --pwd $PWD
mounts your current working directory (containing all inputs for the CLI) at the same path inside the container, and also sets the container's working directory to the mounted path (so it matches your location on your host machine).
This allows you to pass paths to the containerized CLI that are composed the same way as on your host machine. (Both absolute paths and relative paths descending from your working directory will work!)
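For example, if your working directory is /home/data/Dataset1 (as in the example below), these two paths refer to the same file inside the container:
tabular/Dataset1_pheno.tsv
/home/data/Dataset1/tabular/Dataset1_pheno.tsv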
Example
If your dataset lives in /home/data/Dataset1:
home/
└── data/
└── Dataset1/
├── tabular/
│ ├── Dataset1_pheno.tsv
│ ├── Dataset1_pheno.json
│ └── ...
├── bids/
│ ├── sub-01/
│ ├── sub-02/
│ └── ...
├── derivatives/
│ ├── Dataset1_proc_status.tsv
│ └── ...
└── ...
Note
This is an example directory structure following the Nipoppy specification for dataset organization. Your input data may be organized differently.
To generate a single, graph-ready JSONLD file (Dataset1.jsonld) incorporating all subject data sources recognized by Neurobagel, you could run the CLI as follows:
cd /home/data/Dataset1
# 1. Generate harmonized phenotypic data at the subject level
docker run --rm --volume=$PWD:$PWD -w $PWD neurobagel/bagelcli pheno \
--pheno "tabular/Dataset1_pheno.tsv" \
--dictionary "tabular/Dataset1_pheno.json" \
--name "My dataset 1" \
--output "Dataset1.jsonld"
# 2. Add subjects' BIDS data to the existing .jsonld
docker run --rm --volume=$PWD:$PWD -w $PWD neurobagel/bagelcli bids \
--jsonld-path "Dataset1.jsonld" \
--bids-dir "bids" \
--output "Dataset1.jsonld" \
--overwrite # (1)
# 3. Add subjects' processing pipeline metadata to the existing .jsonld
docker run --rm --volume=$PWD:$PWD -w $PWD neurobagel/bagelcli derivatives \
--tabular "derivatives/Dataset1_proc_status.tsv" \
--jsonld-path "Dataset1.jsonld" \
--output "Dataset1.jsonld" \
--overwrite
(1) To keep outputs of different CLI commands as separate files, omit the --overwrite flag.
Tip
Short forms for a CLI command's options can be found by running:
docker run --rm neurobagel/bagelcli pheno --help
cd /home/data/Dataset1
# 1. Generate harmonized phenotypic data at the subject level
singularity run --no-home --bind $PWD --pwd $PWD bagel.sif pheno \
--pheno "tabular/Dataset1_pheno.tsv" \
--dictionary "tabular/Dataset1_pheno.json" \
--name "My dataset 1" \
--output "Dataset1.jsonld"
# 2. Add subjects' BIDS data to the existing .jsonld
singularity run --no-home --bind $PWD --pwd $PWD bagel.sif bids \
--jsonld-path "Dataset1.jsonld" \
--bids-dir "bids" \
--output "Dataset1.jsonld" \
--overwrite # (1)
# 3. Add subjects' processing pipeline metadata to the existing .jsonld
singularity run --no-home --bind $PWD --pwd $PWD bagel.sif derivatives \
--tabular "derivatives/Dataset1_proc_status.tsv" \
--jsonld-path "Dataset1.jsonld" \
--output "Dataset1.jsonld" \
--overwrite
(1) To keep outputs of different CLI commands as separate files, omit the --overwrite flag.
Tip
Short forms for a CLI command's options can be found by running:
singularity run bagel.sif pheno --help
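Whichever container runtime you use, you can sanity-check the file produced at each step. As a minimal check (assuming Python is available on your host; jq would work equally well), confirm that the output is valid JSON:
python -m json.tool Dataset1.jsonld > /dev/null && echo "Dataset1.jsonld is valid JSON"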
Speed of the bids command
The bids command of the bagel-cli (step 2) can currently take upwards of several minutes for datasets with more than a few hundred subjects, due to the time needed for pyBIDS to read the dataset structure.
Once the slow initial dataset reading step is complete, you should see the message:
BIDS parsing completed.
...
Upgrading to a newer version of the CLI
Neurobagel is under active, early development, and future releases of the CLI may introduce breaking changes to the data model for subject-level information in a .jsonld graph file. Breaking changes will be highlighted in the release notes!
If you have already created .jsonld files for your Neurobagel graph database using the CLI, you can quickly regenerate them under the new data model by following the instructions here, so that they will not conflict with dataset .jsonld files generated using the latest CLI version.
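As a minimal sketch (assuming the same inputs as the Docker example above), regenerating a dataset's .jsonld would look like this:
# Pull the latest CLI image
docker pull neurobagel/bagelcli
# Re-run the CLI on the dataset with the updated image
cd /home/data/Dataset1
docker run --rm --volume=$PWD:$PWD -w $PWD neurobagel/bagelcli pheno \
    --pheno "tabular/Dataset1_pheno.tsv" \
    --dictionary "tabular/Dataset1_pheno.json" \
    --name "My dataset 1" \
    --output "Dataset1.jsonld"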
Development environment
To ensure that our Docker images are built in a predictable way, we use requirements.txt as a lock-file. That is, requirements.txt includes the entire dependency tree of our tool, with pinned versions for every dependency (see here for more information).
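For illustration only (the package names and versions below are hypothetical), pinned entries in requirements.txt look like:
click==8.1.7
    # via typer
typer==0.9.0
    # via bagel (setup.cfg)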
Setting up a local development environment
To work on the CLI, we suggest that you create a development environment that is as close as possible to the environment we run in production.
1. Install the dependencies from the lock-file (dev_requirements.txt):
   pip install -r dev_requirements.txt
2. Install the CLI without touching the dependencies:
   pip install --no-deps -e .
3. Install the bids-examples and neurobagel_examples submodules needed to run the test suite:
   git submodule init
   git submodule update
Confirm that everything works well by running the tests:
pytest .
Setting up code formatting and linting (recommended)
pre-commit is configured in the development environment for this repository, and can be set up to automatically run a number of code linters and formatters on any commit you make according to the consistent code style set for this project.
Run the following from the repository root to install the configured pre-commit "hooks" for your local clone of the repo:
pre-commit install
pre-commit will now run automatically whenever you run git commit.
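You can also run all of the configured hooks against the entire codebase at any time (for example, right after installing them):
pre-commit run --all-files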
Updating the Python lock-files
The requirements.txt file is automatically generated from the setup.cfg constraints. To update it, we use pip-compile from the pip-tools package. Here is how you can use these tools to update the requirements.txt file.
Note: pip-compile will update dependencies based on the Python version of the environment it's running in.
1. Ensure pip-tools is installed:
   pip install pip-tools
2. Update the runtime dependencies in requirements.txt:
   pip-compile -o requirements.txt --upgrade
3. The above command only updates the runtime dependencies. Now, update the developer dependencies in dev_requirements.txt:
   pip-compile -o dev_requirements.txt --extra all --upgrade
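One way to confirm that the updated lock-files still produce a working environment is to install them into a fresh virtual environment and re-run the test suite (a sketch, assuming a Unix shell):
python -m venv /tmp/bagel-env
source /tmp/bagel-env/bin/activate
pip install -r dev_requirements.txt
pip install --no-deps -e .
pytest .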