The Neurobagel Annotation Tool
The Neurobagel annotation tool creates standardized, machine-readable data dictionaries for tabular data using curated FAIR vocabularies. The tool helps to harmonize tabular research data and is compatible with BIDS datasets.
Workflow summary:
- Upload tabular data
- Column annotation
- Value annotation
- Download data dictionary
1. Upload tabular data

- Upload your data table (.tsv file)
- Can be
participants.tsvfrom a BIDS dataset
- Can be
- Optional: Upload an existing data dictionary (.json file) for extra context
- Can use
participants.jsonfrom a BIDS dataset - Or continue previous Neurobagel annotation work
- Can use
In the following steps, you will annotate your table by first describing the columns and then the values within the columns.
2. Column Annotation

Each column in your uploaded table is represented as a card on the left side of this page. Select a column to annotate it.
Tip
To select multiple columns, hold down Shift , Ctrl , or Cmd .
For each column, you can:
- Add a description
-
Select the standardized variable that best describes the column (if a suitable match exists)
- First select the column(s) to map, and then select the corresponding variable name from the "Standardized Variables" list on the right
- The number next to a standardized variable () indicates how many columns are currently mapped to it
Some standardized variables only allow 1 mapped column
A standardized variable with a limit of 1 mapped column will appear disabled in the right sidebar once a column has been mapped to it. To map a different column to the variable, first clear the mapping for the old column.
-
Select the assessment tool used to collect the column's data (if a suitable match exists)
- If a column contains information from an assessment, you can map it to a corresponding assessment term from the "Assessment Tool" list on the right
- First select the column(s) about the assessment, and then select the corresponding assessment term
Can't find a matching assessment term?
The available assessment vocabulary may not cover all assessments collected in your data. If no suitable match exists, consider providing the full assessment name in the "Description" field of the relevant column(s) instead, for future reference.

Mapping columns to an assessment tool
-
Select the data type
- To indicate the data type of the selected column(s), use the "Map Data Type" buttons above the columns table
- Choose "Categorical" if the column contains discrete values, "Continuous" if it contains numerical measurements, or leave it empty if neither applies
- Some standardized variables automatically set the data type of any column mapped to them
When to manually annotate data type
We recommend manually selecting the data type in two cases:
- When your column doesn't match any standardized variable
- When your column corresponds to an assessment tool (since a single assessment can be represented by multiple columns with different data types, no default data type is assumed)
If your dataset has imaging (BIDS) data
The "Participant ID" standardized variable must be mapped to a column that contains the BIDS IDs for subjects, following the BIDS naming scheme sub-<label>.
For more information, see this section on preparing the phenotypic data table for a BIDS dataset.
3. Value Annotation

The left sidebar displays the standardized variables that are represented in your tabular data, along with the column names that have been mapped to those variables.
Click on a standardized variable (or data type, for unannotated columns) subheading in the sidebar to display the columns corresponding to that variable (or data type). Then, in the column-level view on the right, navigate between the column tabs to annotate the values within each column.
Understanding sidebar sections
The sidebar organizes your columns by their annotation status:
- Annotated contains columns you have mapped to standardized variables
- Unannotated contains columns you have not mapped to a standardized variable
- Within this section, unannotated columns are organized based on whether you have assigned them a data type
Columns with continuous data
For a column containing continuous data, you can:
- Add a description of the units of measurement
- Select the format of the numerical values1
- Select "Mark as missing" for any values that represent missing, unavailable, or invalid data1
- Note: the column-level view will only display unique values in the column
Units vs. Format
Format refers to how the numeric values in your data are expressed (e.g., float for decimal numbers like 25.5, range for numeric ranges like 30-35) whereas Units describe what the numbers represent (e.g., "years" for age, "points" for test scores, "mg/dL" for measurements).
Columns with categorical data
For a column containing categorical data, you will be prompted to annotate the unique values detected in the column. This includes any values that are blank (empty strings) or contain only whitespace.
For each unique column value, you can:
- Add a free-form description of the value
- Select a standardized term that best captures the meaning of the value1
- Select "Mark as missing" if the value:1
- indicates missing, unavailable, or invalid data
- OR, does not have a suitable match among the standardized term options
Warning
For the value annotation to be considered complete by Neurobagel, all unique values must either be mapped to a standardized term or marked as missing.
4. Download data dictionary

- Preview your annotated data dictionary
- Download the data dictionary
.jsonfile - Annotate a new dataset if desired
Tip
If you see a warning about "Incomplete Annotations", you will need to return to the Value Annotation page to complete any missing annotations before your data dictionary is valid for downstream Neurobagel tools.
Your downloaded data dictionary is BIDS-compatible and, if you see the confirmation that you have successfully created a Neurobagel data dictionary, it is ready to be used to generate data for a Neurobagel graph database.