The Neurobagel Annotation Tool
The Neurobagel annotation tool creates standardized, machine-readable data dictionaries for tabular data using curated FAIR vocabularies. The tool helps to harmonize tabular research data and is compatible with BIDS datasets.
Workflow summary:
- Upload tabular data
- Column annotation
- Value annotation
- Download data dictionary
1. Upload tabular data
- Upload your data table (.tsv file), which can be participants.tsv from a BIDS dataset
- Optional: Upload an existing data dictionary (.json file) for extra context. This can be participants.json from a BIDS dataset, or a data dictionary from previous Neurobagel annotation work that you want to continue
In the following steps, you will annotate your table by first describing the columns and then the values within the columns.
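To get a feel for what the tool will show you, it can help to inspect the table before uploading it. The following is a minimal sketch using only the Python standard library; the sample table contents and the helper names `read_table` and `column_names` are made up for illustration and are not part of the annotation tool.

```python
import csv
import io

# Hypothetical stand-in for the contents of a participants.tsv file.
SAMPLE_TSV = (
    "participant_id\tage\tsex\n"
    "sub-01\t25.5\tF\n"
    "sub-02\t31\tM\n"
    "sub-03\tn/a\tF\n"
)


def read_table(tsv_text):
    """Parse tab-separated text into a list of row dictionaries."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return list(reader)


def column_names(rows):
    """Return the column names in file order (empty list for an empty table)."""
    return list(rows[0].keys()) if rows else []


rows = read_table(SAMPLE_TSV)
columns = column_names(rows)
print(columns)
```

Each name in `columns` corresponds to one column that you would describe in the column annotation step.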
2. Column Annotation
Each column in your uploaded table is represented as a card on this page. For each column, you can:
- Add a description
- Select the standardized variable that best describes the column from the dropdown (if a suitable match exists)
- Select the data type
- Choose "Categorical" if the column contains discrete values, "Continuous" if it contains numerical measurements, or leave it empty if neither applies
- Columns mapped to standardized variables will have their data type inferred automatically
When to manually select data type
We recommend manually selecting the data type in two cases:
- When your column doesn't match any standardized variable
- When your column matches the "Assessment tool" standardized variable (which does not have a predefined data type since it can represent multi-column measures)
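If you are unsure which data type to pick for a column, a rough heuristic can provide a starting point. The sketch below is an assumption for illustration only and does not reflect how the tool itself infers data types; the function name, the missing-value markers, and the `max_levels` threshold are all hypothetical.

```python
def _is_number(text):
    """Return True if the text parses as a number."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def suggest_data_type(values, missing_markers=("", "n/a"), max_levels=10):
    """Suggest "Continuous", "Categorical", or None for a column of strings.

    Values listed in missing_markers (after trimming whitespace) are ignored.
    """
    observed = [v.strip() for v in values if v.strip() not in missing_markers]
    if not observed:
        return None  # nothing to base a suggestion on
    if all(_is_number(v) for v in observed):
        return "Continuous"
    if len(set(observed)) <= max_levels:
        return "Categorical"
    return None  # e.g., free text or identifiers: leave the data type empty


print(suggest_data_type(["25.5", "31", "n/a"]))
print(suggest_data_type(["F", "M", "F"]))
```

Note that this heuristic would suggest "Continuous" for numerically coded categories (such as 0 and 1 for two groups), so the suggestion should always be checked against what the column actually means.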
2.1 Multi-column measure annotation
Info
This step is only available if you have mapped columns in your data table to the "Assessment tool" standardized variable.
The card on the right lists all columns from your data table that you have mapped to the "Assessment tool" standardized variable.
- Create a card for each assessment or instrument represented in your data by clicking and then selecting the name of the assessment from the dropdown list.
- If no suitable match exists, the available standardized vocabulary likely cannot currently represent your assessment.
- To avoid incomplete annotations, un-map any column(s) corresponding to missing assessments from the "Assessment tool" standardized variable using the button in the overview card.
- Select the column(s) that describe each assessment, grouping together related columns as needed, using the dropdown on the respective assessment card.
- You can check remaining, ungrouped columns in the overview on the right.
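The grouping logic described above can be pictured as a mapping from each assessment to its columns, with the overview listing whatever is left over. This is a minimal sketch of that idea; the column names, assessment names, and the `ungrouped_columns` helper are hypothetical and do not correspond to the tool's internals.

```python
def ungrouped_columns(assessment_columns, groups):
    """Return assessment columns not yet assigned to any assessment card."""
    grouped = {col for cols in groups.values() for col in cols}
    return [col for col in assessment_columns if col not in grouped]


# Hypothetical columns mapped to the "Assessment tool" standardized variable.
assessment_columns = ["tool_a_item1", "tool_a_item2", "tool_b_total"]

# Hypothetical assessment card with the columns selected for it.
groups = {"Assessment A": ["tool_a_item1", "tool_a_item2"]}

remaining = ungrouped_columns(assessment_columns, groups)
print(remaining)
```

Any column still in `remaining` would either need its own assessment card or be un-mapped from "Assessment tool".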
3. Value Annotation
The left sidebar displays the standardized variables that are represented in your tabular data, along with the column names that have been mapped to those variables.
Click on a standardized variable (or data type, for unannotated columns) subheading in the sidebar to display the columns corresponding to that variable (or data type). Then, in the column-level view on the right, navigate between the column tabs to annotate the values within each column.
Understanding sidebar sections
The sidebar organizes your columns by their annotation status:
- Annotated contains columns you have mapped to standardized variables
- Unannotated contains columns you have not mapped to a standardized variable
- Within this section, unannotated columns are organized based on whether you have assigned them a data type
Columns with continuous data
For a column containing continuous data, you can:
- Add a description of the units of measurement
- Select the format of the numerical values
- Select "Mark as missing" for any values that represent missing, unavailable, or invalid data
- Note: the column-level view will only display unique values in the column
Units vs. Format
Format refers to how the numbers in your data are structured (e.g., float for decimal numbers like 25.5, int for whole numbers like 25), whereas Units describe what the numbers represent (e.g., "years" for age, "points" for test scores, "mg/dL" for measurements).
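The distinction can be made concrete with a small check of how a column's values are written. This sketch only covers the two formats mentioned above (int and float); the function name is hypothetical, and it assumes values marked as missing have already been removed from the list.

```python
def infer_number_format(values):
    """Return "int", "float", or None if any value is not a plain number."""
    if not values:
        return None
    saw_float = False
    for text in values:
        text = text.strip()
        try:
            int(text)
            continue  # whole number, keep checking the rest
        except ValueError:
            pass
        try:
            float(text)
            saw_float = True  # decimal number
        except ValueError:
            return None  # not a plain int or float
    return "float" if saw_float else "int"


print(infer_number_format(["25", "31"]))
print(infer_number_format(["25.5", "31"]))
```

The format says nothing about units: a column of ages written as 25.5 and 31 is float-formatted regardless of whether the unit is years or months, so the units still need to be described separately.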
Columns with categorical data
For a column containing categorical data, you will be prompted to annotate the unique values detected in the column. This includes any values that are blank (empty strings) or contain only whitespace.
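As an illustration of what "unique values" means here, the sketch below collects the distinct raw values of a column without trimming them, so blank and whitespace-only entries show up as their own values. Whether the tool treats differently sized whitespace runs as distinct values is an assumption of this sketch, and the helper names are hypothetical.

```python
def unique_values(values):
    """Return distinct raw values in first-seen order, without trimming."""
    return list(dict.fromkeys(values))


def is_blank(value):
    """Return True for empty strings and whitespace-only strings."""
    return value.strip() == ""


# Hypothetical column content with an empty and a whitespace-only entry.
column = ["F", "M", "", "F", "  "]

uniques = unique_values(column)
blanks = [v for v in uniques if is_blank(v)]
print(uniques)
print(blanks)
```

Blank values found this way are typical candidates for "Mark as missing".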
For each unique column value, you can:
- Add a free-form description of the value
- Select a standardized term that best captures the meaning of the value
- Select "Mark as missing" if the value:
- indicates missing, unavailable, or invalid data
- OR, does not have a suitable match among the standardized term options
Warning
For the value annotation to be considered complete by Neurobagel, all unique values must either be mapped to a standardized term or marked as missing.
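The completeness rule can be expressed as a simple check: every unique value must appear either in the term mapping or in the set of values marked as missing. The following sketch illustrates that rule only; the function name and the placeholder term identifiers are hypothetical and are not real vocabulary terms.

```python
def incomplete_values(unique_vals, term_mapping, missing):
    """Return values that are neither mapped to a term nor marked as missing."""
    return [
        value
        for value in unique_vals
        if value not in term_mapping and value not in missing
    ]


# Hypothetical annotation state for a categorical column.
unique_vals = ["F", "M", "", "X"]
term_mapping = {"F": "example:Female", "M": "example:Male"}  # placeholder terms
missing = {""}

todo = incomplete_values(unique_vals, term_mapping, missing)
print(todo)
```

In this example the value "X" still needs attention: it must be mapped to a standardized term or marked as missing before the annotation counts as complete.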
4. Download data dictionary
- Preview your annotated data dictionary
- Download the data dictionary .json file
- Annotate a new dataset if desired
Tip
If you see a warning about "Incomplete Annotations", you will need to return to the Value Annotation page to complete any missing annotations before your data dictionary is valid for downstream Neurobagel tools.
Your downloaded data dictionary is BIDS-compatible. If you also see the confirmation that you have successfully created a Neurobagel data dictionary, it is ready to be used to generate data for a Neurobagel graph database.
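After downloading, a quick sanity check is to compare the columns of your table against the entries in the data dictionary. The sketch below assumes a BIDS-style layout in which each top-level key is a column name, with fields such as "Description", "Units", and "Levels"; the example content is made up, and the Neurobagel-specific annotation fields are deliberately left out because their layout is not described here.

```python
import json


def undocumented_columns(column_names, data_dictionary):
    """Return table columns that have no entry in the data dictionary."""
    return [name for name in column_names if name not in data_dictionary]


# Hypothetical dictionary content standing in for a downloaded .json file.
dictionary_text = json.dumps(
    {
        "age": {"Description": "Age at enrollment", "Units": "years"},
        "sex": {"Description": "Sex", "Levels": {"F": "Female", "M": "Male"}},
    }
)

data_dictionary = json.loads(dictionary_text)
missing_entries = undocumented_columns(
    ["participant_id", "age", "sex"], data_dictionary
)
print(missing_entries)
```

For a real file, `dictionary_text` would be replaced by the contents read from the downloaded .json file.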