
The Neurobagel Annotation Tool

The Neurobagel annotation tool creates standardized, machine-readable data dictionaries for tabular data using curated FAIR vocabularies. The tool helps to harmonize tabular research data and is compatible with BIDS datasets.

Workflow summary:

  1. Upload tabular data
  2. Column annotation
  3. Value annotation
  4. Download data dictionary

1. Upload tabular data

Annotation tool upload step screenshot

  • Upload your data table (.tsv file)
    • Can be participants.tsv from a BIDS dataset
  • Optional: Upload an existing data dictionary (.json file) for extra context
    • Can use participants.json from a BIDS dataset
    • Or continue previous Neurobagel annotation work
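
Before uploading, it can help to confirm that the table really is tab-separated and has the expected column headers. The following is a minimal sketch using only the Python standard library; the function name `read_tsv_columns` and the sample table are illustrative assumptions and not part of the annotation tool.

```python
import csv
import io


def read_tsv_columns(text):
    """Parse tab-separated text and return (column_names, rows)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = list(reader)  # reading the rows also populates reader.fieldnames
    return reader.fieldnames, rows


# Hypothetical stand-in for the contents of a participants.tsv file
sample = "participant_id\tage\tsex\nsub-01\t25.5\tF\nsub-02\t30-35\tM\n"

columns, rows = read_tsv_columns(sample)
print(columns)
print(len(rows))
```

For a real file, the same function could be called on the result of `open(path, newline="", encoding="utf-8").read()`.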

In the following steps, you will annotate your table by first describing the columns and then the values within the columns.

2. Column Annotation

Annotation tool column annotation step screenshot - non-assessment variables

Each column in your uploaded table is represented as a card on the left side of this page. Select a column to annotate it.

Tip

To select multiple columns, hold down Shift, Ctrl, or Cmd.

For each column, you can:

  • Add a description
  • Select the standardized variable that best describes the column (if a suitable match exists)

    • First select the column(s) to map, and then select the corresponding variable name from the "Standardized Variables" list on the right
    • The number next to a standardized variable indicates how many columns are currently mapped to it

    Some standardized variables only allow 1 mapped column

    A standardized variable with a limit of 1 mapped column will appear disabled in the right sidebar once a column has been mapped to it. To map a different column to the variable, first clear the mapping for the old column.

  • Select the assessment tool used to collect the column's data (if a suitable match exists)

    • If a column contains information from an assessment, you can map it to a corresponding assessment term from the "Assessment Tool" list on the right
    • First select the column(s) about the assessment, and then select the corresponding assessment term

    Can't find a matching assessment term?

    The available assessment vocabulary may not cover all assessments collected in your data. If no suitable match exists, consider providing the full assessment name in the "Description" field of the relevant column(s) instead, for future reference.

Annotation tool column annotation step screenshot - collection variables

Mapping columns to an assessment tool

  • Select the data type

    • To indicate the data type of the selected column(s), use the "Map Data Type" buttons above the columns table
    • Choose "Categorical" if the column contains discrete values, "Continuous" if it contains numerical measurements, or leave it empty if neither applies
    • Some standardized variables automatically set the data type of any column mapped to them

    When to manually annotate data type

    We recommend manually selecting the data type in two cases:

    1. When your column doesn't match any standardized variable
    2. When your column corresponds to an assessment tool (since a single assessment can be represented by multiple columns with different data types, no default data type is assumed)

If your dataset has imaging (BIDS) data

The "Participant ID" standardized variable must be mapped to a column that contains the BIDS IDs for subjects, following the BIDS naming scheme sub-<label>.

For more information, see this section on preparing the phenotypic data table for a BIDS dataset.
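
As a rough pre-check, the values of the column you plan to map to "Participant ID" can be tested against the sub-<label> scheme. The sketch below assumes that a label consists only of letters and digits; the pattern and the function name `invalid_bids_ids` are illustrative and are not taken from the annotation tool.

```python
import re

# Assumption: a BIDS subject label contains only letters and digits
BIDS_SUBJECT_ID = re.compile(r"sub-[0-9a-zA-Z]+")


def invalid_bids_ids(values):
    """Return the values that do not follow the sub-<label> naming scheme."""
    return [value for value in values if not BIDS_SUBJECT_ID.fullmatch(value)]


# Hypothetical participant IDs from a phenotypic table
ids = ["sub-01", "sub-ABC2", "01", "sub-", "sub_03"]
bad_ids = invalid_bids_ids(ids)
print(bad_ids)
```

Any value reported here would need to be corrected in the table before it can be matched to the imaging data.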

3. Value Annotation

Annotation tool value annotation step screenshot

The left sidebar displays the standardized variables that are represented in your tabular data, along with the column names that have been mapped to those variables.

Click on a standardized variable (or data type, for unannotated columns) subheading in the sidebar to display the columns corresponding to that variable (or data type). Then, in the column-level view on the right, navigate between the column tabs to annotate the values within each column.

Understanding sidebar sections

The sidebar organizes your columns by their annotation status:

  • Annotated contains columns you have mapped to standardized variables
  • Unannotated contains columns you have not mapped to a standardized variable
    • Within this section, unannotated columns are organized based on whether you have assigned them a data type

Columns with continuous data

For a column containing continuous data, you can:

  • Add a description of the units of measurement
  • Select the format of the numerical values [1]
  • Select "Mark as missing" for any values that represent missing, unavailable, or invalid data [1]
    • Note: the column-level view will only display unique values in the column

Units vs. Format

Format refers to how the numeric values in your data are expressed (e.g., float for decimal numbers like 25.5, range for numeric ranges like 30-35) whereas Units describe what the numbers represent (e.g., "years" for age, "points" for test scores, "mg/dL" for measurements).
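
To make the distinction concrete, the sketch below guesses the format from how each value is written, while the units are recorded separately as free text. It only covers the two formats named above (float and range); the function `guess_format` and the matching rules are assumptions for illustration and do not reflect how the annotation tool detects formats.

```python
import re

# Assumption: a range is written as two non-negative numbers joined by a hyphen
RANGE_PATTERN = re.compile(r"\d+(\.\d+)?-\d+(\.\d+)?")


def guess_format(value):
    """Return "range", "float", or None based on how the value is written."""
    value = value.strip()
    if RANGE_PATTERN.fullmatch(value):
        return "range"
    try:
        float(value)
    except ValueError:
        return None
    return "float"


age_values = ["25.5", "30-35", "n/a"]
age_units = "years"  # units describe what the numbers mean, not how they are written

formats = [guess_format(value) for value in age_values]
print(formats, age_units)
```

A value such as "n/a" fits neither format, which makes it a candidate for "Mark as missing".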

Columns with categorical data

For a column containing categorical data, you will be prompted to annotate the unique values detected in the column. This includes any values that are blank (empty strings) or contain only whitespace.
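
The sketch below shows one way to list the unique values of a column ahead of time, including blank ones. It assumes that an empty string and a whitespace-only string count as two distinct values; the helper names are hypothetical, and the tool's own handling may differ.

```python
def unique_column_values(values):
    """Return the unique values in first-seen order, keeping blank entries."""
    return list(dict.fromkeys(values))


def is_blank(value):
    """True for empty strings and for strings made up only of whitespace."""
    return value.strip() == ""


# Hypothetical column with an empty string and a whitespace-only entry
sex_column = ["F", "M", "", "F", "  ", "M", ""]

uniques = unique_column_values(sex_column)
blank_uniques = [value for value in uniques if is_blank(value)]
print(uniques)
print(blank_uniques)
```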

For each unique column value, you can:

  • Add a free-form description of the value
  • Select a standardized term that best captures the meaning of the value [1]
  • Select "Mark as missing" [1] if the value:
    • indicates missing, unavailable, or invalid data
    • or does not have a suitable match among the standardized term options

Warning

For the value annotation to be considered complete by Neurobagel, all unique values must either be mapped to a standardized term or marked as missing.
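
The completeness rule can be expressed as a simple check: every unique value must appear either among the mapped values or among the values marked as missing. The sketch below uses plain Python containers as a stand-in for the annotations; the data structures, the term labels, and the function name `incomplete_values` are assumptions for illustration, not the tool's internal representation.

```python
def incomplete_values(unique_values, standardized_terms, missing_values):
    """Return the unique values that are neither mapped nor marked as missing."""
    return [
        value
        for value in unique_values
        if value not in standardized_terms and value not in missing_values
    ]


# Hypothetical annotations for one categorical column
unique_values = ["F", "M", "O", ""]
standardized_terms = {"F": "Female", "M": "Male"}  # illustrative labels only
missing_values = {""}

remaining = incomplete_values(unique_values, standardized_terms, missing_values)
print(remaining)

# Marking the leftover value as missing completes the annotation
missing_values.add("O")
remaining_after = incomplete_values(unique_values, standardized_terms, missing_values)
print(remaining_after)
```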

4. Download data dictionary

Annotation tool download step screenshot

  • Preview your annotated data dictionary
  • Download the data dictionary .json file
  • Annotate a new dataset if desired

Tip

If you see a warning about "Incomplete Annotations", you will need to return to the Value Annotation page to complete any missing annotations before your data dictionary is valid for downstream Neurobagel tools.

Your downloaded data dictionary is BIDS-compatible. If you see the confirmation that you have successfully created a Neurobagel data dictionary, it is also ready to be used to generate data for a Neurobagel graph database.


  [1] This attribute can only be annotated if the column has been mapped to a standardized variable.