BRIDGE2AI

Bridge2AI Voice Data Generation Project

Precision Public Health Grand Challenge

About the Project

The Bridge2AI Voice project aims to establish a flagship, ethically sourced, multi-institutional dataset that integrates voice as a biomarker of health with multimodal clinical data. By linking voice recordings to electronic health records, radiomics, genomics, and other health biomarkers, the dataset is designed to enable advanced AI research focused on improving screening, diagnosis, and treatment across diverse disease domains. Data are collected through a secure smartphone application connected to clinical data systems, and the project employs federated learning techniques to protect participant privacy while facilitating collaborative research.

The project focuses on five key disease categories where voice characteristics have demonstrated clinical relevance: vocal pathologies, such as laryngeal cancers and vocal fold paralysis; neurological and neurodegenerative disorders, such as Alzheimer’s disease and Parkinson’s disease; mood and psychiatric disorders, such as depression and bipolar disorder; respiratory conditions, such as pneumonia and chronic obstructive pulmonary disease; and pediatric conditions, such as autism spectrum disorder and speech delay. Together, these efforts aim to create a foundational resource to accelerate AI-driven precision public health and clinical care.

The Bridge2AI-Voice Dataset

Bridge2AI-Voice Dataset Version 2.0

Participants: 442

Recordings: 19,271

Spectrograms: 16,738

Clinical Sites: 5

Videos

Introduction to the Precision Public Health Grand Challenge

Introduction to the Voice Dataset
