BRIDGE2AI

Bridge2AI Voice Announces Major Dataset Releases for Adult and Pediatric Voice Research

December 22, 2025

The NIH Common Fund Bridge2AI Program’s Precision Public Health Grand Challenge is pleased to announce two significant milestones that advance the use of voice as a biomarker in biomedical artificial intelligence research: the release of Bridge2AI Voice’s largest adult dataset to date and the first-ever Bridge2AI Voice pediatric dataset.

Bridge2AI-Voice Adult Dataset v3.0.0

The Voice team has released Bridge2AI-Voice v3.0.0, the most comprehensive version of its adult voice dataset. This release includes data from 833 participants across five sites in North America, selected based on conditions known to manifest in voice characteristics, including voice, neurological, mood, and respiratory disorders.

Voice is a promising biomarker for health due to its ease of collection, low cost, and broad clinical applicability. Advances in AI now enable extraction of previously unrecognized prognostic signals from complex data such as audio. Bridge2AI-Voice was created to provide an ethically sourced, high-quality flagship dataset to support this emerging area of research.

The v3.0.0 release contains low-risk derived data, including audio features and spectrograms, along with detailed demographic, clinical, and validated questionnaire data. Original audio recordings are not included in this version.

Bridge2AI Voice Pediatric Dataset v1.0.0

In a major expansion of the program, the Voice team is also introducing the first Bridge2AI Voice Pediatric Dataset (v1.0.0). This dataset addresses a critical gap in pediatric voice data and enables new research into developmental and health-related voice characteristics.

The pediatric release includes derived audio features from 22,620 recordings collected from 300 participants aged 2–18. As with the adult dataset, the release contains low-risk derived data (such as spectrograms), detailed demographic and clinical information, and validated questionnaires, but does not include original audio recordings.

Data Access

Both datasets are available on PhysioNet via registered access. Versions containing audio data are available under controlled access. For more information about controlled access data, please contact daco@b2ai-voice.org.

Learn more: Explore the datasets on the Bridge2AI Voice website or browse other Bridge2AI datasets.