BRIDGE2AI

Datasets

Access comprehensive, FAIR-compliant data generated to meet the Grand Challenges and advance the future of health and behavior research.

AI/ML for Clinical Care / CHoRUS

CHoRUS Dataset

Coming Soon

About the Dataset

The CHoRUS project is developing a flagship dataset to support AI/ML research focused on team-based clinical care. The dataset will be released in the future and is designed to support the development of responsible, real-world AI tools that enhance healthcare delivery.

Functional Genomics / CM4AI

Cell Maps for Artificial Intelligence (CM4AI) Dataset

March 2025 Beta Release

About the Dataset

The CM4AI dataset delivers rich, multimodal cellular data designed to support AI research in precision medicine and drug response.

This Beta release includes:

  • Perturb-seq data in undifferentiated KOLF2.1J iPSCs
  • SEC-MS data in undifferentiated KOLF2.1J iPSCs and iPSC-derived NPCs, neurons, and cardiomyocytes
  • Immunofluorescence images in MDA-MB-468 breast cancer cells with and without chemotherapy (vorinostat and paclitaxel)

CM4AI datasets are packaged in RO-Crate format using the FAIRSCAPE framework, ensuring AI-readiness, traceable provenance, and rich metadata. Data will be continuously augmented through the end of the project.
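For readers new to RO-Crate, the sketch below shows one way to inspect a crate's metadata file in Python. It assumes only the general RO-Crate layout: a JSON-LD file named ro-crate-metadata.json holding a flat "@graph" of entities. The sample metadata, the file path data/example.tsv, and the helper names entities_by_type and root_dataset are hypothetical and are not taken from a CM4AI release; the entity types and properties that FAIRSCAPE actually emits may differ.

```python
import json

# Hypothetical, hand-written metadata in the general RO-Crate shape.
# It is NOT copied from a CM4AI release.
EXAMPLE_METADATA = """
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
     "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
     "about": {"@id": "./"}},
    {"@id": "./", "@type": "Dataset", "name": "Example crate",
     "hasPart": [{"@id": "data/example.tsv"}]},
    {"@id": "data/example.tsv", "@type": "File", "name": "Example table"}
  ]
}
"""


def entities_by_type(metadata):
    """Group entity @id values by their @type (a string or a list)."""
    grouped = {}
    for entity in metadata.get("@graph", []):
        types = entity.get("@type", [])
        if isinstance(types, str):
            types = [types]
        for type_name in types:
            grouped.setdefault(type_name, []).append(entity["@id"])
    return grouped


def root_dataset(metadata):
    """Return the entity that the metadata descriptor's 'about' points to."""
    by_id = {entity["@id"]: entity for entity in metadata.get("@graph", [])}
    descriptor = by_id.get("ro-crate-metadata.json")
    if descriptor is None:
        return None
    return by_id.get(descriptor["about"]["@id"])


if __name__ == "__main__":
    metadata = json.loads(EXAMPLE_METADATA)
    print(entities_by_type(metadata))
    print(root_dataset(metadata)["name"])
```

For a downloaded crate, the same helpers could be applied to the parsed contents of its ro-crate-metadata.json instead of the inline sample.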

Precision Public Health / Voice

Bridge2AI-Voice Dataset

Version 2.0 Release

About the Dataset

The Bridge2AI-Voice dataset explores voice as a non-invasive, scalable biomarker linked to a wide range of health conditions, including neurological, mood, respiratory, and voice disorders. Designed to support responsible AI research, this ethically sourced dataset combines voice-derived features with detailed clinical and demographic data.

Version 2.0 includes:

  • 19,271 recordings from 442 participants across five North American sites
  • Derived voice features, including spectrograms (original recordings excluded for privacy)
  • Rich clinical data, demographics, and validated questionnaire responses

Participants were selected for conditions known to affect vocal characteristics, enabling researchers to explore meaningful links between acoustic markers and health status. This dataset is ideal for advancing AI models in diagnostics, monitoring, and digital health.

Salutogenesis / AI-READI

Flagship Dataset of Type 2 Diabetes from the AI-READI Project

Version 2.0.0 Release

About the Dataset

The Artificial Intelligence Ready and Exploratory Atlas for Diabetes Insights (AI-READI) project aims to revolutionize how we understand and treat type 2 diabetes mellitus (T2DM) through ethically sourced, AI-optimized data. By assembling one of the most comprehensive multimodal datasets of its kind, AI-READI supports cutting-edge research into disease progression, recovery, and health-promoting (salutogenic) pathways.

Version 2.0.0 includes data from 1,067 participants, collected between July 19, 2023 and July 31, 2024. This release is part of a larger effort to build a cross-sectional dataset of 4,000 individuals, with longitudinal follow-up planned for 10% of the cohort. The study population is balanced across diabetes stages, from healthy individuals to those with insulin-dependent T2DM.

Collected data spans multiple biological, physiological, and behavioral modalities and is designed to support pseudo-time manifold analysis, enabling researchers to reconstruct disease trajectories and identify opportunities for intervention.

Key features:

  • 1,067 participants
  • 165,051 files (~2.01 TB total)
  • Multimodal, de-identified data (PHI removed)
  • No information on sex, race/ethnicity, or medications included in this release

This dataset is built to advance AI/ML research while minimizing bias and enhancing reproducibility. Future versions will continue to expand coverage, diversity, and depth.
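To put the AI-READI release figures above in proportion, the short sketch below derives a few ratios from them. The input numbers come from this page; the function name release_summary is hypothetical, and the sketch assumes that "TB" means decimal terabytes (10^12 bytes), which the page does not specify.

```python
# Figures quoted on this page for AI-READI Version 2.0.0.
PARTICIPANTS_RELEASED = 1_067
TARGET_COHORT = 4_000
LONGITUDINAL_FRACTION = 0.10
FILE_COUNT = 165_051
TOTAL_TB = 2.01

# Assumption: decimal terabytes. Binary units (TiB) would give a larger average.
BYTES_PER_TB = 10**12


def release_summary():
    """Derive simple ratios from the published release figures."""
    return {
        # Share of the planned 4,000-person cohort covered by this release.
        "cohort_share": PARTICIPANTS_RELEASED / TARGET_COHORT,
        # Participants planned for longitudinal follow-up (10% of the cohort).
        "longitudinal_participants": round(TARGET_COHORT * LONGITUDINAL_FRACTION),
        # Mean file size in megabytes (10^6 bytes).
        "avg_file_mb": TOTAL_TB * BYTES_PER_TB / FILE_COUNT / 10**6,
        # Mean number of files per participant.
        "files_per_participant": FILE_COUNT / PARTICIPANTS_RELEASED,
    }


if __name__ == "__main__":
    for name, value in release_summary().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the release covers roughly a quarter of the planned cohort, with an average file size on the order of 12 MB.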
