Genetic datasets

Population-scale databases of genetic information represent an important resource for research into genetic disorders, including PHTS. Below, we have indexed selected genetic datasets. Please note that most datasets are subject to specific access requirements, as detailed on their respective websites.

All of Us Research Program (NIH) - a longitudinal, US-based cohort that aims to capture the genetic diversity of the US population by collecting genetic and phenotypic data from 1 million individuals. The Controlled Tier Dataset v7 (n=245,394) has been reported to contain 28 individuals carrying a pathogenic or likely pathogenic (P/LP) PTEN variant, based on ClinVar pathogenicity classifications as of February 2025 (White et al., 2025).
The Genome Aggregation Database (gnomAD) - the largest publicly available catalogue of human genetic variation, aggregating and harmonizing exome and genome sequencing data from large-scale projects worldwide.
UK Biobank - the world's largest whole-genome dataset, containing comprehensive, longitudinal genetic and health information on approximately 500,000 individuals from across the United Kingdom who were aged 40-69 at recruitment. The exome sequencing data released in July 2023 (n=469,589) has been reported to contain 36 individuals carrying a P/LP PTEN variant, based on ClinVar pathogenicity classifications as of February 2025 (White et al., 2025).

Additionally, many other datasets have been brought together by the BRaVa consortium. This collaboration unites analysts from biobanks and cohorts worldwide to aggregate and analyse rare (coding) variant associations in whole-exome and whole-genome sequencing data.
