In partial fulfillment of the requirements for the degree of
Doctor of Philosophy in Bioinformatics
in the School of Biological Sciences
Ujani Hazra
Defends her thesis:
Studying Polygenic Traits Across Populations: Evolutionary Insights and Predictive Models of Replication
April 18th, 2025
12 pm EDT
Engineered Biosystems Building (EBB),
CHOA seminar room EBB 1005
Thesis Advisor:
Dr. Joseph Lachance
School of Biological Sciences
Georgia Institute of Technology
Committee Members:
Dr. Gregory Gibson
School of Biological Sciences
Georgia Institute of Technology
Dr. I. King Jordan
School of Biological Sciences
Georgia Institute of Technology
Dr. Annalise Paaby
School of Biological Sciences
Georgia Institute of Technology
Dr. Kaixiong Ye
Department of Genetics
University of Georgia
Abstract:
This dissertation investigates how evolutionary history, population structure, and genomic features shape the architecture and cross-population replicability of polygenic traits. Although genome-wide association studies (GWAS) have identified thousands of trait-associated variants, these findings generalize only to a limited extent across ancestries. To understand the mechanisms underlying this limited generalizability, I analyze large-scale genomic datasets from sub-Saharan African populations, UK Biobank, and Biobank Japan using a combination of GWAS, evolutionary analysis, and interpretable machine learning.
Across a diverse set of complex traits, including cancer, anthropometric, and metabolic phenotypes, the results reveal widespread signatures of background selection but minimal evidence for recent positive selection or polygenic adaptation. Population-level differences in risk are more often driven by genetic drift, allele frequency shifts, and differences in linkage disequilibrium than by selection. Together, these factors contribute to limited replication of individual variants and poor transferability of polygenic scores across cohorts. These effects are particularly pronounced in African populations, which harbor high genetic diversity and substantial regional heterogeneity in effect sizes.
To quantify the factors driving replication at the SNP level, I develop a predictive framework that models replication as a function of statistical, functional, and evolutionary annotations. Using matched GWAS from UK Biobank and Biobank Japan, I demonstrate that replication is not stochastic but can be predicted with high accuracy. Key predictors include GWAS effect size, allele frequency symmetry, and linkage disequilibrium in the replication population, while signals of selection contribute in more nuanced, trait-specific ways. This framework provides new insight into the determinants of replicability and offers a principled strategy for prioritizing variants with stable effects across populations.