In partial fulfillment of the requirements for the degree of

Doctor of Philosophy in Bioinformatics

in the School of Biological Sciences

 

Ujani Hazra

 

Defends her thesis:

Studying Polygenic Traits Across Populations: Evolutionary Insights and Predictive Models of Replication

 

April 18th, 2025

12 pm EDT

Engineered Biosystems Building (EBB), CHOA Seminar Room (EBB 1005)

 

Thesis Advisor:

Dr. Joseph Lachance

School of Biological Sciences

Georgia Institute of Technology

 

Committee Members:

Dr. Gregory Gibson 

School of Biological Sciences

Georgia Institute of Technology 

 

Dr. I. King Jordan

School of Biological Sciences

Georgia Institute of Technology

 

Dr. Annalise Paaby

School of Biological Sciences

Georgia Institute of Technology

 

Dr. Kaixiong Ye

Department of Genetics

University of Georgia

 

Abstract:

This dissertation investigates how evolutionary history, population structure, and genomic features shape the architecture and cross-population replicability of polygenic traits. Although genome-wide association studies (GWAS) have identified thousands of trait-associated variants, these findings often generalize poorly across ancestries. To understand the mechanisms underlying this variability, I analyze large-scale genomic datasets from sub-Saharan African populations, UK Biobank, and Biobank Japan using a combination of GWAS, evolutionary analysis, and interpretable machine learning.

Across a diverse set of complex traits, including cancer, anthropometric, and metabolic phenotypes, the results reveal widespread signatures of background selection but minimal evidence for recent positive selection or polygenic adaptation. Population-level differences in risk are more often driven by genetic drift, allele frequency shifts, and differences in linkage disequilibrium, which together contribute to limited replication of individual variants and poor transferability of polygenic scores across cohorts. These effects are particularly pronounced in African populations, which harbor high genetic diversity and substantial regional heterogeneity in effect sizes.

To quantify the factors driving replication at the level of individual single-nucleotide polymorphisms (SNPs), I develop a predictive framework that models replication as a function of statistical, functional, and evolutionary annotations. Using matched GWAS from UK Biobank and Biobank Japan, I demonstrate that replication is not stochastic but can be predicted with high accuracy. Key predictors include GWAS effect size, allele frequency symmetry, and linkage disequilibrium in the replication population, while signals of selection contribute in more nuanced, trait-specific ways. This framework provides new insight into the determinants of replicability and offers a principled strategy for prioritizing variants with stable effects across populations.