In partial fulfillment of the requirements for the degree of

Doctor of Philosophy in Bioinformatics

in the School of Biological Sciences

 

Matthew Hunter Seabolt

 

Defends his thesis:

Expanding the Bioinformatics Toolbox for Diversity and Taxonomic Studies of Microbial Eukaryotic Pathogens

 

Wednesday, July 26, 2023

3:00-4:00pm

Krone Engineered Biosystems Building, Children’s Healthcare of Atlanta Seminar Room (EBB 1005)

Join Zoom Meeting

 

Thesis Advisor:

Dr. Kostas Konstantinidis, Advisor

School of Civil & Environmental Engineering

Georgia Institute of Technology

 

Committee Members:

Dr. Joel Kostka

School of Biological Sciences

Georgia Institute of Technology

 

Dr. I. King Jordan

School of Biological Sciences

Georgia Institute of Technology

 

Dr. Christine Heitsch

School of Mathematics

Georgia Institute of Technology

 

Dr. Dawn M. Roellig

Waterborne Disease Prevention Branch

Division of Foodborne, Waterborne, and Environmental Diseases

Centers for Disease Control and Prevention

 

Abstract:
Cataloguing and studying microbial eukaryote diversity and speciation present unique challenges because these organisms are evolutionarily divergent from well-studied model genomes, have limited culturing methods, and have uncertain taxonomy. The scarcity of high-quality genomic data is a significant obstacle to understanding genome relatedness and important traits such as virulence and antimicrobial resistance. Much of the debate centers on how to reconcile discordant phylogenetic signals from existing molecular typing data with historical records and type specimens, and this debate has seen little progress in almost two decades. Additionally, existing bioinformatics methods must be improved to handle large eukaryotic genomes effectively.

This research aims to expand the set of available bioinformatics tools for the comparative genomic analysis of microbial eukaryotes. Case studies using the protozoan parasite Giardia duodenalis as a model organism are presented. These studies include (i) developing a new, automated pipeline for identifying the most informative gene markers in the genome for phylogenetic reconstruction and strain-level resolution, (ii) creating a statistical framework to identify cryptic species and quantify their evolutionary relationships, and (iii) improving the reference genome annotation of Giardia. Lastly, we used genome-aggregate average nucleotide identity (ANI) and graph-based methods to assess whether natural boundaries between eukaryotic species exist, similar to those previously observed for prokaryotes, and to study the relationship between shared gene content and ANI (i.e., degree of genetic relatedness).


The findings suggest that sequence-discrete clusters of genomes, akin to traditional species, are prevalent among the examined genomes, and that our methodology is robust across eukaryotic phyla and at multiple taxonomic levels. These conclusions, including a 95% ANI threshold as a general-purpose species boundary for eukaryotes and the utility of ANI for molecular typing, contribute novel biological insights and bioinformatics methods to the toolkit for eukaryotic taxonomy and genome analysis.
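As a rough illustration of how a fixed ANI cutoff can partition genomes into sequence-discrete clusters, the sketch below links any two genomes whose pairwise ANI is at least 95% and reports the connected components of the resulting graph. It assumes pairwise ANI values (in percent) have already been computed by some external tool. The function name ani_clusters, the genome labels, and the ANI values are hypothetical, and single-linkage connected components are only one simple graph-based choice, not necessarily the method used in this thesis.

```python
"""Illustrative sketch (hypothetical names and values): group genomes into
candidate species by linking every pair whose pairwise ANI meets a threshold
and taking connected components of that graph (single linkage)."""

from typing import Dict, List, Tuple


def ani_clusters(
    pairwise_ani: Dict[Tuple[str, str], float],
    threshold: float = 95.0,
) -> List[List[str]]:
    # Union-find parent map over every genome that appears in the input
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        # Path halving keeps the union-find trees shallow
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for (a, b), ani in pairwise_ani.items():
        # Register both genomes so those below the cutoff become singletons
        find(a)
        find(b)
        if ani >= threshold:
            union(a, b)

    groups: Dict[str, List[str]] = {}
    for genome in list(parent):
        groups.setdefault(find(genome), []).append(genome)
    # Sort members and clusters for deterministic output
    return sorted(sorted(g) for g in groups.values())


# Hypothetical pairwise ANI values (percent), for illustration only
example_ani = {
    ("G1", "G2"): 98.7,
    ("G1", "G3"): 96.1,
    ("G2", "G3"): 95.4,
    ("G1", "G4"): 82.0,
    ("G4", "G5"): 99.2,
    ("G3", "G5"): 80.5,
}

print(ani_clusters(example_ani))  # [['G1', 'G2', 'G3'], ['G4', 'G5']]
```

Single linkage can chain otherwise distinct groups together through intermediate genomes; stricter criteria, such as requiring most pairs within a cluster to exceed the cutoff, avoid this at extra computational cost.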