Thesis Title: Bridging Discrete and Continuous Methods for Faster Optimization and Machine Learning
Thesis Committee:
Dr. Swati Gupta (advisor), Industrial and Systems Engineering, Georgia Institute of Technology
Dr. Robert Freund, Sloan School of Management, Massachusetts Institute of Technology
Dr. Mohit Singh, Industrial and Systems Engineering, Georgia Institute of Technology
Dr. Guanghui (George) Lan, Industrial and Systems Engineering, Georgia Institute of Technology
Dr. Sebastian Pokutta, Institute of Mathematics, Department for AI, Technische Universität Berlin and Zuse Institute Berlin
Date and Time: Tuesday, April 11th, 2 pm (EST)
In-Person Location: Groseclose 402
Meeting Link: https://gatech.zoom.us/j/91072444735
Abstract:
In the current age of machine learning and automation, it has become commonplace for algorithms to make decisions that impact our lives. Many such algorithms use optimization solvers as a subroutine, and there has been renewed interest in robust approaches, such as first-order optimization methods, that require little memory and have low per-iteration cost. In a wide range of applications, the optimization problem being solved contains inherent combinatorial structure that can be exploited to obtain novel algorithms. Moreover, the combinatorial optimization community has developed elegant characterizations of convex minimizers over combinatorial polytopes; this theory, however, has not been adequately integrated into the iterative optimization framework, leaving a significant opportunity to speed up continuous optimization algorithms used in machine learning and online learning. Although the theories of combinatorial and continuous optimization have largely evolved independently, it is now crucial to bridge them in order to develop algorithms that can handle large-scale, data-driven problems.
In Chapter 2, we focus on improving the convergence rates of conditional gradient variants in the presence of combinatorial structure while maintaining their efficiency. We develop a novel theoretical framework that provides a unifying view of various descent directions in the literature and clarifies how movement along these directions drives progress toward constrained minimizers, with the aim of obtaining better convergence rates. Within this framework, we prove that Frank-Wolfe (FW) steps are greedy and correspond to taking an infinite stepsize along the negative gradient followed by a projection, thereby drawing a novel connection between projection-free and projected gradient (PG) algorithms. We use these insights to develop SHADOW-CG, a novel algorithm that combines FW steps (which greedily wrap around the polytope) with shadow steps (which follow the optimal local descent direction), and we prove that it enjoys linear convergence. The convergence rate depends on the combinatorial structure of the face lattice of the polytope and is invariant to the geometry of the problem. For simple structured polytopes such as the hypercube and the simplex, we show that our algorithms are the best feasible descent methods in terms of convergence rate and iteration complexity.
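To make the role of FW steps concrete, the following is a minimal sketch of a classic Frank-Wolfe iteration (not SHADOW-CG itself): each step calls a linear minimization oracle over the feasible polytope and moves toward the returned vertex. The `grad` and `lmo` callables and the simplex example are illustrative placeholders.

```python
import numpy as np

def frank_wolfe(x0, grad, lmo, num_iters=100):
    """Classic Frank-Wolfe: each step solves a linear problem over the
    feasible polytope (via `lmo`) and moves toward the optimal vertex."""
    x = np.asarray(x0, dtype=float)
    for t in range(num_iters):
        g = grad(x)                       # gradient at the current iterate
        v = lmo(g)                        # vertex minimizing <g, v> over the polytope
        gamma = 2.0 / (t + 2.0)           # standard open-loop step size
        x = (1 - gamma) * x + gamma * v   # stay feasible via a convex combination
    return x

# Toy example: minimize ||x - b||^2 over the probability simplex.
b = np.array([0.2, 0.9, -0.1])
grad = lambda x: 2 * (x - b)
lmo = lambda g: np.eye(len(g))[np.argmin(g)]   # best vertex of the simplex
x_star = frank_wolfe(np.ones(3) / 3, grad, lmo)
```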
In Chapter 3, we focus on speeding up the iterative projections required by mirror descent variants over combinatorial polytopes while maintaining their optimal rates. To capture a wide range of combinatorial decision sets encountered in practice, we consider submodular polytopes. We develop a toolkit that speeds up the computation of iterative projections over submodular polytopes using both discrete and continuous perspectives, and we adapt the away-step Frank-Wolfe algorithm to exploit this structural information and terminate early. For the special case of cardinality-based submodular polytopes, we improve the runtime of computing certain Bregman projections by a factor of $\Omega(n/\log(n))$. These theoretical improvements translate into orders-of-magnitude reductions in runtime in preliminary computational experiments.
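As an illustration of a cheap projection over a simple cardinality-based set, the sketch below computes the Euclidean projection onto the probability simplex using the standard sort-and-threshold routine; this is a textbook method, not the toolkit developed in this chapter.

```python
import numpy as np

def project_to_simplex(y):
    """Euclidean projection of y onto {x >= 0, sum(x) = 1}.
    The sort dominates, so the routine runs in O(n log n)."""
    u = np.sort(y)[::-1]                        # sort entries in decreasing order
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(y) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)      # threshold to subtract
    return np.maximum(y - theta, 0.0)

print(project_to_simplex(np.array([0.5, 1.2, -0.3])))   # -> [0.15, 0.85, 0.]
```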
In Chapter 4, we consider the case when the decision set is given by the vertices of a combinatorial (flow) polytope, through the lens of the network reconfiguration problem. This problem seeks a rooted tree T such that the energy of the (unique) feasible electrical flow over T is minimized. We prove the first approximation guarantee for this problem since it was introduced in 1989. We propose a randomized iterative rounding algorithm that rounds a relaxed point in the relative interior of the flow polytope to a vertex, prove that it achieves an $O(m - n)$ approximation factor, and show that this bound is tight for planar graphs. In addition, we provide novel lower bounds and corresponding approximation factors for various settings, ranging from $O(\sqrt{n})$ for grids with uniform edge resistances to $O(1)$ for grids with uniform edge resistances and demands. In our computational experiments, our algorithms take 1.35 seconds, whereas the heuristic currently used in practice takes around 10 hours on average to attain the same performance. This improvement will enable operators to reconfigure distribution networks more frequently in practice.
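For intuition on the objective, the following sketch evaluates the energy of the unique feasible flow on a given rooted tree, assuming demands are served radially from the root (e.g., a substation). The data structures and names are illustrative; this is not the proposed rounding algorithm, only the quantity it seeks to minimize.

```python
def tree_flow_energy(root, children, resistance, demand):
    """Energy of the unique feasible flow on a rooted tree.

    children[v]         -> list of children of node v
    resistance[(u, v)]  -> resistance of the edge from u to its child v
    demand[v]           -> demand at node v (the root acts as the slack node)

    The flow on edge (u, v) equals the total demand in the subtree rooted
    at v, and the energy is sum over edges of resistance * flow^2.
    """
    energy = 0.0

    def subtree_demand(v):
        nonlocal energy
        total = demand.get(v, 0.0)
        for c in children.get(v, []):
            flow = subtree_demand(c)
            energy += resistance[(v, c)] * flow ** 2
            total += flow
        return total

    subtree_demand(root)
    return energy

# Toy example: root 0 feeds nodes 1 and 2; node 2 feeds node 3.
children = {0: [1, 2], 2: [3]}
resistance = {(0, 1): 1.0, (0, 2): 0.5, (2, 3): 2.0}
demand = {1: 3.0, 2: 1.0, 3: 2.0}
print(tree_flow_energy(0, children, resistance, demand))  # 1*9 + 0.5*9 + 2*4 = 21.5
```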
Finally, in Chapter 5, we consider the case when the decision set is given by a mixed-integer program (MIP). There has been a recent emphasis on using machine learning (ML) models to automate clinical decision-making; however, these works use Electronic Medical Record (EMR) data without incorporating context or clinical expertise to detect erroneous or untrustworthy data. We are the first to translate clinical domain knowledge into high-dimensional mathematical constraints and to project the EMR data of ICU patients onto those constraints. These projections can identify and correct erroneous clinical data. They also help quantify how much a sick patient’s laboratory values and vitals have deviated from the normal range, yielding trust scores that improve the performance of ML classification models in clinical settings. We design a machine learning pipeline that incorporates the proposed projection methodology to predict sepsis 6 hours before its onset, increasing precision and AUC by approximately 1.5 compared to ML models trained without the projections. Our algorithm also outperforms state-of-the-art ML algorithms for the early detection of sepsis.
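As a simplified illustration of the projection idea, the sketch below projects a record onto a set of linear inequality constraints with a generic convex solver (cvxpy) and reports the projection distance as a crude deviation score. The feature bounds shown are made up, and the constraints considered in the thesis are richer mixed-integer constraints rather than this toy box.

```python
import cvxpy as cp
import numpy as np

def project_onto_constraints(y, A, b):
    """Euclidean projection of a measurement vector y onto {x : A x <= b}.
    The distance ||y - x*|| can serve as a simple deviation/trust score."""
    x = cp.Variable(len(y))
    problem = cp.Problem(cp.Minimize(cp.sum_squares(x - y)), [A @ x <= b])
    problem.solve()
    return x.value, np.linalg.norm(y - x.value)

# Illustrative 2-feature example with made-up bounds (not real clinical ranges).
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.array([45.0, -30.0, 110.0, -60.0])   # 30 <= feature_0 <= 45, 60 <= feature_1 <= 110
y = np.array([50.0, 55.0])                  # a record violating both ranges
x_proj, score = project_onto_constraints(y, A, b)
```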