Thesis Title: Exploiting Problem Structure for Faster Optimization: A Trilinear Saddle Point Approach

 

Thesis Committee:

 

Dr. Guanghui Lan (advisor), School of Industrial and Systems Engineering, Georgia Institute of Technology

Dr. Siva Theja Maguluri, School of Industrial and Systems Engineering, Georgia Institute of Technology

Dr. Arkadi Nemirovski, School of Industrial and Systems Engineering, Georgia Institute of Technology

Dr. Alexander Shapiro, School of Industrial and Systems Engineering, Georgia Institute of Technology

Dr. Richard Tapia, Department of Computational Applied Mathematics and Operations Research, Rice University

 

Date and Time: Monday, July 20, 2023, 2:00 PM - 3:00 PM ET

Meeting Link: https://gatech.zoom.us/j/7236275378


 

 

Abstract:

Optimization is vital in operations research, encompassing both model fitting and decision-making. The exponential growth of data holds promise for more realistic models and more intelligent decisions. However, the sheer volume and high dimensionality of big data make computations prohibitively expensive and time-consuming. In this thesis, we propose a trilinear saddle point approach to tackle some of these challenges in big-data optimization. By effectively leveraging problem structure, our approach significantly improves the computational complexity for several important problem classes in stochastic programming and nonlinear programming, and it offers valuable insights into their intrinsic computational hardness.
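
Here, a trilinear saddle point problem refers (schematically, in our own notation rather than the thesis's formal definition) to a min-max-max problem whose coupling among the three variable blocks is linear, or affine, in each block separately, e.g., $\min_{x \in X} \max_{y \in Y} \max_{z \in Z} \langle \mathcal{A}(x)\, y, z \rangle + f(x) - g(y) - h(z)$ with an affine operator $\mathcal{A}(\cdot)$ and simple convex terms $f$, $g$, $h$; the chapters below instantiate $x$, $y$, and $z$ differently.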

 

In Chapter Two, we consider a two-stage distributionally robust optimization (DRO) problem with discrete scenario support. While much research effort has been devoted to tractable reformulations of DRO problems, especially those with continuous scenario support, few efficient numerical algorithms have been developed, and most of them can handle neither a nonsmooth second-stage cost function nor a large number of scenarios $K$ effectively. We fill this gap by reformulating the DRO problem as a trilinear min-max-max saddle point problem and developing novel algorithms that achieve an $O(1/\epsilon)$ iteration complexity depending only mildly on the number of scenarios. The major computations in each iteration of these algorithms can be carried out in parallel if necessary. In addition, for an important class of DRO problems with the Kantorovich ball ambiguity set, we propose a slight modification of our algorithms that avoids the expensive computation of the probability-vector projection. Finally, preliminary numerical experiments demonstrate the empirical advantages of the proposed algorithms.
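
As a schematic illustration (in our own notation; the thesis may parameterize things differently), when the second-stage cost is given by a linear program, dualizing it yields the trilinear min-max-max form $\min_{x \in X} \max_{p \in \mathcal{P}} \sum_{k=1}^{K} p_k \max_{\pi_k \in \Pi_k} \langle b_k - A_k x, \pi_k \rangle$, which is linear in the probability vector $p$, affine in the first-stage decision $x$, and linear in the second-stage dual variables $\pi_k$.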

 

In Chapter Three, we study the convex nested stochastic composite optimization (NSCO) problem, which finds applications in reinforcement learning and risk-averse optimization. Existing NSCO algorithms exhibit significantly worse stochastic oracle complexities than their counterparts for problems without nested structure, and they require all outer-layer functions to be smooth. To address these challenges, we propose a stochastic trilinear (more generally, multi-linear) saddle point formulation that enables the design of order-optimal algorithms for general convex NSCO problems. When all outer-layer functions are smooth, we propose a stochastic sequential dual (SSD) method that achieves an oracle complexity of $O(1/\epsilon^2)$ ($O(1/\epsilon)$) when the problem is non-strongly (strongly) convex. When some outer-layer functions are structured nonsmooth or general nonsmooth, we propose a nonsmooth stochastic sequential dual (nSSD) method that achieves an oracle complexity of $O(1/\epsilon^2)$. Notably, we prove that this $O(1/\epsilon^2)$ complexity is unimprovable even in the strongly convex setting. These results demonstrate that the convex NSCO problem shares essentially the same oracle complexities as problems without nested composition, except in the strongly convex and outer-nonsmooth case.
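
To illustrate the formulation (a schematic two-layer sketch in our own notation, assuming the outer function $f_1$ is convex and nondecreasing and the inner function $f_2$ is convex), conjugating each layer gives $\min_{x \in X} f_1(f_2(x)) = \min_{x \in X} \max_{\pi_1 \ge 0} \max_{\pi_2} \; \pi_1 \big( \langle \pi_2, x \rangle - f_2^*(\pi_2) \big) - f_1^*(\pi_1)$, whose coupling is linear in each of $x$, $\pi_1$, and $\pi_2$ separately; this is what we mean by a multi-linear saddle point formulation of the nested problem.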

 

In Chapter Four, we investigate the communication complexity of convex risk-averse optimization over a network. This problem generalizes the well-studied risk-neutral finite-sum distributed optimization problem, and its importance stems from the need to handle risk in uncertain environments. For existing algorithms, there is a gap between the communication complexities of solving risk-averse problems and risk-neutral problems. To close this gap, we utilize a trilinear saddle point reformulation to design two distributed algorithms: the distributed risk-averse optimization (DRAO) method and the distributed risk-averse optimization with sliding (DRAO-S) method. The single-loop DRAO method involves solving potentially complex subproblems, while the more sophisticated DRAO-S method requires only simple computations. We establish lower complexity bounds showing that their communication complexities are unimprovable, and we conduct numerical experiments to illustrate the encouraging empirical performance of the DRAO-S method.
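
As a schematic sketch (our notation, assuming a coherent risk measure $\rho$ with risk envelope $\mathcal{U}$ and convex local costs $f_i$ held by the network nodes), writing $\rho$ as a support function and dualizing each $f_i$ yields $\min_{x \in X} \rho\big(f_1(x), \dots, f_m(x)\big) = \min_{x \in X} \max_{p \in \mathcal{U}} \max_{\pi} \; \sum_{i=1}^{m} p_i \big( \langle \pi_i, x \rangle - f_i^*(\pi_i) \big)$, a trilinear saddle point problem in $(x, p, \pi)$.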

 

In Chapter Five, we utilize the trilinear saddle point approach to develop new complexity results for classic nonlinear function-constrained optimization. We introduce the single-loop Accelerated Constrained Gradient Descent (ACGD) method, which modifies Nesterov's celebrated Accelerated Gradient Descent (AGD) method by incorporating a linearly-constrained descent step. Lower complexity bounds are provided to establish the tightness of ACGD's complexity bound under a certain optimality regime. To enhance efficiency for large-scale problems, we propose the ACGD with Sliding (ACGD-S) method, which replaces the computationally demanding constrained descent steps with basic matrix-vector multiplications. ACGD-S shares the same oracle complexity as ACGD and achieves an unimprovable computation complexity measured by the number of matrix-vector multiplications. These advancements offer new insights into the complexity of nonlinear function-constrained optimization and provide efficient methods for both general and large-scale problems.
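
As a schematic illustration of a linearly-constrained descent step (our own sketch of the idea, not necessarily the exact update rule of ACGD), for $\min_x \{ f_0(x) : f_i(x) \le 0, \ i = 1, \dots, m \}$ one may replace the usual prox-step of AGD at a search point $\bar{x}_t$ by $x_{t+1} = \arg\min_{x \in X} \big\{ \langle \nabla f_0(\bar{x}_t), x \rangle + \tfrac{\eta_t}{2} \| x - x_t \|^2 : f_i(\bar{x}_t) + \langle \nabla f_i(\bar{x}_t), x - \bar{x}_t \rangle \le 0, \ i = 1, \dots, m \big\}$, i.e., a descent step over the linearization of the nonlinear constraints; ACGD-S then carries out such steps using only matrix-vector multiplications.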