Thesis Title: Design and Analysis of Stochastic Processing and Matching Networks

 

Thesis Committee:

Dr. Siva Theja Maguluri, School of Industrial and Systems Engineering, Georgia Institute of Technology

Dr. Sigrun Andradottir, School of Industrial and Systems Engineering, Georgia Institute of Technology

Dr. Debankur Mukherjee, School of Industrial and Systems Engineering, Georgia Institute of Technology

Dr. Lei Ying, Electrical Engineering and Computer Science, University of Michigan 

Dr. Itai Gurvich, Kellogg School of Management, Northwestern University

Dr. Martin Zubeldia, School of Industrial and Systems Engineering, University of Minnesota



 

Date and Time: Friday, August 11th, 2023, 10:00 AM - 12:00 PM EST

Meeting Link: https://gatech.zoom.us/j/94809450486

 

 

Abstract:

Stochastic Processing Networks (SPNs) and Stochastic Matching Networks (SMNs) play a crucial role in various engineering domains, encompassing applications in Data Centers, Telecommunication, Transportation, and more. As these networks become increasingly complex and integral to modern systems, designing efficient decision-making policies while obtaining strong performance guarantees on throughput and delay has become a pressing research area. This thesis addresses the multifaceted challenges prevalent in today's stochastic networks and investigates their impact on system performance. Major design considerations are thoroughly examined, including scalability, customer abandonment, multiple bottlenecks, and adherence to Service Level Agreements (SLAs). Each of these factors heavily influences the system delay and queue length.

 

In Chapter 2, we focus on establishing bounds for the tail probabilities of queue lengths in queueing systems. The results help provide strict SLA guarantees for large-scale systems. As obtaining exact steady-state distributions is often infeasible, the study provides exponentially decaying bounds in Many-Server Heavy-Traffic regimes, where the load on the system approaches capacity as the system size grows large. Unlike other approaches, the derived bounds are not limited to asymptotic cases and remain applicable for finite values of load and system size. The method uses an exponential Lyapunov function to bound the Moment Generating Function (MGF) of queue lengths and then applies Markov's inequality to derive the tail bounds. To demonstrate our methodology, we primarily use a load balancing system operating under the Join-the-Shortest-Queue (JSQ) policy, and we obtain tail bounds applicable in non-asymptotic large-scale regimes as well as non-asymptotic Large Deviations regimes.
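The step from an MGF bound to a tail bound can be written out as a short worked inequality. The notation here is illustrative and is not taken from the thesis: q stands for a steady-state queue length, \theta > 0 for a parameter at which the MGF is finite, and M(\theta) for whatever bound on the MGF the Lyapunov argument yields.

```latex
% Markov's inequality applied to the nonnegative random variable e^{\theta q}
\[
  \mathbb{P}(q \ge x)
  = \mathbb{P}\bigl(e^{\theta q} \ge e^{\theta x}\bigr)
  \le e^{-\theta x}\,\mathbb{E}\bigl[e^{\theta q}\bigr],
  \qquad \theta > 0 .
\]
% If the Lyapunov argument gives E[e^{\theta q}] <= M(\theta) < \infty (assumed form),
% the tail probability decays exponentially in x:
\[
  \mathbb{P}(q \ge x) \le M(\theta)\, e^{-\theta x}.
\]
```

Because the inequality holds for every fixed x and every admissible \theta, a bound of this type does not require taking a limit in the load or the system size.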

 

In Chapter 3, we again consider a load balancing system operating under the Join-the-Shortest-Queue (JSQ) policy, now with customer abandonment. In particular, we characterize the distribution of the appropriately centered and scaled steady-state queue length (the limiting distribution) as the abandonment rate becomes very small. Our work covers both the case in which the system is in heavy traffic and the case in which it is overloaded. As the system load increases, we observe that the limiting distribution undergoes a phase transition from an exponential to a truncated normal and finally to a normal distribution. The chapter employs the Transform method to establish results about the limiting Moment Generating Function (MGF) of queue lengths.
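For readers unfamiliar with JSQ, the routing rule itself can be sketched in a few lines of Python. This is only an illustration of the policy, not the model analyzed in the thesis: it omits service and abandonment entirely, and the function names `jsq_route` and `route_arrivals` are hypothetical.

```python
def jsq_route(queue_lengths):
    """Return the index of a shortest queue (ties go to the lowest index)."""
    return min(range(len(queue_lengths)), key=lambda i: queue_lengths[i])


def route_arrivals(queue_lengths, num_arrivals):
    """Route arrivals one at a time, each joining the currently shortest queue."""
    queues = list(queue_lengths)  # copy so the caller's list is not modified
    for _ in range(num_arrivals):
        queues[jsq_route(queues)] += 1
    return queues


# Four arrivals to queues of lengths 3, 0 and 2 even out the backlog.
print(route_arrivals([3, 0, 2], 4))  # prints [3, 3, 3]
```

The example shows why JSQ tends to equalize queue lengths: every arrival goes to a queue that is currently shortest.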

 

Afterward, in Chapter 4, we study the performance of SPNs with multiple bottlenecks, for which the problem becomes significantly more challenging. For this, we use the Input-Queued Switch (IQ-Switch) model, which models a data center network and serves as a representative of SPNs with multiple bottlenecks. Prior literature has established that the well-studied MaxWeight policy provides superior throughput and mean queue length performance. Even though the MaxWeight algorithm results in small queue lengths, it is computationally expensive to implement, which is undesirable in practice. We show that several classes of low time-complexity algorithms achieve mean queue lengths similar to those of MaxWeight when the system load is very high.
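To make the MaxWeight rule concrete, here is a minimal Python sketch for a small input-queued switch. It assumes an n x n matrix `queues` in which `queues[i][j]` is the backlog at input i destined for output j, and it picks the input-output matching (a permutation) with the largest total backlog. The brute-force search over all n! permutations is chosen only for readability; practical implementations would use a maximum-weight bipartite matching algorithm. The function name `maxweight_schedule` is hypothetical.

```python
from itertools import permutations


def maxweight_schedule(queues):
    """Return (perm, weight), where perm[i] is the output matched to input i.

    Brute-force illustration: tries every permutation, so it is only
    suitable for very small switches.
    """
    n = len(queues)
    best_perm, best_weight = None, -1
    for perm in permutations(range(n)):
        weight = sum(queues[i][perm[i]] for i in range(n))
        if weight > best_weight:  # strict '>' keeps the first maximizer on ties
            best_perm, best_weight = perm, weight
    return best_perm, best_weight


# 2 x 2 example: matching input 0 -> output 0 and input 1 -> output 1
# serves a total backlog of 3 + 5 = 8, versus 1 + 2 = 3 for the other matching.
schedule, weight = maxweight_schedule([[3, 1], [2, 5]])
print(schedule, weight)  # prints (0, 1) 8
```

The cost of solving a matching problem in every time slot is the complexity concern that motivates lower-complexity scheduling algorithms.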

 

Moving ahead, in Chapter 5, we aim to go beyond the mean queue length and provide strict SLA or tail guarantees for an SPN with multiple bottlenecks. We tackle this problem by studying the steady-state queue length distribution. For the IQ-Switch, finding the complete joint distribution of the queue length vector in heavy traffic (the limiting joint distribution) was posed as an open problem in prior literature. Our work solves this open problem (under a particular conjecture) for the IQ-Switch operating under the MaxWeight scheduling algorithm and the other low-complexity algorithms considered in Chapter 4. For the IQ-Switch under uniform traffic and heavy load, we provide the limiting distribution in terms of a non-linear combination of independent, exponentially distributed random variables. We do this by using the Transform method to establish a functional equation for the Laplace transform of the limiting joint distribution, which can be solved to obtain the result.

 

Finally, in Chapter 6, we study the queueing dynamics of an SMN using the example of a quantum network. This system is much harder to analyze than an SPN, as the effective service rate depends on the system state. We initially aimed to provide performance guarantees on the queue length, as in previous chapters. However, we found that even the fundamental problem of characterizing the stability conditions for an SMN has not been fully resolved. Thus, in this chapter, we characterize the stability conditions for a class of quantum networks under the MaxWeight policy. Interestingly, we find that the stability region of the quantum network is the convex hull of the achievable throughputs of suitably designed sub-networks.