Title: Compiler and Machine Learning-based Predictive Techniques for Security Enhancement through Software Debloating
Date: Monday, July 24, 2023
Time: 3:00pm - 5:00pm
Location: Klaus 2347 and Zoom
Chris Porter
Ph.D. Candidate in Computer Science
School of Computer Science
Georgia Institute of Technology
Committee
Dr. Santosh Pande (advisor)
Professor
Associate Chair for Graduate Studies
School of Computer Science
Georgia Institute of Technology
Dr. Rajiv Gupta
Distinguished Professor
Amrik Singh Poonian Professor of Computer Science
Associate Dean for Academic Personnel, BCOE
Department of Computer Science and Engineering
University of California, Riverside
Dr. Alessandro Orso
Professor
Associate Dean, College of Computing
School of Computer Science
Georgia Institute of Technology
Dr. Vivek Sarkar
Professor
Stephen Fleming Chair for Telecommunications
Chair, School of Computer Science
School of Computer Science
Georgia Institute of Technology
Dr. Qirun Zhang
Assistant Professor
School of Computer Science
Georgia Institute of Technology
Abstract
Code reuse attacks continue to be a serious threat to software. Attackers today are able to piece together short sequences of instructions in otherwise benign code to carry out malicious actions. Eliminating these reusable code snippets, known as gadgets, has become a prime focus of attack surface reduction research. The aim is to break chains of gadgets, thereby making code reuse attacks impossible or substantially harder to mount. Recent work on attack surface reduction has attempted to eliminate these attacks by subsetting the application, e.g., via user-specified inputs, configurations, or features, to achieve high gadget reductions. However, such approaches either sacrifice soundness (meaning the software might crash or produce incorrect output during attack-free executions on regular inputs) or are so conservative that they leave a large amount of attack surface unaddressed. This thesis develops three techniques that combine static analysis with dynamic, machine learning (ML)-based predictions to address these shortcomings. They are fully sound, achieve strong gadget reductions, and are shown to break shell-spawning gadget chains and stop real-world attacks arising from known Common Vulnerabilities and Exposures (CVEs). The techniques reduce attack surface by activating a (minimal) set of functions at chosen callsites and then deactivating them upon return.
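To picture the activate/deactivate mechanism concretely, the C sketch below shows what an instrumented callsite might look like. It is only schematic: debloat_activate and debloat_deactivate are hypothetical names standing in for the runtime hooks the thesis's frameworks insert, and the actual set of functions activated at a callsite is chosen by the static analyses and ML predictors described above, not hard-coded as here.

```c
/* Minimal sketch (not the thesis's actual API): a callsite instrumented so
 * that only the code predicted to be reachable is active while the call runs. */
#include <stdio.h>

/* Hypothetical runtime hooks inserted by the instrumenting compiler. */
void debloat_activate(const char *callee)   { (void)callee; /* enable predicted functions  */ }
void debloat_deactivate(const char *callee) { (void)callee; /* disable them again on return */ }

static int parse_request(const char *buf) {
    return buf[0] == 'G';  /* placeholder application logic */
}

int main(void) {
    const char *request = "GET /";

    debloat_activate("parse_request");    /* inserted before the call  */
    int ok = parse_request(request);      /* original application call */
    debloat_deactivate("parse_request");  /* inserted on return        */

    printf("ok = %d\n", ok);
    return 0;
}
```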
In the first work, BlankIt, we target library code and achieve ~97% attack surface reduction. The technique uses the arguments to library function calls, together with their static single assignment (SSA)-based backward slices, to train an ML model that predicts the reachable functions at each callsite from runtime values. In particular, we are able to debloat GNU libc, which is notorious for housing gadgets used in code reuse attacks. In the second work, Decker, we target application code and achieve ~73% total gadget reduction, a percentage similar to prior art but obtained without sacrificing soundness. Decker instruments the program at key points at compile time to enable and disable code pages; at runtime, the framework executes these permission-mapping calls with minimal overhead (~5%). In the third work, PDG, we show how to augment the whole-application technique with an accurate predictor to further reduce the potential attack surface. ML-based predictive techniques offer no guarantees and suffer from mispredictions, so the predictions are sanitized with lightweight checks. These checks rely on statically derived ensue relations (i.e., valid call-sequence relations) to separate mispredictions from actual attacks. PDG achieves ~83% total gadget reduction with ~11% runtime overhead, and its predictions trigger runtime checking in only ~4% of cases.
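Decker's enabling and disabling of code pages can be pictured with standard POSIX page-protection primitives. The sketch below is only conceptual and is not Decker's actual runtime or API: it toggles one anonymous page between executable and inaccessible using mprotect, whereas Decker determines the affected code regions at compile time and performs the permission-mapping calls with much lower overhead.

```c
/* Conceptual sketch of code-page permission toggling (not Decker's runtime):
 * a region of code is made executable before it is needed and revoked
 * afterwards, so its gadgets are unavailable the rest of the time. */
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Enable execution of the page-aligned region [start, start+len). */
static int enable_code(void *start, size_t len) {
    return mprotect(start, len, PROT_READ | PROT_EXEC);
}

/* Revoke all access so the region cannot be reused by a gadget chain. */
static int disable_code(void *start, size_t len) {
    return mprotect(start, len, PROT_NONE);
}

int main(void) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    /* Stand-in for a region of application code; real code pages come from
     * the loaded binary rather than an anonymous mapping. */
    void *region = mmap(NULL, page, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED) { perror("mmap"); return 1; }

    if (enable_code(region, page) != 0)  { perror("enable");  return 1; }
    /* ... code mapped in the region could execute here ... */
    if (disable_code(region, page) != 0) { perror("disable"); return 1; }

    munmap(region, page);
    puts("toggled code-page permissions");
    return 0;
}
```

The intuition is that code not currently activated is simply not executable, so a gadget chain that tries to jump into it faults instead of running.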
In conclusion, the thesis empirically shows that it is possible to devise precise and sound attack surface reduction techniques by combining static analysis and ML so that each overcomes the other's inherent limitations. ML prediction aids purely static analysis by improving its precision, and static techniques augment the ML models by providing mechanisms for identifying whether a misprediction is in fact an attack.