Title: Efficient Adaptation of Reinforcement Learning Agents to Sudden Environmental Change
Date: Monday, January 30, 2023
Time: 1pm - 3pm EST
Location: Coda C1115 Druid Hills
Virtual Link: Microsoft Teams
Virtual Meeting ID: 253 542 548 123
Passcode: H9RZyU
Jonathan Balloch
Robotics Ph.D. Student
School of Interactive Computing
Georgia Institute of Technology
Committee
Dr. Mark Riedl (Advisor), School of Interactive Computing, Georgia Tech
Dr. Sehoon Ha, School of Interactive Computing, Georgia Tech
Dr. Seth Hutchinson, School of Interactive Computing, Georgia Tech
Dr. Michael Littman, Computer Science Department, Brown University
Dr. Harish Ravichandar, School of Interactive Computing, Georgia Tech
Abstract
Techniques for learning policies that solve sequential decision-making problems have wide applicability in the real world, from conversational AI to disaster response robots. However, applying learning techniques can be difficult because the open world is vast and varying. Like many techniques for solving sequential decision-making problems, most reinforcement learning methods assume that the world is a closed, fixed process. When the environment changes, this assumption breaks down, giving rise to the problem of online task transfer in reinforcement learning (RL): the process of adapting a policy online to a shift in an agent's environment. Solutions to online task transfer are necessary for agents to operate in the presence of open-world novelties, the events that regularly transform real-world environments.
This thesis aims to shed light on and advance reinforcement learning solutions to the online task transfer problem through the reuse of prior knowledge and directed exploration. First, I define the problem in the context of conventional reinforcement learning and present NovGrid, an environment I developed for studying online task transfer. Second, I contribute WorldCloner, a neurosymbolic RL method for efficient novelty adaptation, and demonstrate how careful reuse of prior knowledge can improve the efficiency of transfer. Third, I present MaxEnt World Explorer, an entropy-based exploration method for world-model RL, and use it to demonstrate how transfer-focused directed exploration can improve transfer efficiency without sacrificing source-task performance. Lastly, I propose two additional contributions that continue to develop the reuse of prior knowledge and directed exploration. To further develop exploration for transfer, I will design a time-dependent, uncertainty-based exploration strategy that is more sensitive to stale environment information and, as a result, to changes in the environment. To further develop the reuse of prior knowledge, I propose a discrete latent representation of environment composition and dynamics, grounded in natural language, that avoids unnecessary changes to learned latent representations during adaptation.
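For readers unfamiliar with entropy-based exploration, the sketch below illustrates the general idea behind an intrinsic reward of this kind: the agent receives a bonus wherever its world model's predicted next-state distribution has high entropy, steering it toward transitions the model is uncertain about. This is a minimal illustrative sketch, not the thesis implementation of MaxEnt World Explorer; the function name, the `beta` coefficient, and the assumption of a discrete latent prediction head are hypothetical choices made here for concreteness.

```python
import torch
import torch.nn.functional as F

def entropy_intrinsic_reward(world_model_logits: torch.Tensor,
                             beta: float = 0.1) -> torch.Tensor:
    """Intrinsic reward proportional to the entropy of the world model's
    predicted next-state distribution (assumed discrete here). High
    entropy means the model is uncertain about the outcome of a
    transition, so the agent is rewarded for visiting it."""
    probs = F.softmax(world_model_logits, dim=-1)
    log_probs = F.log_softmax(world_model_logits, dim=-1)
    entropy = -(probs * log_probs).sum(dim=-1)  # per-transition entropy
    return beta * entropy

# Hypothetical usage: add the bonus to the environment reward when
# training the policy, e.g.
#   logits = world_model(state, action)  # discrete next-state logits
#   r_total = r_env + entropy_intrinsic_reward(logits)
```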