Title: TOWARDS FINE-GRAINED MULTI-ATTRIBUTE CONTROL USING LANGUAGE MODELS
 

Date: Friday, 21st July, 2023
 

Time: 11:30 AM - 1:30 PM ET (8:30 AM - 10:30 AM PT)

Location: Virtual | Zoom link - https://gatech.zoom.us/j/93320367440?pwd=REhiamxVREcwdUF5Z21XTXJ1NmFWUT09&from=addon

Ashutosh Baheti

PhD student

School of Interactive Computing
Georgia Institute of Technology

Committee:

Prof. Mark Riedl (Advisor) -- School of Interactive Computing, Georgia Institute of Technology

Prof. Alan Ritter (Co-Advisor) -- School of Interactive Computing, Georgia Institute of Technology

Prof. Dhruv Batra -- School of Interactive Computing, Georgia Institute of Technology

Prof. Munmun De Choudhury -- School of Interactive Computing, Georgia Institute of Technology

Prof. Maarten Sap -- Language Technologies Institute, Carnegie Mellon University

 

Abstract

Recent advances in pretraining large language models have given them a remarkable ability to generate complex, fluent, human-like language. Consequently, these models have gained widespread adoption as problem-solving chatbots and writing assistants. However, as we increasingly rely on these powerful language models, ensuring their safe and effective operation requires extensive research in controllable text generation. Existing methods manipulate the decoding process, augment training data, or apply online reinforcement learning to encourage models to generate responses with the desired attributes. Yet even state-of-the-art language models struggle to produce the most accurate or desired output on the first attempt. Inspired by recent developments in self-correction for large language models and by new reinforcement learning methods, we aim to train smaller language models as fine-grained editors that iteratively edit outputs to satisfy threshold constraints over multiple classifier-based attributes.

 

In this thesis, I first present preliminary work that incorporates per-token distributional constraints during decoding to improve the generation quality of traditional LSTM-based dialog models. I then present a study of the contextual offensive behavior of pretrained large language models and curate a high-quality dataset for toxicity detection. We also experiment with preliminary controlled text generation methods to reduce a dialog model's toxicity and its tendency to agree in offensive contexts. Next, I introduce a novel offline RL algorithm that can use arbitrary numeric scores as rewards during training to optimize any user-desired LM behavior. Building on this offline RL framework, I propose a fine-grained multi-attribute controllability task, where the goal is to guide the language model to generate output sequences that satisfy user-defined, threshold-based attribute constraints. We frame the problem as an editing game in which the language model can make multiple edits to reach the desired attributes. Notably, our method uses offline RL to train LM editors cheaply, without any exploration.