Title: How Machine Understands You and Me: Naturally Learn and Generalize in Task-oriented Dialogues

 

Date: 6/5/2023 (Mon)

Time: 1:30 PM-2:30 PM ET

Location (meeting link): https://gatech.zoom.us/j/3333257776?pwd=ODBYV05xM0pVY2MyNnZGYkN3bFhLdz09

 

Ting-Wei Wu

Machine Learning PhD Student

Department of Electrical and Computer Engineering

Georgia Institute of Technology

 

Committee

1. Dr. Biing-Hwang Juang (Advisor, School of Electrical and Computer Engineering, Georgia Tech)

2. Dr. David Anderson (School of Electrical and Computer Engineering, Georgia Tech)

3. Dr. Diyi Yang (Department of Computer Science, Stanford University)

4. Dr. Elliot Moore (School of Electrical and Computer Engineering, Georgia Tech)

5. Dr. Sungjin Lee (Alexa AI, Amazon)

 

Abstract

Modern human-computer interaction systems have recently gained significant attention for customer service automation, where they alleviate the need for intensive human labor. These systems typically involve a natural language understanding (NLU) process with sophisticated considerations to provide appropriate, human-like responses. However, preliminary systems built on pre-structured interactions face challenges due to limited data and an inability to capture high-level semantic nuances. Task-oriented modules trained on a limited single-domain dataset suffer from spurious correlations and lack the robustness to generalize to new low-frequency intents or critical domains without external knowledge or multi-turn dynamics. They may also produce incomplete and ill-formed responses when presented with new, unseen circumstances, such as tail domains or foreign languages. In this thesis, we propose several neural-based approaches to improve the naturalness of task-oriented dialogue systems, i.e., the accuracy and fluency with which the machine understands users and generalizes across multiple scenarios. To address variability in user intents and expressions, we develop explicit intent embeddings, implicit intent clusters, and a listwise context-based reranking approach. Additionally, we integrate dialogue contexts and knowledge bases into large language models to capture multi-turn dynamics. Finally, we propose a heterogeneous data augmentation scheme and a cross-lingual adapter-based framework to improve the robustness of large pre-trained models on skill routing and multilingual response generation problems. Overall, this thesis aims to learn more robust representations for better dialogue understanding and user experience, without extensive effort spent designing manual rules for different domains.