Title: Leveraging Structured Knowledge to Enable Efficient AI Assistants

 

Date: Thursday, November 21st, 2024

Time: 11:00 AM - 1:00 PM EST

Location: Zoom Link

Meeting ID: 920 5209 9288

Passcode: 779339

 

Karan Samel

Machine Learning PhD Student

School of Interactive Computing

Georgia Institute of Technology

 

Committee

Dr. Irfan Essa (Advisor): School of Interactive Computing, Georgia Tech; Google DeepMind

Dr. Thomas Ploetz: School of Interactive Computing, Georgia Tech

Dr. Alan Ritter: School of Interactive Computing, Georgia Tech

Dr. Wei Xu: School of Interactive Computing, Georgia Tech

Dr. Cheng Li: Google DeepMind

 

Abstract

Virtual AI assistants help with tasks across many domains, such as personalized e-commerce recommendations or cooking instructions tailored to a user’s prior preferences. Building these systems efficiently requires domain data for those tasks, which may be limited in quantity. This poses a challenge for traditional deep learning models, which are data-intensive and require domain-specific data to generalize to out-of-distribution cases. We address this by integrating prior structured knowledge into models, rather than collecting more unstructured domain data, to improve data and computational efficiency.

 

First, within e-commerce, we investigate leveraging a knowledge graph of relations between products and user categories of interest. These knowledge relations are integrated into a BERT language model, which learns which knowledge is most relevant for the prediction task at hand. The resulting Knowledge Relevance BERT (KR-BERT) model identifies relevant user topics of interest even when little prior product data is available.
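The relevance-weighting idea behind KR-BERT can be illustrated with a small sketch: score each candidate knowledge relation against a task query vector, normalize the scores, and form a weighted combination. This is a toy stand-in with hypothetical names; in the actual model the query and relation representations come from learned BERT embeddings, not raw vectors.

```python
import math

def softmax(scores):
    """Convert raw scores into normalized relevance weights."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def weight_relations(query_vec, relation_vecs):
    """Score each candidate knowledge relation against the task query,
    then return softmax relevance weights and the weighted combination
    (a toy stand-in for the learned relevance module)."""
    scores = [dot(r, query_vec) for r in relation_vecs]
    weights = softmax(scores)
    dim = len(query_vec)
    summary = [sum(w * r[i] for w, r in zip(weights, relation_vecs))
               for i in range(dim)]
    return weights, summary

# Hypothetical example: the first relation aligns best with the query,
# so it receives the largest relevance weight.
weights, summary = weight_relations(
    [1.0, 0.0],
    [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]],
)
```

The weighted summary can then be fed alongside the text representation into the prediction head, so irrelevant relations contribute little even when the knowledge graph is large.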

 

Second, e-commerce platforms also arrange products within taxonomies or hierarchies. These hierarchies can be used to learn better product representations based on other products in nearby hierarchy categories. We do this by training a Transformer language model that captures both a product’s text semantics and its hierarchy path structure. This Semantic Structural Transformer (SST) approach improves an array of downstream e-commerce tasks while using the same amount of prior data.
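One simple way to expose a hierarchy path to a Transformer is to linearize it into the input sequence next to the product text. The sketch below shows this input format only; the separator tokens and the joint semantic-structural encoding in SST are assumptions for illustration, not the model's actual implementation.

```python
def build_input(hierarchy_path, product_text,
                sep=" [SEP] ", path_sep=" > "):
    """Linearize a product's taxonomy path and join it with the product
    text, so a single Transformer input carries both the hierarchy
    structure and the text semantics (illustrative format only)."""
    path = path_sep.join(hierarchy_path)
    return path + sep + product_text

# Hypothetical product and taxonomy path.
example = build_input(
    ["Electronics", "Audio", "Headphones"],
    "Wireless noise-cancelling over-ear headphones",
)
# example == "Electronics > Audio > Headphones [SEP] Wireless noise-cancelling over-ear headphones"
```

Because products in nearby categories share long path prefixes, their linearized inputs overlap heavily, which is one way a model can learn similar representations for them without any additional training data.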

 

Finally, we aid users in instructional tasks by augmenting models with step-by-step procedures. We mine “How To” instructional steps from websites to generate lists of procedural steps for different tasks, then use heuristics to label unannotated instructional video clips with the corresponding steps. Providing these externally mined sequential steps improves performance when predicting and forecasting procedural steps within an instructional video, which can then be surfaced to assist a user. We also extract the hierarchy path of each video, which aids in predicting the overall task occurring in that video and filters which videos to draw relevant steps from. This Procedural Hierarchical Video Transformer (PHiViT, or Pivot) enables data- and compute-efficient training for downstream step and video prediction tasks.
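The heuristic labeling step can be sketched as matching each clip to the mined step it overlaps with most. The lexical-overlap (Jaccard) heuristic and the threshold below are assumptions chosen for illustration; the actual labeling heuristics may differ.

```python
def jaccard(a, b):
    """Word-level Jaccard similarity between two short texts."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def label_clip(clip_transcript, mined_steps, threshold=0.2):
    """Weakly label a clip with the best-matching mined step, or None
    when no step clears the threshold (one simple heuristic of the
    kind used to label unannotated clips; hypothetical threshold)."""
    scored = [(jaccard(clip_transcript, step), step) for step in mined_steps]
    best_score, best_step = max(scored)
    return best_step if best_score >= threshold else None

# Hypothetical cooking example: the clip's narration overlaps most
# with the "heat the pan" step.
steps = ["whisk the eggs", "heat the pan", "pour batter into the pan"]
label = label_clip("now heat up the pan on medium", steps)
# label == "heat the pan"
```

Clips that match no step are left unlabeled rather than forced onto a weak match, which keeps the noise in the weakly supervised training set down.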

 

Across different domain tasks, we show how to obtain and integrate structured sources of knowledge within traditional deep learning models. Doing so reduces the domain-specific data and compute requirements that limit the practicality of deploying these AI assistants.