Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01ns064933x
Title: Learning Language through Interactions with the Digital World
Authors: Yang, John Boda
Advisors: Narasimhan, Karthik; Chen, Danqi
Department: Computer Science
Class Year: 2023
Publisher: Princeton, NJ : Princeton University
Abstract: A noteworthy omission in the development process of common NLP models is the lack of interactive components. While common downstream applications of large language models increasingly involve interacting with humans or other agents in a shared environment, there remains a gap in infrastructure and approaches for incorporating interactive machine learning components into the training or inference paradigms of existing large language models (henceforth referred to as LLMs). The emergence of reasoning and decision-making capabilities as interesting and desirable behaviors in such LLMs presents a number of opportunities for designing more benchmarks and methodologies in the realm of interactive natural language processing. In this thesis, I discuss WebShop, a benchmark for interactive natural language processing: a simulated e-commerce website environment that presents several challenges for language grounding, including understanding compositional instructions, query (re-)formulation, comprehending and acting on noisy text in webpages, and performing strategic exploration. Given a text instruction specifying a product requirement, an agent needs to navigate multiple types of webpages and issue diverse actions to find, customize, and purchase an item. WebShop includes a collection of over 1,600 human demonstrations for the task, and a diverse range of agents is trained and evaluated using reinforcement learning, imitation learning, and pre-trained image and language models. The best model achieves a task success rate of 29%, which outperforms rule-based heuristics (9.6%) but is far lower than human expert performance (59%). Analysis of agent and human trajectories, along with ablations of various model components, provides insights for developing future agents with stronger language understanding and decision-making abilities.
Finally, agents trained on WebShop exhibit non-trivial sim-to-real transfer when evaluated on amazon.com and ebay.com, indicating the potential value of WebShop towards developing practical web-based agents that can operate in the wild.
URI: http://arks.princeton.edu/ark:/88435/dsp01ns064933x
Language: en
Appears in Collections: Computer Science, 2023

Files in This Item:
File: Yang_princeton_0181G_14465.pdf
Size: 7.52 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.