Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp015h73q0433
Title: | Language Agents: From Next-Token Prediction to Digital Automation |
Authors: | Yao, Shunyu |
Advisors: | Narasimhan, Karthik |
Contributors: | Computer Science Department |
Keywords: | Cognitive Science; Digital Automation; Language Agents; Language Models; Natural Language Processing; Reinforcement Learning |
Subjects: | Artificial intelligence |
Issue Date: | 2024 |
Publisher: | Princeton, NJ : Princeton University |
Abstract: | Building autonomous agents to interact with the world lies at the core of artificial intelligence (AI). This thesis introduces "language agents", a new category of agents that utilize large language models (LLMs) to reason to act, marking a departure from traditional agents via extensive rule design or learning. It is developed in three parts: Part I motivates the necessity for language agents by introducing a new set of AI problems and benchmarks based on interaction with large-scale, real-world computer environments, such as the Internet or code interfaces. These "digital automation" tasks present tremendous value for alleviating tedious labor and improving our lives, yet pose significant challenges for prior agent or LLM methods in decision-making over open-ended natural language and long horizon, calling for new methodologies. Part II lays the methodological foundation for language agents, where the key idea is to apply LLM reasoning for versatile and generalizable agent acting and planning, which also augments LLM reasoning to be more grounded and deliberate via external feedback and internal control. We show language agents can solve a diversity of language and agent tasks (especially digital automation tasks proposed in Part I), with notable improvements over prior LLM-based methods and traditional agents. Part III consolidates insights from Parts I and II and outlines a principled framework for language agents. The framework provides modular abstractions to organize various LLM-based methods as agents, to understand their gaps from human cognition, and to inspire and develop new methods towards general-purpose autonomous agents. From foundational empirical tasks and methods to a unifying conceptual framework, this thesis establishes the study of language agents as a distinct and rigorously defined field at the frontier of AI research. |
URI: | http://arks.princeton.edu/ark:/88435/dsp015h73q0433 |
Type of Material: | Academic dissertations (Ph.D.) |
Language: | en |
Appears in Collections: | Computer Science |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Yao_princeton_0181D_15086.pdf | | 14.61 MB | Adobe PDF | View/Download |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.