Investigating Persuasiveness in Large Language Models

Ekpo, Promise Osaine

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp017m01bp99j

Title:	Investigating Persuasiveness in Large Language Models
Authors:	Ekpo, Promise Osaine
Advisors:	Fisac, Jaime F
Department:	Computer Science
Class Year:	2023
Publisher:	Princeton, NJ : Princeton University
Abstract:	While the rate of progress and innovation in artificial intelligence (AI) has many potential benefits, the potential for accidental deleterious effects cannot be overemphasised. It has been empirically demonstrated that large language models (LLMs) can learn to perform a wide range of natural language processing (NLP) tasks in a self-supervised setting. However, these models might unintentionally produce convincing arguments for false statements. There has been recent interest in improving LLM performance by fine-tuning in a reinforcement learning framework through interaction with human users. One concern is that even seemingly benign reward functions can lead to strategic manipulation of user responses as an instrumental goal to achieve higher overall performance. This thesis investigates this possibility by evaluating the persuasiveness of self-supervised-only and reinforcement-learning-fine-tuned LLMs. We discuss three approaches to investigating the degree of persuasiveness in LLMs: searching for qualitative failures through direct queries, quantifying the persuasiveness of generated outputs, and training on this persuasiveness metric as a reward signal with reinforcement learning. Through our investigation, we find that state-of-the-art LLMs fail when prompted with statements about less popular misconceptions or domain-specific myths. By investigating these safety-critical failures of LLMs, we hope to further inform the public about the reliability of these models and guide their use.
URI:	http://arks.princeton.edu/ark:/88435/dsp017m01bp99j
Type of Material:	Academic dissertations (M.S.E.)
Language:	en
Appears in Collections:	Computer Science, 2023

Files in This Item:

File	Description	Size	Format
Ekpo_princeton_0181G_14593.pdf		405.88 kB	Adobe PDF
