Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01q811kp00q
Title: Learning Human-Like Representations to Enable Learning Human Values
Authors: Wynn, Andrea Hui
Advisors: Griffiths, Thomas L
Department: Computer Science
Class Year: 2024
Publisher: Princeton, NJ : Princeton University
Abstract: How can we build AI systems that can learn any set of individual human values both quickly and safely, avoiding causing harm or violating societal standards for acceptable behavior during the learning process? We argue that representational alignment between humans and AI agents facilitates learning human values quickly and safely, an important step towards value alignment in AI. Making AI systems learn human-like representations of the world has many known benefits, including improving generalization, robustness to domain shifts, and few-shot learning performance. We propose that this kind of representational alignment between machine learning (ML) agents and humans can also support safely learning and exploring human values. We focus on ten different aspects of human values, including ethics, honesty, and fairness, and train ML agents using a variety of methods in a multi-armed bandit setting, where rewards reflect human value judgments over the chosen action. We use a synthetic experiment to demonstrate that agents which have high representational alignment with the environment exhibit safer and more efficient learning behavior. We then repeat this procedure in a realistic setting, using textual action descriptions and similarity judgments collected from humans and a variety of language models, to show that representational alignment enables both safe exploration and improved generalization when grounded in a real-world context.
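The abstract describes agents learning in a multi-armed bandit whose rewards reflect human value judgments, and argues that an agent whose similarity structure over actions matches the environment's should explore more safely. The following is a minimal, hypothetical Python sketch of that idea, not the thesis's implementation. Everything in it is an assumption chosen for illustration: the function names, the epsilon-greedy exploration, the kernel-smoothing reward estimator (untried actions inherit the rewards of similar tried actions), the Pearson correlation between similarity matrices used as an alignment score, and the toy 20-action environment. The printed counts depend on the random seed, and no particular outcome is claimed here.

```python
# Illustrative sketch only (assumptions, not the thesis's code): a bandit agent
# that generalizes rewards across actions through a similarity matrix.
import numpy as np


def representational_alignment(sim_a: np.ndarray, sim_b: np.ndarray) -> float:
    """Pearson correlation between the off-diagonal upper triangles of two
    (n x n) similarity matrices; 1.0 means identical relative structure."""
    idx = np.triu_indices_from(sim_a, k=1)
    return float(np.corrcoef(sim_a[idx], sim_b[idx])[0, 1])


def similarity_bandit(sim, true_rewards, n_rounds, rng, explore=0.1, noise=0.1):
    """Epsilon-greedy bandit that estimates every action's reward as a
    similarity-weighted average of the mean rewards observed so far
    (kernel smoothing), so it can generalize to actions it has not tried."""
    n_actions = len(true_rewards)
    sums = np.zeros(n_actions)
    counts = np.zeros(n_actions)
    chosen = []
    for _ in range(n_rounds):
        observed = counts > 0
        if not observed.any() or rng.random() < explore:
            action = int(rng.integers(n_actions))  # explore uniformly at random
        else:
            means = sums[observed] / counts[observed]
            weights = sim[:, observed]  # similarity of each action to tried ones
            estimates = weights @ means / (weights.sum(axis=1) + 1e-9)
            action = int(np.argmax(estimates))  # exploit the best estimate
        reward = true_rewards[action] + rng.normal(0.0, noise)  # noisy judgment
        sums[action] += reward
        counts[action] += 1
        chosen.append(action)
    return chosen


def harmful_choices(chosen, true_rewards, threshold=0.0):
    """Count rounds whose chosen action had a true reward below threshold."""
    return int(sum(true_rewards[a] < threshold for a in chosen))


# Hypothetical toy environment: 20 actions placed on a 1-D "value" axis.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
true_rewards = x  # hypothetical: an action's reward equals its position
human_sim = np.exp(-5.0 * np.abs(np.subtract.outer(x, x)))
perm = rng.permutation(20)
misaligned_sim = human_sim[np.ix_(perm, perm)]  # same values, scrambled structure

aligned_choices = similarity_bandit(human_sim, true_rewards, 200, rng)
misaligned_choices = similarity_bandit(misaligned_sim, true_rewards, 200, rng)
print("alignment:", representational_alignment(human_sim, misaligned_sim))
print("harmful (aligned):", harmful_choices(aligned_choices, true_rewards))
print("harmful (misaligned):", harmful_choices(misaligned_choices, true_rewards))
```

Kernel smoothing is used here only because it is the simplest way to let a similarity matrix drive generalization; the thesis states that it trains agents "using a variety of methods", which this sketch does not attempt to reproduce.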
URI: http://arks.princeton.edu/ark:/88435/dsp01q811kp00q
Type of Material: Academic dissertations (M.S.E.)
Language: en
Appears in Collections: Computer Science, 2023

Files in This Item:
File: Wynn_princeton_0181G_15048.pdf
Size: 1.17 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.