Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp01q811kp00q
Title: Learning Human-Like Representations to Enable Learning Human Values
Authors: Wynn, Andrea Hui
Advisors: Griffiths, Thomas L
Department: Computer Science
Class Year: 2024
Publisher: Princeton, NJ : Princeton University
Abstract: How can we build AI systems that can learn any set of individual human values both quickly and safely, avoiding causing harm or violating societal standards for acceptable behavior during the learning process? We argue that representational alignment between humans and AI agents facilitates learning human values quickly and safely, an important step towards value alignment in AI. Making AI systems learn human-like representations of the world has many known benefits, including improving generalization, robustness to domain shifts, and few-shot learning performance. We propose that this kind of representational alignment between machine learning (ML) agents and humans can also support safely learning and exploring human values. We focus on ten different aspects of human values -- including ethics, honesty, and fairness -- and train ML agents using a variety of methods in a multi-armed bandit setting, where rewards reflect human value judgments over the chosen action. We use a synthetic experiment to demonstrate that agents which have high representational alignment with the environment exhibit safer and more efficient learning behavior. We then repeat this procedure in a realistic setting, using textual action descriptions and similarity judgments collected from humans and a variety of language models, to show that representational alignment enables both safe exploration and improved generalization when grounded in a real-world context.
URI: http://arks.princeton.edu/ark:/88435/dsp01q811kp00q
Type of Material: Academic dissertations (M.S.E.)
Language: en
Appears in Collections: Computer Science, 2023
Files in This Item:
File | Description | Size | Format
---|---|---|---
Wynn_princeton_0181G_15048.pdf | | 1.17 MB | Adobe PDF
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.