Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp0100000330z
Title: Identifying GPT: First Principles for Generative AI Detection
Authors: Tian, Edward
Advisors: Narasimhan, Karthik
Department: Computer Science
Class Year: 2023
Abstract: Safeguards that allow new technologies to be adopted responsibly need to be released immediately. In response to the mass adoption of generative AI technologies, this project outlines principles for detecting AI generations based on distributional differences between the sentence-based perplexities of machine generations and those of human writing. We also develop and release a novel dataset of human- and machine-generated articles for analyzing these differences, and demonstrate that this dataset can be applied to train an effective low-cost AI detector. In addition to perplexity distributions, we introduce another distinction between human and machine writing based on variance in perplexities, defined as ‘burstiness’, and posit that it is a quality innate to human writers that will remain a long-term indicator of human writing even with the continued evolution of generative LLMs.
URI: http://arks.princeton.edu/ark:/88435/dsp0100000330z
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Computer Science, 1987-2024
Files in This Item:
File | Size | Format |
---|---|---|
TIAN-EDWARD-THESIS.pdf | 600.85 kB | Adobe PDF |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.