Identifying GPT: First Principles for Generative AI Detection

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp0100000330z

Title:	Identifying GPT: First Principles for Generative AI Detection
Authors:	Tian, Edward
Advisors:	Narasimhan, Karthik
Department:	Computer Science
Class Year:	2023
Abstract:	The safeguards that allow new technologies to be adopted responsibly need to be released immediately. In response to the mass adoption of generative AI technologies, this project outlines principles for detecting AI generations based on distributional differences between the sentence-level perplexities of machine generations and those of human writing. We also develop and release a novel dataset of human- and machine-generated articles for analyzing these differences, and demonstrate that this dataset can be used to train an effective low-cost AI detector. In addition to perplexity distributions, we introduce another distinction between human and machine writing based on the variance in perplexities, defined as ‘burstiness’, and posit that it is a quality innate to human writers that will remain a long-term indicator of human writing even as generative LLMs continue to evolve.
URI:	http://arks.princeton.edu/ark:/88435/dsp0100000330z
Type of Material:	Princeton University Senior Theses
Language:	en
Appears in Collections:	Computer Science, 1987-2024
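
The abstract names two quantities: sentence-level perplexity and 'burstiness', the variance in those perplexities. The sketch below is one possible reading of those definitions, not the thesis's implementation, which this record does not describe. It assumes per-token natural-log probabilities are already available from some language model; the function names and the sample numbers are hypothetical and chosen only for illustration.

```python
import math
from statistics import pvariance


def sentence_perplexity(token_logprobs):
    """Perplexity of one sentence: exp of the negative mean token log-probability."""
    if not token_logprobs:
        raise ValueError("sentence has no tokens")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def burstiness(sentences_logprobs):
    """Assumed definition: population variance of the per-sentence perplexities."""
    perplexities = [sentence_perplexity(lp) for lp in sentences_logprobs]
    return pvariance(perplexities)


# Made-up log-probabilities (not real model output), one inner list per sentence.
uniform_doc = [[-1.0, -1.0], [-1.0, -1.0, -1.0]]  # every sentence equally predictable
varied_doc = [[-0.5, -0.5], [-3.0, -3.0, -3.0]]   # predictability differs by sentence

print(burstiness(uniform_doc), burstiness(varied_doc))
```

Under this reading, a document whose sentences are all equally predictable has zero burstiness, while one that mixes predictable and surprising sentences scores higher. Whether the thesis uses population variance, sample variance, or another spread measure is not stated here.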

Files in This Item:

File	Size	Format
TIAN-EDWARD-THESIS.pdf	600.85 kB	Adobe PDF
