Title: Lost in the Logic: An Evaluation of Large Language Models’ Reasoning Capabilities on LSAT Logic Games
Authors: Malik, Saumya
Advisors: Chen, Danqi
Department: Computer Science
Class Year: 2024
Abstract: In this thesis, I evaluate the performance of Large Language Models (LLMs) on the Law School Admission Test (LSAT), specifically the Logic Games section of the test. I focus on this section because it presents complex logical reasoning tasks and is therefore a valuable source of data for evaluating how well modern, increasingly capable LLMs handle hard logical reasoning. I construct a dataset of LSAT logic games and their associated metadata, and extensively evaluate LLMs' performance in a Chain-of-Thought prompting setting. Given the weak performance in this setting, I explore other prompting frameworks on a smaller subset of the dataset, adapting ideas from Reflexion to this task. This yields substantially improved accuracy on this subset: 70% for GPT-4 and 46% for GPT-3.5, highlighting the capacity of LLMs to revise their logical errors despite initially weak performance. Finally, I analyze the types of logic games that models perform better or worse on, as well as the types of logical errors observed through human annotation, providing detailed insights into the logical reasoning capabilities of LLMs.
URI: http://arks.princeton.edu/ark:/88435/dsp01x059cb70v
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Computer Science, 1987-2024
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| MALIK-SAUMYA-THESIS.pdf | | 1.51 MB | Adobe PDF | Request a copy |
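The abstract describes Chain-of-Thought prompting followed by a Reflexion-style revision step, in which the model is told its answer was wrong and asked to correct its reasoning. As a rough illustration only, and not code from the thesis, the sketch below shows one way such a loop could look using the OpenAI Python client. The prompt wording, the `ask`, `parse_answer`, and `solve_with_reflexion` helpers, and the use of gold-label feedback as the error signal are all assumptions for illustration.

```python
# Hypothetical sketch of a Chain-of-Thought attempt plus Reflexion-style
# retries, in the spirit of the setup described in the abstract. All names
# and prompt wording are illustrative, not taken from the thesis.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

COT_TEMPLATE = (
    "You are solving an LSAT logic game.\n\n{game}\n\nQuestion: {question}\n"
    "Options: {options}\n\nThink step by step, then give your final answer "
    "on the last line in the form 'Answer: X'."
)

def ask(model: str, prompt: str) -> str:
    """Send one prompt and return the model's text reply."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def parse_answer(reply: str) -> str:
    """Pull the final 'Answer: X' letter out of a Chain-of-Thought reply."""
    for line in reversed(reply.strip().splitlines()):
        if line.lower().startswith("answer:"):
            return line.split(":", 1)[1].strip().upper()[:1]
    return ""

def solve_with_reflexion(model, game, question, options, gold, max_rounds=3):
    """Answer with CoT; on failure, feed the attempt back and ask for a fix."""
    prompt = COT_TEMPLATE.format(game=game, question=question, options=options)
    reply = ask(model, prompt)
    for _ in range(max_rounds):
        if parse_answer(reply) == gold:
            return True, reply
        # Reflexion-style feedback: show the failed attempt and ask the
        # model to identify its logical error and retry.
        reflection = (
            prompt
            + "\n\nYour previous attempt was:\n" + reply
            + "\n\nThat answer was incorrect. Identify the logical error in "
              "your reasoning, then solve the question again, ending with "
              "'Answer: X'."
        )
        reply = ask(model, reflection)
    return parse_answer(reply) == gold, reply
```

In this reading, the accuracy gains reported in the abstract come from letting the model see that its answer was wrong and revise; whether the thesis uses gold-label feedback, self-critique, or some other signal is not specified here.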