Title: Augmenting Concept-Labeled Datasets with Counterfactuals for Interpretability
Authors: Sinha, Prachi
Advisors: Ramaswamy, Vikram
Department: Computer Science
Class Year: 2024
Publisher: Princeton, NJ : Princeton University
Abstract: Concept-based interpretability methods use a pre-defined set of human-understandable concepts to explain a model's output. These explanations are learned by "probing" the model with a concept-labeled dataset and learning how concepts can be linearly combined to predict outputs. However, current methods have several limitations: explanations depend heavily on the probe dataset, and they may reflect correlations between concepts and outputs rather than causation. To address these issues, we augment a probe dataset with counterfactual images in which particular semantic concepts have been removed. Using such counterfactual images, we attempt to improve the accuracy of explanations for a scene classification model and to learn the causal relationships between concepts and outputs more directly. We find that explanation accuracy improves in certain cases but is generally still limited by the probe dataset's alignment with the model, and that counterfactual images do highlight differences in concept importance in the presence and absence of other strongly co-occurring concepts.
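The probing procedure the abstract describes, fitting a linear map from concept labels to a model's outputs and then re-fitting after counterfactual augmentation, can be illustrated with a toy sketch. Everything below is hypothetical: the concept matrix, the "ground-truth" weights standing in for the scene classifier, and the simulated counterfactual scores are stand-ins for the thesis's actual model and edited images, used only to show the shape of the computation.

```python
import numpy as np

# Hypothetical probe dataset: rows are images, columns are binary concept
# labels (e.g. "tree", "sky", "bed", ...). None of this is the thesis's data.
rng = np.random.default_rng(0)
n_images, n_concepts = 200, 5
concepts = rng.integers(0, 2, size=(n_images, n_concepts)).astype(float)

# Stand-in for the scene classifier: its score depends on two concepts only.
true_w = np.array([1.5, 0.0, -0.8, 0.0, 0.0])
model_scores = concepts @ true_w + rng.normal(0, 0.05, n_images)

# Linear probe: solve for concept weights that best predict the model output.
w_probe, *_ = np.linalg.lstsq(concepts, model_scores, rcond=None)

# Counterfactual augmentation (schematic): pair each image with a copy in
# which one concept has been "removed", plus the model's score on the edited
# image. Here the edited score is simulated with the stand-in weights.
removed = concepts.copy()
removed[:, 0] = 0.0  # remove concept 0 in every counterfactual copy
cf_scores = removed @ true_w + rng.normal(0, 0.05, n_images)

aug_concepts = np.vstack([concepts, removed])
aug_scores = np.concatenate([model_scores, cf_scores])
w_aug, *_ = np.linalg.lstsq(aug_concepts, aug_scores, rcond=None)
```

Because each (image, counterfactual) pair differs in exactly one concept while the score changes accordingly, the augmented fit ties that concept's weight to an intervention rather than to co-occurrence statistics, which is the causal intuition behind the augmentation.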
URI: http://arks.princeton.edu/ark:/88435/dsp01b2774008t
Type of Material: Academic dissertations (M.S.E.)
Language: en
Appears in Collections: Computer Science, 2023

Files in This Item:
File: Sinha_princeton_0181G_15054.pdf (821.39 kB, Adobe PDF)


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.