Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp01b2774008t
Title: | Augmenting Concept-Labeled Datasets with Counterfactuals for Interpretability |
Authors: | Sinha, Prachi |
Advisors: | Ramaswamy, Vikram |
Department: | Computer Science |
Class Year: | 2024 |
Publisher: | Princeton, NJ : Princeton University |
Abstract: | Concept-based interpretability methods use a pre-defined set of human-understandable concepts to explain a model's output. These explanations are learned by "probing" the model using a concept-labeled dataset and learning how concepts can be linearly combined to predict outputs. However, current methods have several limitations: explanations are highly dependent on the probe dataset, and they may reflect correlations between concepts and outputs rather than causation. To address these issues, we augment a probe dataset with counterfactual images in which particular semantic concepts have been removed. Using such counterfactual images, we attempt to improve the accuracy of explanations for a scene classification model and to more directly learn the causal relationships between concepts and outputs. We find that explanation accuracy improves in certain cases but remains limited by how well the probe dataset aligns with the model, and that counterfactual images highlight differences in a concept's importance in the presence versus absence of other strongly co-occurring concepts. |
URI: | http://arks.princeton.edu/ark:/88435/dsp01b2774008t |
Type of Material: | Academic dissertations (M.S.E.) |
Language: | en |
Appears in Collections: | Computer Science, 2023 |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Sinha_princeton_0181G_15054.pdf | | 821.39 kB | Adobe PDF | View/Download |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.
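The abstract describes probing a model by learning how concept labels linearly combine to predict its outputs. A minimal illustrative sketch of such a linear concept probe is below; it is not code from the thesis, and all names, dimensions, and data are hypothetical stand-ins.

```python
# Hypothetical sketch of a linear concept probe: fit a linear map from
# binary concept annotations to a model's class scores, so that each
# learned weight reads as "importance of concept i for class j".
import numpy as np

rng = np.random.default_rng(0)

n_images, n_concepts, n_classes = 200, 5, 3

# Hypothetical probe dataset: 0/1 concept annotations per image.
C = rng.integers(0, 2, size=(n_images, n_concepts)).astype(float)

# Hypothetical model outputs (class scores) on the same images.
Y = rng.normal(size=(n_images, n_classes))

# Least-squares fit: W maps concept vectors to class scores.
W, *_ = np.linalg.lstsq(C, Y, rcond=None)

# W[i, j] is the learned weight of concept i for class j -- the
# per-concept "explanation" the probe produces.
print(W.shape)  # → (5, 3)
```

Counterfactual augmentation, as described in the abstract, would add images with a concept removed (flipping that concept's label to 0) so the fitted weights reflect the concept's causal contribution rather than mere co-occurrence.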