Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp01gq67jv370
Title: | Token Reduction Module for Efficient Vision Transformers |
Authors: | Zhang, Ryan |
Advisors: | Russakovsky, Olga |
Department: | Computer Science |
Certificate Program: | Center for Statistics and Machine Learning |
Class Year: | 2022 |
Abstract: | Model efficiency is a central concern for the practical deployment of computer vision models in real-world scenarios, and the recently popularized vision transformer presents a unique opportunity to approach this problem in a different manner. While convolutional neural networks (CNNs), which have been the primary tool in computer vision for almost a decade, rely on operations that make it difficult to handle images with non-standard shapes, vision transformers are more robust in this setting. In the image classification task, they function by splitting an image into patches that are then processed separately by the model, while using the attention mechanism to transfer information between them. Since some patches may cover unimportant portions of the image, one easy way to reduce the amount of processing needed is to keep only those patches that are necessary for making a prediction. In this work, I first perform an analysis of the various components of the vision transformer. Then, based on these observations, I introduce various iterations of the Token Reduction Module (TRM), which is a plug-and-play module that can be introduced to a vision transformer and instantly reduce the computation at a small accuracy cost without requiring the transformer model to be retrained. The final TRM achieves a better theoretical efficiency-accuracy trade-off than previous work under the same inference-only setting, and after fine-tuning for a short period of time is competitive with previous works that implemented and trained much more complex systems. |
URI: | http://arks.princeton.edu/ark:/88435/dsp01gq67jv370 |
Type of Material: | Princeton University Senior Theses |
Language: | en |
Appears in Collections: | Computer Science, 1987-2024 |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
ZHANG-RYAN-THESIS.pdf | | 5.09 MB | Adobe PDF | Request a copy |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.