Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01gq67jv370
Title: Token Reduction Module for Efficient Vision Transformers
Authors: Zhang, Ryan
Advisors: Russakovsky, Olga
Department: Computer Science
Certificate Program: Center for Statistics and Machine Learning
Class Year: 2022
Abstract: Model efficiency is a central concern for the practical deployment of computer vision models in real-world scenarios, and the recently popularized vision transformer presents a unique opportunity to approach this problem in a different manner. While convolutional neural networks (CNNs), which have been the primary tool in computer vision for almost a decade, rely on operations that make it difficult to handle images of non-standard shape, vision transformers are more robust in this setting. In the image classification task, they operate by splitting an image into patches that are processed separately by the model, while the attention mechanism transfers information between them. Since some patches may cover unimportant portions of the image, one straightforward way to reduce computation is to keep only the patches that are necessary for making a prediction. In this work, I first perform an analysis of the various components of the vision transformer. Then, based on these observations, I introduce several iterations of the Token Reduction Module (TRM), a plug-and-play module that can be added to a vision transformer to immediately reduce computation at a small accuracy cost, without requiring the transformer to be retrained. The final TRM achieves a better theoretical efficiency-accuracy trade-off than previous work under the same inference-only setting, and after a short period of fine-tuning it is competitive with previous works that implemented and trained much more complex systems.
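(The thesis itself is available only on request; as a rough, hedged illustration of the general idea described in the abstract, and not of the author's actual TRM, the sketch below prunes patch tokens by how much attention they receive from the CLS token. The function name, tensor layout, and keep ratio are assumptions for illustration only.)

    # Minimal sketch of attention-based token pruning for a ViT, assuming the
    # common layout (batch, 1 + num_patches, dim) with the CLS token first.
    # This is NOT the thesis's Token Reduction Module, only the generic idea.
    import torch

    def prune_tokens(tokens: torch.Tensor, cls_attn: torch.Tensor, keep_ratio: float = 0.5):
        """Keep the patch tokens that receive the most CLS attention.

        tokens:   (B, 1 + N, D) -- CLS token followed by N patch tokens
        cls_attn: (B, N)        -- attention from the CLS token to each patch,
                                   e.g. averaged over heads in some block
        """
        B, n_plus_1, D = tokens.shape
        num_keep = max(1, int(keep_ratio * (n_plus_1 - 1)))

        # Indices of the highest-scoring patches (offset by 1 to skip the CLS slot).
        topk = cls_attn.topk(num_keep, dim=1).indices + 1
        keep = torch.cat(
            [torch.zeros(B, 1, dtype=torch.long, device=tokens.device), topk], dim=1
        )

        # Gather the CLS token plus the kept patch tokens.
        return torch.gather(tokens, 1, keep.unsqueeze(-1).expand(-1, -1, D))

    # Example: 196 patches reduced to 98 before the next transformer block.
    x = torch.randn(2, 197, 768)
    attn = torch.rand(2, 196)
    print(prune_tokens(x, attn, keep_ratio=0.5).shape)  # torch.Size([2, 99, 768])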
URI: http://arks.princeton.edu/ark:/88435/dsp01gq67jv370
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Computer Science, 1987-2024

Files in This Item:
File: ZHANG-RYAN-THESIS.pdf
Size: 5.09 MB
Format: Adobe PDF
Access: Request a copy


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.