Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp018s45qd179
Title: | Compiler Support for Deep Learning Accelerators: End-to-End Evaluation and Data Access Optimization |
Authors: | Li, Yi |
Advisors: | Malik, Sharad |
Contributors: | Electrical and Computer Engineering Department |
Subjects: | Electrical engineering; Computer engineering; Computer science |
Issue Date: | 2024 |
Publisher: | Princeton, NJ : Princeton University |
Abstract: | Specialized hardware accelerators have been developed to enhance power-performance efficiency for Deep Neural Network (DNN) applications. A primary challenge in DNN accelerator development is the early-stage evaluation of design prototypes on real-world applications. Such evaluations are crucial: modern DNN accelerators employ several techniques to boost power-performance, but these techniques can introduce numerical discrepancies, such as data quantization with customized numerical representations or reformulated operators. Given the deeply connected, layered nature of DNN applications, these numerical errors can accumulate and result in significant deviations from reference results. Additionally, the energy and performance costs of data movement between the host machine and the accelerator's on-chip memory are substantial, making the reduction of data transfer a critical optimization focus when mapping DNN applications to accelerators.

To address these challenges, this thesis proposes several solutions. First, we introduce "3LA", an end-to-end compiler pipeline that facilitates application-level testing of hardware accelerator prototypes on unmodified DNN applications. Built upon a recently proposed formal hardware specification named Instruction-Level Abstraction (ILA), 3LA enables automated application-level simulation, providing crucial development feedback with much-reduced manual engineering effort. Second, we propose Shoehorn, an optimized scheduler for mapping DNN operators to hardware accelerators that co-optimizes loop tiling, loop ordering, and on-chip memory partitioning decisions. This scheduler creates an optimal mapping schedule for individual application-level operators on a specific accelerator, minimizing off-chip memory access. Lastly, this thesis introduces "COSMA," an optimization framework that aims to minimize total off-chip data access when deploying entire DNN applications, or segments of them, to the target accelerator. COSMA jointly optimizes operator scheduling, memory allocation, and tensor replacement strategies, presenting a comprehensive solution to data-movement minimization. These contributions are expected to significantly streamline the process of DNN accelerator development, from early-stage design to final application deployment, enhancing both efficiency and effectiveness in the field. |
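For readers unfamiliar with the loop-tiling decisions the abstract refers to, below is a minimal, self-contained sketch. It is not drawn from the thesis; the function tiled_matmul and the tile-size parameters TI, TJ, TK are hypothetical illustrations. It shows the basic idea a scheduler like Shoehorn co-optimizes: choosing tile sizes so that one tile of each operand fits in on-chip memory, letting each off-chip tile be fetched once and reused rather than re-fetched on every use.

```python
# Illustrative sketch only (not from the thesis): loop tiling for a
# matrix multiply. The tile sizes TI/TJ/TK are hypothetical parameters;
# in a real scheduler they would be chosen together with loop order and
# on-chip memory partitioning to minimize off-chip traffic.

import numpy as np

def tiled_matmul(A, B, TI=32, TJ=32, TK=32):
    """Compute C = A @ B tile by tile.

    Each (TI x TK) tile of A and (TK x TJ) tile of B is sliced out once
    per tile iteration, standing in for a single off-chip -> on-chip
    transfer, and then reused for the whole tile-level product.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N), dtype=A.dtype)
    for i0 in range(0, M, TI):
        for j0 in range(0, N, TJ):
            for k0 in range(0, K, TK):
                # One "on-chip" working set: a tile of A, a tile of B,
                # and the tile of C being accumulated.
                a = A[i0:i0 + TI, k0:k0 + TK]
                b = B[k0:k0 + TK, j0:j0 + TJ]
                C[i0:i0 + TI, j0:j0 + TJ] += a @ b
    return C

if __name__ == "__main__":
    A = np.random.rand(128, 96)
    B = np.random.rand(96, 64)
    assert np.allclose(tiled_matmul(A, B), A @ B)
```

Under this framing, the scheduler's job is to pick TI, TJ, TK (and the order of the three loops) so the three tiles fit in on-chip memory while maximizing reuse, which is what drives off-chip access down.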
URI: | http://arks.princeton.edu/ark:/88435/dsp018s45qd179 |
Type of Material: | Academic dissertations (Ph.D.) |
Language: | en |
Appears in Collections: | Electrical Engineering |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Li_princeton_0181D_15310.pdf | | 2.62 MB | Adobe PDF | View/Download |