Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp018s45qd179
Title: Compiler Support for Deep Learning Accelerators: End-to-End Evaluation and Data Access Optimization
Authors: Li, Yi
Advisors: Malik, Sharad
Contributors: Electrical and Computer Engineering Department
Subjects: Electrical engineering
Computer engineering
Computer science
Issue Date: 2024
Publisher: Princeton, NJ : Princeton University
Abstract: Specialized hardware accelerators have been developed to enhance power-performance efficiency for Deep Neural Network (DNN) applications. A primary challenge in DNN accelerator development is the early-stage evaluation of design prototypes on real-world applications. Such evaluations are crucial: modern DNN accelerators employ several techniques to boost power-performance, but these techniques can introduce numerical discrepancies, such as data quantization with customized numerical representations or reformulated operators. Given the deeply-connected layered nature of DNN applications, these numerical errors can accumulate and result in significant deviations from reference results. Additionally, the energy and performance costs of data movement between the host machine and the accelerator's on-chip memory are substantial, making the reduction of data transfer a critical optimization focus when mapping DNN applications to accelerators.

To address these challenges, this thesis proposes several solutions. First, we introduce "3LA", an end-to-end compiler pipeline that facilitates application-level testing of hardware accelerator prototypes on unmodified DNN applications. Built upon a recently proposed formal hardware specification, the Instruction-Level Abstraction (ILA), 3LA enables automated application-level simulation, providing crucial development feedback with much-reduced manual engineering effort. Second, we propose Shoehorn, an optimized scheduler for mapping DNN operators to hardware accelerators that co-optimizes loop tiling, loop ordering, and on-chip memory partitioning decisions. This scheduler creates an optimal mapping schedule for a single application-level operator on a specific accelerator, minimizing off-chip memory access.
Lastly, this thesis introduces "COSMA," an optimization framework that aims to minimize total off-chip data access when deploying entire DNN applications, or segments of them, to the target accelerator. COSMA collectively optimizes operator scheduling, memory allocation, and tensor replacement strategies, presenting a comprehensive solution to data movement minimization. These contributions are expected to significantly streamline the process of DNN accelerator development, from early-stage design to final application deployment, enhancing both efficiency and effectiveness in the field.
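To illustrate why loop tiling is central to minimizing off-chip memory access, the sketch below models a matrix multiply as an element-access trace fed through a small LRU buffer standing in for on-chip memory, and counts misses (off-chip transfers) for untiled versus tiled loop nests. This is not code from the thesis; the `Cache`, `matmul_trace`, and `offchip_accesses` names, the fully-associative LRU model, and the chosen sizes are all illustrative assumptions.

```python
from collections import OrderedDict

class Cache:
    """Toy fully-associative LRU buffer (capacity in elements).
    A miss stands in for one off-chip data transfer."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.misses = 0

    def access(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)  # hit: refresh LRU position
        else:
            self.misses += 1              # miss: fetch from off-chip
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)  # evict least-recently-used
            self.lines[addr] = True

def matmul_trace(n, tile=None):
    """Yield the element addresses touched by C += A @ B (n x n),
    either as a plain i-j-k nest or tiled with square tiles."""
    if tile is None:
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    yield ('A', i, k); yield ('B', k, j); yield ('C', i, j)
    else:
        for i0 in range(0, n, tile):
            for j0 in range(0, n, tile):
                for k0 in range(0, n, tile):
                    for i in range(i0, min(i0 + tile, n)):
                        for j in range(j0, min(j0 + tile, n)):
                            for k in range(k0, min(k0 + tile, n)):
                                yield ('A', i, k); yield ('B', k, j); yield ('C', i, j)

def offchip_accesses(n, capacity, tile=None):
    cache = Cache(capacity)
    for addr in matmul_trace(n, tile):
        cache.access(addr)
    return cache.misses

# A 4x4 tile keeps the working set (three 16-element tiles) inside a
# 64-element buffer, so the tiled schedule incurs far fewer misses:
untiled = offchip_accesses(16, 64)
tiled = offchip_accesses(16, 64, tile=4)
```

In this toy model, choosing the tile size so that the tiles of A, B, and C fit in the buffer simultaneously is exactly the kind of decision a scheduler like Shoehorn makes, jointly with loop ordering and memory partitioning, for real accelerator memory hierarchies.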
URI: http://arks.princeton.edu/ark:/88435/dsp018s45qd179
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Electrical Engineering

Files in This Item:
Li_princeton_0181D_15310.pdf (2.62 MB, Adobe PDF)


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.