Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp013197xq41b
Title: Energy-efficient Capacitor-based Analog In-memory Computing Macros
Authors: Lee, Jinseok
Advisors: Verma, Naveen
Contributors: Electrical and Computer Engineering Department
Keywords: analog in-memory computing; hardware accelerator; In-memory computing; neural network; SNR
Subjects: Electrical engineering
Issue Date: 2024
Publisher: Princeton, NJ : Princeton University
Abstract: In recent years, artificial intelligence (AI) has rapidly gained prominence and has transformed various sectors. Neural network-based AI applications have become common in our daily lives, from self-driving cars and AI speakers recognizing individual voices, to chatbots solving mathematical problems. As AI performs more complex tasks, neural network (NN) models have become extremely complex as well, demanding a huge number of computations. This prompts the development of specialized hardware solutions such as neural processing units (NPUs), designed to cater to these needs. As a future component in NPUs, in-memory compute (IMC) is specialized for matrix-vector multiplication (MVM) computation in AI tasks. IMC distinguishes itself by integrating computation within memory, minimizing the traditional approach of moving data between the data memory and arithmetic logic units (ALUs) in graphics processing units (GPUs) or central processing units (CPUs). This innovation positions IMC as a highly efficient computing solution, particularly in terms of performance and energy consumption for MVMs, with analog in-memory compute (AIMC) demonstrating notable advantages in energy efficiency and processing speed. This work presents multiple designs and analyses for an energy-efficient and high-SNR capacitor-based AIMC macro. These AIMC designs feature a configuration of 1152 rows by 256 columns and are fabricated in 28 nm CMOS technology. First, we introduce a fully row/column-parallel AIMC macro capable of single-shot MVM processing with 5-bit inputs by adopting the dynamic range doubling (DRD) concept, which reduces the circuit complexity needed for generating multi-level signals for one-shot 5-bit processing. With DRD and back-end-of-line (BEOL) capacitors in a memory array of multiplying bit cells (MBCs), the AIMC macro achieves highly linear MVM computations.
Prototype chip measurements demonstrate that 5-bit DRD enhances IMC energy efficiency by 16 times and throughput by 5 times compared to prior 1-bit input designs. This chip was also tested in a neural network demonstration using 5-bit input activations (IAs) and weights for CIFAR-10 classification tests. Furthermore, we provide a systematic analysis of noise performance of AIMC macros. Despite claims in the literature about the importance of output bit-width ratio, the impact of noise in AIMC macros is often overlooked. Our analysis begins at the macro level, examining the effects of bit precision and number format of inputs and weights. This analysis is then extended to the NN layer level, which ultimately needs to be computed. The findings offer macro design insights into the systematic gains determined by macro configuration of input/weight/output bit precision and inner dimension of accumulation. Then, we introduce a bit-reconfigurable AIMC macro design. Different layers in a NN have varying noise sensitivity requirements, typically with the first and last layers being particularly noise-sensitive, whereas other layers are less so. Therefore, there is a demand for reconfigurable noise performance, and this can be done by adjusting the bit precision in the macro configuration. The prototype demonstrates adjustable noise performance based on bit configuration, and the measured results validate the systematic noise analysis provided in this work. Finally, we address the need for higher analog-to-digital converter (ADC) bit precision to mitigate systematic noise amplification. While previous single-ended accumulation systems are limited in increasing ADC bit precision due to the fundamental noise of the analog circuit approaching the least significant bit (LSB) amplitude, this challenge is overcome in our work through a fully differential design. 
A combination of a differential input activation (IA) and a multiplier forms a multiplying unit (MU), which contains two MBCs. This setup enables differential accumulation at the compute line (CL) and ADC, and as a consequence, the doubled voltage swing at the CL offers an opportunity to increase ADC bit precision. The increased ADC bit precision significantly improves the signal-to-noise ratio (SNR) of the AIMC system. The prototype chip exhibits transfer curves with sub-LSB error and nearly equivalent accuracy on the ImageNet inference task with the ResNet-18 NN model.
URI: http://arks.princeton.edu/ark:/88435/dsp013197xq41b
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Electrical Engineering

Files in This Item:
File: Lee_princeton_0181D_14998.pdf
Size: 12.03 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.